ISSN: 2320-2882

# IJCRT.ORG



# INTERNATIONAL JOURNAL OF CREATIVE RESEARCH THOUGHTS (IJCRT)

An International Open Access, Peer-reviewed, Refereed Journal

# A 16-nm High Speed and Highly Efficient Conditional Flip Flops for High Speed Application using CMOS Technology (TGFF, MHLFF, SCDFF, CEPFF, SFTPFF, CDMPFF, (Pulse Triggered FFs) P-FFs)

<sup>1</sup>DIDDE RAMYA, <sup>2</sup>Dr. U. V. RATNA KUMARI, <sup>1</sup>PG SCHOLAR, <sup>2</sup>PROFESSOR, <sup>1</sup>DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING, <sup>1</sup>UNIVERSITY COLLEGE OF ENGINEERING (A) JNTUK, KAKINADA, INDIA

*Abstract:* Recent digital applications require highly efficient, high-speed devices, which calls for minimum delay and power consumption. This work therefore presents a novel conditional feedthrough pulse-triggered flip-flop. The data output of this flip-flop is optimized using predischarging and conditional signal feedthrough schemes, and power consumption is reduced using a shared pulse generator and an output feedback conditional keeper, which diminishes the floating status of the internal node. The proposed pulse-triggered flip-flop is compared with conventional flip-flop topologies in 16-nm CMOS technology: the transmission gate flip-flop (TGFF), modified hybrid latch flip-flop (MHLFF), static latch conditional-discharge flip-flop (SCDFF), conditional pulse enhancement pulse-triggered flip-flop (CEPFF), signal feedthrough pulse-triggered flip-flop (SFTPFF), and conditional data mapping pulse-triggered flip-flop (CDMPFF). The proposed work reduces the number of transistors in the feedthrough pulse-triggered flip-flop and demonstrates delay and area reduction in the proposed P-FF.

# Index Terms - Pulse triggered, Feedthrough, Flip-flop (FF).

# I. INTRODUCTION

Circuits that are extremely energy-efficient are crucial in the construction of high-performance computer processors for next-generation Exa-scale computing. The clock system consumes more than half of the total dynamic power in such processors. Flip-flops (FFs) and latches, as basic storage devices, typically dissipate 80 percent of the overall clock power [1]. The processor clock period and overall power consumption are heavily influenced by the data-to-output (D-to-Q) delay and power dissipation. As a result, high-performance computers require energy-efficient storage devices.

Pulse-triggered FFs (P-FFs) have been used to improve performance in various studies and have been shown to work well in high-speed applications [3]–[6]. A P-FF comprises a single latch structure and a clock pulse generator (PG). If the clock pulse width is narrow enough, the P-FF functions like a master–slave flip-flop (MS-DFF) with minimal timing overhead. The single latch structure makes time borrowing with a negative setup time possible. Its minimal design also decreases power dissipation and area overhead.
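The behaviour described above can be illustrated with a minimal behavioural sketch. It is not a circuit-level model of any P-FF in this paper: time is discretised into integer steps, and the function name `simulate_pff`, the period of 8 steps, and the initial state are assumptions made only for illustration. The latch is transparent while the pulse is high and holds its value otherwise.

```python
def simulate_pff(d, period, pulse_width):
    """Behavioural model of a pulse-triggered latch on integer time steps.

    The latch is transparent for the first `pulse_width` steps after each
    rising clock edge (t % period == 0) and holds its value otherwise.
    """
    q = 0  # assumed initial state of the storage node
    trace = []
    for t, d_t in enumerate(d):
        if t % period < pulse_width:  # pulse is high: latch is transparent
            q = d_t
        trace.append(q)
    return trace


# Hypothetical stimulus: data rises one step AFTER the clock edge at t = 8.
d = [1 if t >= 9 else 0 for t in range(24)]

narrow = simulate_pff(d, period=8, pulse_width=1)  # edge-like sampling
wide = simulate_pff(d, period=8, pulse_width=2)    # late data still captured
```

With a one-step pulse the output only updates at the clock edges, so the late data is captured at t = 16, as in an edge-triggered flip-flop. With a two-step pulse the data arriving at t = 9 is still captured in the same cycle, which is the time-borrowing (negative setup time) effect.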

Energy efficiency can be enhanced by reducing power consumption or by increasing speed. Previous research has used low-power strategies such as conditional capture, conditional precharge, conditional discharge, and conditional data mapping, but these have suffered from speed degradation [8]–[11]. Consoli et al. [12] suggested a conditional push–pull structure, an implicit-type P-FF in which the final stage and split paths in the first stage allowed for a large reduction in parasitic effort. Measurements on a 65-nm test chip revealed a speed gain of 1.5–2 times compared with existing P-FFs.



Figure 1: Topology of TGFF [2]

Lin [7] suggested an explicit P-FF with a signal feedthrough mechanism that accelerates data transitions while achieving better power efficiency and performance than traditional explicit-type P-FFs. However, because of the stacked pull-down transistors, the 0-to-1 delay was lengthened, resulting in imbalanced 0-to-1 and 1-to-0 data transitions. In addition, the 0-to-1 transition of the output waveform exhibits a voltage step caused by the threshold loss of the pass transistor, which lowers the output level and degrades performance.



Figure 2: Conventional P-FF topology (a) MHLFF [13], (b) CEPFF [14], (c) SFTPFF [7], (d) SCDFF [15] and (e) CDMPFF [16]. P-FF = pulse-triggered flip-flop, MHLFF = modified hybrid latch flip-flop, CEPFF = conditional pulse enhancement P-FF, SFTPFF = signal feedthrough P-FF, SCDFF = static latch conditional-discharge flip-flop, and CDMPFF = conditional data mapping P-FF.

To address these difficulties, this study proposes a unique energy-efficient conditional feedthrough P-FF with shared pulse generation. Its main contributions are: 1) reducing D-to-Q delay with a predischarge scheme; 2) reducing D-to-Q delay with balanced rising and falling output edges with a conditional signal feedthrough scheme; 3) eliminating the voltage stairway for both data input and output; and 4) reducing internal power with an output-controlled storage node keeper. The following is how the rest of the article is structured. The proposed topology and transistor size are detailed in Section II. Section III presents postlayout simulation findings and comparisons with state-of-the-art topologies, while Section IV draws conclusions.

# II. PROPOSED PREDISCHARGE PULSED-TRIGGERED FF USING CONDITIONAL SIGNAL FEEDTHROUGH

# A. Conventional P-FF

Implicit-type P-FFs are frequently considered more power efficient than explicit-type P-FFs [7]. However, this does not hold in modern FinFET technology, where the discharge path is longer.

The longer discharge path can reduce performance, especially when the supply voltage is close to the threshold. Furthermore, forming the "pulse" is complicated and particularly sensitive to process variations. Figure 2(a) shows an example of a conventional implicit-type P-FF, the modified hybrid latch flip-flop (MHLFF) [13]. This topology reduces power consumption by using a static latch and an output-controlled keeper to eliminate needless transitions at node s1.

It does, however, run into four issues.

1) To form the "pulse," many inverters and a big N4 transistor are required, increasing clock power.

2) Because node s1 is not predischarged, the 0-to-1 D-to-Q latency is significant.

3) When the output Q and input data both equal "1," the node s1 begins to float, resulting in increased dc power consumption.

4) When D is "1" and CK is "0," the output waveform is uplifted due to charge sharing in the discharge path, resulting in extra power dissipation.

Hwang et al. [14] suggested an improved conditional pulse enhancement P-FF (CEPFF) structure [see Fig. 2(b)], which used a pass transistor logic-based "AND" gate to reduce the number of transistors stacked along the discharge path.

To reduce the size of the transistors in the pulse generation circuit, an extra PMOS was added to provide conditional enhancement of the height and width of the discharge pulse. The always-on precharge, on the other hand, produces large short-duration currents in 16-nm FinFET technologies, necessitating larger discharge transistors to avoid functional failure. Additionally, the stacked discharge path slows down 0-to-1 transitions.

Fig. 2(c) shows an explicit-type signal feedthrough P-FF (SFTPFF) [7]. This structure resolves the long discharge path issue in the conventional explicit type P-FF structures and achieves better speed and power performance by controlling the output directly with the input.

The static latch conditional-discharge flip-flop (SCDFF) architecture is shown in Figure 2(d) [15]. A static latch structure is used to reduce power dissipation in this design. However, the s1 node is not discharged on a regular basis, and the long stacked discharge path results in prolonged D-to-Q delays. A conditional data mapping P-FF (CDMPFF), shown in Figure 2(e), uses the output to manage the data input and reduce power usage [16]. However, due to its long discharge paths, the D-to-Q delay remains nonoptimal. The conventional designs discussed above share these shortcomings. As a result, the fundamental purpose of this research is to reduce both D-to-Q delay and power consumption to achieve energy efficiency in FinFET technology.

# B. Proposed P-FF

Due to the inherent tradeoff between speed and power consumption, traditional P-FFs are tuned for either high speed or low power, with no optimization for energy efficiency. Unbalanced pull-up and pull-down paths in traditional topologies produce a longer 0-to-1 D-to-Q delay, lowering the circuit's energy efficiency. Many designs use a delay chain-based PG, which requires high dynamic power along the clock paths. Furthermore, the internal nodes switch with changing input data, which is unnecessary and consumes additional dc power. Each of these elements reduces the circuit's efficiency.



Figure 3: Proposed P-FF and SFTPFF [7] waveforms. SFTPFF = signal feedthrough P-FF.

The proposed architecture (Fig. 3) improves efficiency in two ways: 1) speed is raised by reducing the number of transistors in the pull-down path, and 2) additional pull-up and pull-down paths are introduced (MP4 and MN4). Through transistor reordering in the stacked discharge path and complementary conditional signal feedthrough techniques, the D-to-Q delay was significantly lowered. Unlike other feedthrough approaches, the pass transistor was replaced with an output feedback-controlled transmission gate. Compared with the pass transistor method [7], this resulted in a more efficient feedthrough with no threshold loss, removing the output voltage step during a 0-to-1 transition and minimising the D-to-Q delay. Controlling the transmission gate via output feedback prevented unwanted transmission gate turn-on, saved power, and avoided a voltage step at the input D (caused by the output Q). A second discharge path is included in the proposed design to minimise charge sharing induced by the feedthrough transistor. A static latch was included to avoid repeated precharging, and an output feedback keeper was employed to avoid excessive switching of the internal node (s1).

To summarise, the proposed TSPC P-FF structure has four main advantages over previous designs.

1) Unlike in traditional P-FF discharge paths, the data-controlled discharge transistor (MN3) is connected closer to ground than the clock-controlled discharge transistor (MN2) (MN1). This reordering of the stacked discharge transistors causes a predischarge for both 0-to-1 and 1-to-0 output transitions, which reduces the D-to-Q delay.

2) A transmission gate controlled by the output (MN7, MP7, MN8, and MN9) was also added, allowing the input data to be transmitted directly to the output.

3) Discharge transistors with internal node control (MN4) and pulse clock control (MN2) were used.

4) To reduce the size and power of the clock network, a shared width programmable pulse generation circuit was employed in conjunction with a clock mesh architecture.

# C. Transistor Sizing

Transistor sizing is a useful technique for exploring the design space of a given target architecture. The proposed structure aims to reduce propagation delay and power consumption simultaneously in order to achieve high energy efficiency. Owing to increased channel controllability, a high ON/OFF current ratio, reduced short-channel effects, and relative immunity to gate line-edge roughness, FinFET devices have been presented as a possible nanoscale substitute for conventional bulk CMOS devices [18].

Transistors in the proposed design can be classified into three groups: storage nodes, timing paths, and others. The storage group contains the MP5, MP6, MN5, and MN6 devices, which have little impact on performance but do affect power consumption and noise margins. These transistors must be sized to produce a balanced noise margin for both logic 0 and logic 1.

Two timing paths exist in the proposed design: data rising and data falling. The timing group contains MP1, MN2, and MN4 (data-falling path), and MN1, MN3, MP3, and MP4 (data-rising path) devices.

The special group contained MP2, MN7, MP7, MN8, and MN9. As an output feedback-controlled keeper, MP2 was used to diminish the floating status of the internal node (s1).

# III. POSTLAYOUT SIMULATIONS AND COMPARISONS

Several conventional sequential cell architectures were constructed in a 16-nm FinFET process to evaluate the performance and energy efficiency of the proposed device. These included the standard TGFF shown in Fig. 1 and the five P-FF designs shown in Fig. 2 (MHLFF, CEPFF, SFTPFF, SCDFF, and CDMPFF). The designs were first sized to work well across a wide range of process variations and then tuned for maximum power efficiency.



The model presented in Fig. 4 was used to simulate each design-under-test (DUT) structure. A Dyskl-style programmable-width PG (shown in Fig. 5) was used to create the pulse clock. To imitate real-world conditions in a CPU, the clock frequency was set to 2.5 GHz, and the pulse clock transition time at the DUTs was set to 1/6 of the clock period. For deidealization, a buffer with a ×4 inverter was used to generate the input signal, and the output node was loaded with 16 ×1 inverters.
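The timing numbers implied by this setup follow from simple arithmetic, sketched below. Only the 2.5 GHz clock frequency and the 1/6 ratio come from the text; the variable names are illustrative.

```python
CLOCK_FREQ_HZ = 2.5e9        # clock frequency stated in the text
TRANSITION_FRACTION = 1 / 6  # pulse clock transition time / clock period

# Convert the clock period to picoseconds (1 s = 1e12 ps).
clock_period_ps = 1e12 / CLOCK_FREQ_HZ
transition_time_ps = clock_period_ps * TRANSITION_FRACTION

print(f"clock period    = {clock_period_ps:.1f} ps")
print(f"transition time = {transition_time_ps:.1f} ps")
```

This gives a clock period of 400 ps and a pulse clock transition time of roughly 66.7 ps.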



Figure 5: Pulsewidth programmable clock PG

Table 1: Comparison of highly efficient conditional feedthrough pulse flip-flops for high-speed applications using 16-nm CMOS technology (0.8 V to 1.8 V)

| Structure             | TGFF    | MHLFF   | CEPFF   | SFTPFF   | SCDFF   | CDMPFF  | Proposed P-FF |
|-----------------------|---------|---------|---------|----------|---------|---------|---------------|
| No. of transistors    | 22      | 9       | 15      | 14       | 17      | 11      | 18            |
| Area (nm<sup>2</sup>) | 352     | 144     | 240     | 224      | 272     | 176     | 288           |
| Average power (µW)    | 59.2219 | 0.71745 | 0.73571 | 0.006990 | 0.00306 | 68.5517 | 0.187002      |
| Delay (ns)            | 4.9674  | 6.4442  | 1.9662  | 1.0849   | 1.0850  | 1.0851  | 0.1909        |
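As a quick consistency check on Table 1, the sketch below divides the reported area by the transistor count for each structure. The dictionaries simply transcribe the table; reading the result as a fixed per-transistor footprint is an assumption, since the text does not state how the area figures were obtained.

```python
# Transistor counts and areas (nm^2) transcribed from Table 1.
transistors = {
    "TGFF": 22, "MHLFF": 9, "CEPFF": 15, "SFTPFF": 14,
    "SCDFF": 17, "CDMPFF": 11, "Proposed P-FF": 18,
}
area_nm2 = {
    "TGFF": 352, "MHLFF": 144, "CEPFF": 240, "SFTPFF": 224,
    "SCDFF": 272, "CDMPFF": 176, "Proposed P-FF": 288,
}

# Area per transistor for each structure.
area_per_transistor = {
    name: area_nm2[name] / transistors[name] for name in transistors
}

for name, value in area_per_transistor.items():
    print(f"{name:14s} {value:.1f} nm^2 per transistor")
```

Every structure works out to 16 nm<sup>2</sup> per transistor, so the area row tracks the transistor count directly.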



# A. Timing Parameters

The minimum delay from the data input to the output is defined as the timing overhead, or insertion penalty, of a sequential cell. Owing to the incorporation of predischarge and feedthrough methods, the proposed design improved the D-to-Q delay by 62 percent relative to TGFF. Supplementary feedthrough and drive strength improvements resulted in a 28 percent improvement over SFTPFF. The proposed structure improved the CK-to-Q delay by 50.7 percent over TGFF and 50.2 percent over SFTPFF. Based on these findings, the proposed architecture appears to be a feasible new option for high-speed applications.

# B. Power Consumption

To evaluate power dissipation in these structures, three distinct input data patterns were used, with data toggling rates of 25% (a "1-1-0-0" data pattern), 50% (a "1-0-0-1" data pattern), and 100% (a "1-0-1-0" data pattern). At any input data toggling rate, the TSPC structure and shorter discharge path provide a power benefit. For example, at a 50% toggle rate, the power reduction of the proposed structure ranged from 65.4 percent (relative to CEPFF) to 8.6 percent (relative to TGFF). The difference was more pronounced at lower input data toggling rates. To extract leakage power, all four combinations of clock and data were scanned. However, leakage is not necessarily a major concern in high-speed applications, and it can be addressed with system-on-chip (SoC) power gating technology.

Chart 1: Comparison of delay among all the FFs



# C. Power Efficiency

Power efficiency is measured by the power-delay product (PDP), defined as the power dissipation per switch (mW/MHz, which has units of energy and is reported in fJ). For all input data activities and process variations, the proposed design provided improved performance and power consumption in PDP evaluations. Notably, of the seven devices examined, the proposed structure is the most energy-efficient. Owing to its considerable performance advantage, it outperformed TGFF by 87 percent in energy-delay product (EDP) at a toggle rate of 50 percent.
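The two figures of merit can be written as PDP = P × t and EDP = PDP × t. The sketch below evaluates them for two rows of Table 1, using the fact that 1 µW × 1 ns = 1 fJ. The function names are illustrative. Table 1 lists a single average power and delay per structure rather than the toggle-rate-specific measurements discussed in this section, so the output is not expected to reproduce the percentages quoted here.

```python
def pdp_fj(power_uw, delay_ns):
    """Power-delay product in fJ (1 uW * 1 ns = 1 fJ)."""
    return power_uw * delay_ns


def edp_fj_ns(power_uw, delay_ns):
    """Energy-delay product in fJ*ns: the PDP weighted by delay once more."""
    return pdp_fj(power_uw, delay_ns) * delay_ns


# (average power in uW, delay in ns) transcribed from Table 1.
table1 = {
    "TGFF": (59.2219, 4.9674),
    "Proposed P-FF": (0.187002, 0.1909),
}

pdp = {name: pdp_fj(p, t) for name, (p, t) in table1.items()}
edp = {name: edp_fj_ns(p, t) for name, (p, t) in table1.items()}

for name in table1:
    print(f"{name:14s} PDP = {pdp[name]:.4f} fJ, EDP = {edp[name]:.4f} fJ*ns")
```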

Monte Carlo simulations based on variations in transistor size were used to further evaluate the structure's robustness and dependability. A normal distribution with a standard deviation equal to 5% of the transistor width was used to model these changes. The 2-D minimal D-to-Q latency and power consumption findings for all evaluated architectures are shown in Fig. 9. Higher efficiency is represented by values that are closer to the origin. As can be seen, the proposed structure has the best efficiency and convergence. Due to the huge transistor sizes for PG, TGFF was less efficient, while CEPFF had the largest variation and the worst overall performance. Due to the greater discrepancies, Monte Carlo simulation data for SCDFF were removed.
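The width variation model described above can be sketched as follows. This only generates the perturbed transistor widths. The circuit simulation that maps each sample to a delay and power value is outside the scope of the sketch. The nominal width of 100 nm, the sample count, and the seed are hypothetical values chosen for illustration.

```python
import random
import statistics


def sample_widths(nominal_width, n_samples, sigma_fraction=0.05, seed=0):
    """Draw transistor widths from a normal distribution whose standard
    deviation is `sigma_fraction` (5% by default) of the nominal width."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sigma = sigma_fraction * nominal_width
    return [rng.gauss(nominal_width, sigma) for _ in range(n_samples)]


NOMINAL_WIDTH_NM = 100.0  # hypothetical nominal width, not from the paper

widths = sample_widths(NOMINAL_WIDTH_NM, n_samples=10_000)
mean_width = statistics.mean(widths)
std_width = statistics.stdev(widths)

print(f"mean = {mean_width:.2f} nm, std = {std_width:.2f} nm")
```

Each sampled width would then be applied to the corresponding device before rerunning the simulation, giving one point in the delay-power plane per Monte Carlo run.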

# IV. CONCLUSIONS

This work presents a novel conditional SFTPFF structure. The proposed structure employed three optimization techniques to improve energy efficiency. The first successfully reduced power consumption and eliminated voltage stairways in both data input and output waveforms caused by feedthrough transistors. This was accomplished by adding an output feedback-controlled transistor. The complementary conditional feedthrough scheme alleviated the imbalance between the rising and falling output edges and consequently reduced the minimum D-to-Q delay. The second strategy shortened the discharge path and reordered transistors to reduce D-to-Q delay. The third approach used output-controlled internal node keepers to eliminate unnecessary switching of the internal node s1, thereby saving power. Postlayout simulation results in a commercial 16-nm CMOS technology showed that the proposed structure could operate stably over a wide pulse-width range (20–200 ps). It also featured an improved D-to-Q power-delay product and energy-delay product compared with the conventional designs. Simulated results demonstrated a significant improvement over other P-FF devices in both performance and energy efficiency.

# REFERENCES

- [1] T. Fischer et al., "Design solutions for the Bulldozer 32 nm SOI 2-core processor module in an 8-core CPU," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2011, pp. 70–80. doi: 10.1109/isscc.2011.5746227.
- [2] D. Markovic, B. Nikolic, and R. W. Brodersen, "Analysis and design of low-energy flip-flops," in Proc. Int. Symp. Low Power Electron. Design, Huntington Beach, CA, USA, Aug. 2001, pp. 52–55. doi: 10.1109/LPE.2001.945371.
- [3] M. Alioto, E. Consoli, and G. Palumbo, "General strategies to design nanometer flip-flops in the energy-delay space," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 7, pp. 1583–1596, Jul. 2010. doi: 10.1109/tcsi.2009.2033538.
- [4] M. Alioto, E. Consoli, and G. Palumbo, "Flip-flop energy/performance versus clock slope and impact on the clock network design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 6, pp. 1273–1286, Jun. 2010. doi: 10.1109/tcsi.2009.2030113.
- [5] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I— Methodology and design strategies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011. doi: 10.1109/tvlsi.2010.2041376.
- [6] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part II— Results and figures of merit," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 737–750, May 2011. doi: 10.1109/tvlsi.2010.2041377.
- [7] J. F. Lin, "Low-power pulse-triggered flip-flop design based on a signal feed-through," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 1, pp. 181–185, Jan. 2014. doi: 10.1109/tvlsi.2012.2232684.
- [8] P. Zhao, T. K. Darwish, and M. A. Bayoumi, "High-performance and low-power conditional discharge flip-flop," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 477–484, May 2004. doi: 10.1109/tvlsi.2004.826192.
- [9] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," IEEE J. Solid-State Circuits, vol. 36, no. 8, pp. 1263–1271, Aug. 2001. doi: 10.1109/4.938376.
- [10] V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic, and N. M. Nedovic, Digital System Clocking: High-Performance and Low-Power Aspects. Hoboken, NJ, USA: Wiley, 2003. doi: 10.1002/0471723703.
- [11] C. K. Teh, M. Hamada, T. Fujita, H. Hara, N. Ikumi, and Y. Oowaki, "Conditional data mapping flip-flops for low-power and high-performance systems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 12, pp. 1379–1383, Dec. 2006. doi: 10.1109/tvlsi.2006.887833.
- [12] E. Consoli, G. Palumbo, J. M. Rabaey, and M. Alioto, "Novel class of energy-efficient very high-speed conditional push-pull pulsed latches," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 7, pp. 1593–1605, Jul. 2014. doi: 10.1109/tvlsi.2013.2276100.
- [13] S. H. Rasouli, A. Khademzadeh, A. Afzali-Kusha, and M. Nourani, "Low-power single- and double-edge-triggered flip-flops for high-speed applications," IEE Proc.-Circuits, Devices Syst., vol. 152, no. 2, pp. 118–122, Apr. 2005. doi: 10.1049/ip-cds:20041241.
- [14] Y.-T. Hwang, J.-F. Lin, and M.-H. Sheu, "Low-power pulse-triggered flip-flop design with conditional pulse-enhancement scheme," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 2, pp. 361–366, Feb. 2012. doi: 10.1109/tvlsi.2010.2096483.
- [15] M. W. Phyu, W. L. Goh, and K. S. Yeo, "A low-power static dual edge-triggered flip-flop using an output-controlled discharge configuration," in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, pp. 2429–2432. doi: 10.1109/iscas.2005.1465116.
- [16] A. Karimi, A. Rezai, and M. M. Hajhashemkhani, "A novel design for ultra-low power pulse-triggered D-flip-flop with optimized leakage power," Integration, vol. 60, pp. 160–166, Jan. 2018. doi: 10.1016/j.vlsi.2017.09.002.
- [17] S. Dhong et al., "A 0.42 V Vccmin ASIC-compatible pulse-latch solution as a replacement for a traditional master-slave flip-flop in a digital SOC," in Proc. IEEE Custom Integr. Circuits Conf., San Jose, CA, USA, Sep. 2014, pp. 1–4. doi: 10.1109/cicc.2014.6946044.
- [18] Q. Xie, X. Lin, Y. Wang, S. Chen, M. J. Dousti, and M. Pedram, "Performance comparisons between 7-nm FinFET and conventional bulk CMOS standard cell libraries," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 62, no. 8, pp. 761–765, Aug. 2015. doi: 10.1109/tcsii.2015.2391632.