# INTERFERENCE REDUCTION IN PHASE LOCKED LOOP USING GRAPHICS DOUBLE DATA RATE RANDOM ACCESS MEMORY INTERFACE

P. Meenakshi Vidya<sup>1</sup>, VT. Revathi<sup>2</sup>, Dr. Sudha<sup>3</sup>

Associate Professor<sup>1</sup>, PG Scholar<sup>2</sup>, Professor<sup>3</sup>, Easwari Engineering College, Anna University

Chennai

*Abstract* : Electromagnetic interference (EMI) is an increasingly important factor in determining the whole-system performance of a mobile system. This is driven by the size reduction of the mobile platform and the ever-increasing density of electronic components. In this work, we suggest an analysis approach for EMI effects in the interface between an application processor (AP) and dynamic random-access memory (DRAM) by using an I/O driver model that includes power delivery network (PDN) effects. A 10 Gbits/s/pin graphics DRAM interface is developed in 65-nm CMOS technology. Several design techniques are proposed for high-speed operation in a noisy environment. A fast precharging data sampler guarantees high-speed sampling without the need for a decision feedback equalizer. To increase the data sampling margin, the PLL bandwidth is optimized depending on the system noise, which reduces the clock jitter by up to 55.1%. The crosstalk-induced jitter (CIJ) reduction technique suppresses the DQ jitter by employing the suggested training sequence for the GDDR5 interface. Pre- and de-emphasis are merged in one auxiliary driver. The chip operates at 10 Gbits/s/pin and exhibits a data eye opening of 0.78 UI with the CIJ reduction technique. The power consumptions of the TX and RX are 8.28 and 5.5 pJ/b/channel, respectively. The work is carried out with CST EMC STUDIO<sup>®</sup> (CST EMCS) and MATLAB, which are specialized software packages for analyzing crosstalk-induced jitter (CIJ) and electromagnetic interference (EMI) using 3D electromagnetic field simulation.

*Index Terms* - Electromagnetic interference (EMI), Application processor (AP), Dynamic random-access memory (DRAM), Crosstalk-induced jitter (CIJ)

#### I. INTRODUCTION

In recent years, the DRAM process has reached the limit of its scaling. Owing to the technical limits of the dual in-line memory module and the PCB trace, DRAM vendors are trying to use the multichip package (MCP) and are researching the through silicon via (TSV) interface [1]–[3]. Although the memory capacity is increased by an MCP or TSV, a higher data rate for the I/O interface is still required because of the limited number of pins. The data rate has increased significantly over the last decade, as shown in Fig. 1. However, the data rate increase in the double data rate DRAM interface has been slow owing to the lack of applications. In contrast, the data rate of graphics memories such as GDDRx products has already increased to 7 Gbits/s/pin [4], [5]. In particular, because of the continual increase in display resolution, the data rate per pin of next-generation GDDRs will need to be over 10 Gbits/s/pin.

In the graphics DRAM design, the increase in data bandwidth is limited by the internal data bandwidth and the I/O speed. In the former case, the core operation speed cannot follow the device scaling speed. To overcome this limitation, the prefetch and bank-grouping techniques have been adopted [6]–[12]. The I/O speed is limited by the channel bandwidth and the system noise. As the data rate increases, crosstalk-induced jitter (CIJ) and intersymbol interference (ISI) degrade the signal integrity of the transmitted data and reduce the data sampling margin [13], [14]. Moreover, the data sampling margin cannot be obtained with the increased clock jitter generated by system noise.

Traditional synchronous DRAM interfaces, which require a relatively low-frequency clock, use a DLL rather than a PLL [15]. However, although a DLL-based clock multiplier has good performance, it is hard to guarantee a low-jitter output clock with a noisy input reference clock signal. Moreover, mismatches between delay cells in the voltage-controlled delay line degrade the quality of the generated high-speed clock signal. The clock generators in [16] and [17] alleviate this problem with the DLL-based clock multiplier; however, they require additional power and area because of the additional calibration logic circuits. As a result, a DLL-based clock multiplier with calibration logic is not suitable for a cost-effective high-speed DRAM interface. Another method for generating a high-speed clock signal is to use a PLL. In general, a charge pump PLL (CP-PLL) is a feedback system with a low-pass filter characteristic.
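The low-pass characteristic of a CP-PLL, and the gain peaking that its feedback loop can introduce, can be illustrated with the textbook second-order closed-loop model $H(s) = (2\zeta\omega_n s + \omega_n^2)/(s^2 + 2\zeta\omega_n s + \omega_n^2)$. The sketch below is only an illustration under that assumed model; the function names, the natural frequency, and the damping factors are hypothetical and are not taken from the proposed design.

```python
import math


def closed_loop_gain(freq_hz, natural_freq_hz, damping):
    """Magnitude of the assumed second-order CP-PLL closed-loop response."""
    wn = 2 * math.pi * natural_freq_hz
    s = 1j * 2 * math.pi * freq_hz
    h = (2 * damping * wn * s + wn ** 2) / (s ** 2 + 2 * damping * wn * s + wn ** 2)
    return abs(h)


def gain_peaking_db(natural_freq_hz, damping, points=2000):
    """Largest closed-loop gain in dB over a log sweep from 0.01*fn to 100*fn."""
    peak = 0.0
    for i in range(points + 1):
        f = natural_freq_hz * 10 ** (-2 + 4 * i / points)
        peak = max(peak, closed_loop_gain(f, natural_freq_hz, damping))
    return 20 * math.log10(peak)


if __name__ == "__main__":
    fn = 10e6  # hypothetical natural frequency in Hz
    for zeta in (0.5, 0.707, 1.0, 2.0):
        print(f"zeta={zeta}: gain peaking = {gain_peaking_db(fn, zeta):.2f} dB")
```

In this model, reference jitter well below the natural frequency passes through with roughly unity gain, jitter well above it is attenuated, and the peaking near the loop bandwidth shrinks as the damping factor grows.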

### **II. RELATED WORK**

Crosstalk occurs due to electrical coupling between nearby wires. It causes undesired signal noise to be coupled from an active line (aggressor) into a quiet line (victim), and it has a great impact on the overall reliability and performance of an IC. Rao & Tilak (2011) proposed a bus coding scheme for reducing crosstalk in SoCs, in which data encoding is used to mitigate the crosstalk delay in the buses. As circuit geometries become smaller, wires become closer together and taller, which increases the cross-coupling capacitance among the nets. At the same time, the parasitic capacitance to the substrate decreases as the interconnect becomes narrower, and cell delays are reduced as the transistors become smaller.

The crosstalk effect is a consequence of the coupling and switching activity that arises when a wire changes state relative to its previous state and when transitions occur on adjacent wires. Verma & Kaushik (2012) suggested a bus encoder that eliminates or significantly reduces the worst-case crosstalk. The state transitions on the bus determine the switching and coupling activity. Reducing the number of transitions improves performance in terms of power dissipation, delay in on-chip buses, and coupling activity. The components that affect the behavior of the on-chip bus are the internal parasitic capacitances of the transistors, the interconnect capacitances, and the input capacitances of the fan-out gates.

Mallahzadeh et al. (2010) introduced a step-shaped transmission line for crosstalk reduction. The approach creates steps along the transmission lines to decrease the crosstalk while causing only insignificant variation in return loss, and it decreases the far-end crosstalk over a wide bandwidth. Depending on the line parameters, the capacitive coupling between nearby transmission lines can be greater than, less than, or equal to the inductive coupling. The far-end crosstalk of one section is approximately proportional to the difference between the capacitive and inductive couplings. Hence, the crosstalk may have either positive or negative polarity depending on the width of the section, the line parameters, and the type of transmission line.
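The polarity argument can be made concrete with the usual first-order weak-coupling approximation, in which the far-end coefficient is proportional to $C_m/C - L_m/L$. The sketch below uses that textbook form as an assumption; it is not a formula taken from Mallahzadeh et al., and the function name and per-unit-length values are hypothetical.

```python
def far_end_coupling(c_mutual, c_self, l_mutual, l_self):
    """First-order far-end crosstalk coefficient: 0.5 * (Cm/C - Lm/L).

    A positive result means capacitive coupling dominates, a negative result
    means inductive coupling dominates, and zero means the two cancel.
    """
    return 0.5 * (c_mutual / c_self - l_mutual / l_self)


if __name__ == "__main__":
    # Hypothetical per-unit-length values (F/m and H/m) for three line sections.
    sections = {
        "balanced": (10e-12, 100e-12, 20e-9, 200e-9),
        "inductive-dominant": (5e-12, 100e-12, 40e-9, 200e-9),
        "capacitive-dominant": (30e-12, 100e-12, 20e-9, 200e-9),
    }
    for name, params in sections.items():
        print(name, far_end_coupling(*params))
```

Under this approximation, alternating sections whose coefficients have opposite signs lets their far-end contributions partially cancel, which is the intuition behind shaping the line in steps.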

Babu et al. (2012) introduced an efficient bus encoder based on the bus-invert method. The crosstalk effect is reduced by inverting the original data, which in turn reduces the switching activity. In this method, the data bus is divided into clusters, each containing 4 data bits and one extra control bit. A bus-invert method uses the extra control bit to distinguish the transmission of original data from that of inverted data. In this scheme, if the number of transitions is more than half of the bus width, the original data are inverted and the control line is set high; otherwise, the original data are transmitted with the control line at logic low.
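The cluster-based bus-invert rule described above can be sketched as follows. This is a minimal illustration rather than the encoder of Babu et al.: the function names are hypothetical, transitions are counted against the value previously driven on the same 4-bit cluster, and transitions of the control line itself are ignored.

```python
CLUSTER_BITS = 4
MASK = (1 << CLUSTER_BITS) - 1


def encode_cluster(prev_bus, data):
    """Return (bus_value, control_bit) for one 4-bit cluster.

    The data are inverted when more than half of the cluster bits would
    toggle relative to the value currently on the bus.
    """
    transitions = bin((prev_bus ^ data) & MASK).count("1")
    if transitions > CLUSTER_BITS // 2:
        return (~data) & MASK, 1
    return data & MASK, 0


def decode_cluster(bus_value, control_bit):
    """Recover the original 4-bit data from the bus value and control bit."""
    return (~bus_value) & MASK if control_bit else bus_value & MASK


if __name__ == "__main__":
    bus = 0b0000
    for data in (0b1110, 0b0011, 0b1100):
        new_bus, ctrl = encode_cluster(bus, data)
        toggles = bin(bus ^ new_bus).count("1")
        print(f"data={data:04b} bus={new_bus:04b} ctrl={ctrl} toggles={toggles}")
        assert decode_cluster(new_bus, ctrl) == data
        bus = new_bus
```

With this rule, at most two of the four data lines in a cluster toggle on any transfer, at the cost of one extra control wire per cluster.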

#### **III. PROPOSED WORK**

The top-level block diagram of the proposed transceiver is shown in Fig. 3.1. The TX is composed of a serializer, variable delay lines (VDLs) with a code decoder to insert the intentional skew for CIJ reduction, and an output driver with the AUX block for pre- and de-emphasis. The RX consists of an active filter, an FP-sampler, a deserializer (DeSER), and a VDL that is the same as that of the TX. The adaptive-bandwidth CP-PLL is used to mitigate the noise interference and is shared by the TX and RX. The impedance of the output driver is externally controlled by the on-die termination (ODT) control signal.



Fig. 3.1. Overall architecture of the proposed transceiver.

In the TX, 8-bit 1.25 Gbits/s parallel data are fed to the serializer and converted into 10 Gbits/s serial data. Before the serial data are transmitted through the output driver, their delay is controlled by the VDL to reduce the CIJ. The VDL delay is controlled by the code decoder, and the delay control codes come from the GPU. In this design, the VDLs of the even channels are set to a minimum value, and those of the odd channels are controlled by the delay code. The output impedance is externally controlled, and pre- and de-emphasis are adopted to compensate for the channel loss. In the RX, the transmitted data are fed into the active filter, and the filter output is fed to the FP-sampler. The delay applied at the TX is compensated by delaying the sampling clock using the delay code. The sampled data are fed to the DeSER, and the deserialized data are transferred to the global I/O (GIO). The clock signal used at the TX and RX is generated by the adaptive-bandwidth CP-PLL. The 1.25-GHz WCK and WCKB coming from the GPU are used as reference clock signals. In the 10 Gbits/s GDDR5 interface, 5-GHz WCK and WCKB should be transferred from the GPU. However, in this design, 1.25-GHz WCK and WCKB are used to reduce the overhead of sending and receiving the 5-GHz clock signals.

#### **IV. RESULTS AND DISCUSSIONS**

Modern graphics DRAMs are designed to have large memory matrices. HDL Coder takes advantage of this feature and automatically maps matrices to block RAMs to improve area efficiency. For certain designs, mapping these persistent matrices to RAMs is mandatory if the design is to be realized. State-of-the-art synthesis tools may not be able to synthesize designs when large matrices are mapped to registers, whereas the problem size is more manageable when the same matrices are mapped to DRAMs.



Figure 6.1 Area Efficient Memory Mapping of Graphic DRAM

The effect of jitter is better illustrated by the eye diagram of the signal. The figure on the left shows the eye diagram of a signal with random jitter, while the figure on the right shows the eye diagram of a signal without jitter. The width of the jittered signal at the zero-amplitude level is considerably larger than that of the non-jittered signal as a result of the added random jitter. Note that even though this example focuses on real signals, the eye diagram object can also handle complex signals if the OperationMode property is set to 'Complex Signal'.



Figure 6.2 Total Jitter Measurement in GDRAM



Figure 6.3 Eye-Diagram of data interface

Timing jitter is defined as the deviation of a signal's timing clock from the ideal clock. Timing jitter can be divided into two main subcategories: deterministic and random jitter [1]. Two examples of deterministic jitter are periodic jitter and inter-symbol interference (ISI). Periodic jitter can be modeled as a sum of sinusoids, while ISI can be modeled as a train of Dirac functions. Random jitter is modeled as a Gaussian variation of the signal clock edges. The jitter encountered in a communication system can be any combination of these components. A commonly used combination is the dual-Dirac model, where ISI and random jitter are combined [2]. ISI is modeled by two equal-amplitude Dirac functions. The following figure shows the probability density functions of random jitter, periodic jitter, periodic and random jitter, and ISI and random jitter. We generated the jitter samples using the jitter generator provided in the communication sources package.
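The three jitter models described above can be sketched in a few lines. This is an illustrative stand-in written with the Python standard library, not the jitter generator used to produce the figures; the function names, the amplitudes (expressed in UI), and the seed are hypothetical.

```python
import math
import random


def random_jitter(n, sigma, rng):
    """Random jitter: zero-mean Gaussian edge displacements."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def periodic_jitter(n, amplitude, cycles_per_sample):
    """Periodic jitter: a single sinusoid (a sum of sinusoids in general)."""
    return [amplitude * math.sin(2 * math.pi * cycles_per_sample * k)
            for k in range(n)]


def dual_dirac_jitter(n, dj_pp, sigma, rng):
    """Dual-Dirac model: two equal-weight impulses at +/- dj_pp/2 (ISI)
    plus Gaussian random jitter."""
    return [rng.choice((-dj_pp / 2, dj_pp / 2)) + rng.gauss(0.0, sigma)
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    samples = dual_dirac_jitter(10000, dj_pp=0.2, sigma=0.01, rng=rng)
    left = sum(1 for x in samples if x < 0)
    print(f"samples near -0.1 UI: {left}, near +0.1 UI: {len(samples) - left}")
```

A histogram of the dual-Dirac samples shows two Gaussian lobes centered on the two impulse positions, which is the bimodal density expected for combined ISI and random jitter.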



Figure 6.4 Clock Induced Jitter

## **V. CONCLUSION**

This study introduced a GDDR5 interface with a 10 Gbits/s/pin data rate. The need for a high-frequency clock necessitates the use of a PLL rather than a DLL. However, gain peaking in the classic CP-PLL increases the clock jitter and reduces the data sampling margin. In this design, gain peaking is mitigated by the proposed adaptive-bandwidth PLL, which reduces the jitter peaking by up to 55.1%. Although jitter is reduced by changing the bandwidth of the PLL, the signal integrity (SI) of the transmitted data is still affected by the CIJ and the ISI, which need to be considered to obtain an adequate data sampling margin. In this paper, the FP-sampler is implemented for high-speed sampling with a small received data eye opening without a DFE. Moreover, to reduce the CIJ, a training sequence that relaxes the system complexity has been proposed. With the proposed FP-sampler, a BER of less than $10^{-12}$ is achieved at an input data amplitude of 80 mV, and the proposed CIJ reduction algorithm increases the eye opening of the TX output by 25.6%.

## REFERENCES

[1] S.-B. Lim, H.-W. Lee, J. Song, and C. Kim, "A 247 μW 800 Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface," *IEEE J. Solid-State Circuits*, vol. 48, no. 3, pp. 711–723, Mar. 2013.

[2] U. Kang *et al.*, "8 Gb 3-D DDR3 DRAM using through-silicon-via technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 111–119, Jan. 2010.

[3] A.-C. Hsieh and T. Hwang, "TSV redundancy: Architecture and design issues in 3-D IC," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 4, pp. 711–722, Apr. 2012.

[4] S.-J. Bae *et al.*, "A 40 nm 2 Gb 7 Gb/s/pin GDDR5 SDRAM with a programmable DQ ordering crosstalk equalizer and adjustable clock-tracking BW," in *ISSCC Dig. Tech. Papers*, Feb. 2011, pp. 498–500.

[5] R. Kho *et al.*, "A 75 nm 7 Gb/s/pin 1 Gb GDDR5 graphics memory device with bandwidth improvement techniques," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 120–133, Jan. 2010.

[6] C. Kim, "High-bandwidth memory interface design," in *Proc. ISSCC Tutorial*, Feb. 2013.

[7] DRAM, JEDEC Standard JESD79-2F, JEDEC, Nov. 2009.

[8] DRAM, JEDEC Standard JESD79-3F, JEDEC, Jul. 2010.

[9] DRAM, JEDEC Standard JESD79-4, JEDEC, Sep. 2012.

[10] DRAM, JEDEC Standard JESD212, JEDEC, Dec. 2009.

[11] T.-Y. Oh *et al.*, "A 7 Gb/s/pin 1 Gbit GDDR5 SDRAM with 2.5 ns bank to bank active time and no bank group restriction," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 107–118, Jun. 2011.

[12] K. Koo *et al.*, "A 1.2 V 38 nm 2.4 Gb/s/pin 2 Gb DDR4 SDRAM with bank group and ×4 half-page architecture," in *ISSCC Dig. Tech. Papers*, Feb. 2012, pp. 40–41.

[13] J. F. Buckwalter and A. Hajimiri, "Cancellation of crosstalk-induced jitter," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 621–632, Mar. 2006.

[14] S.-J. Bae *et al.*, "A 60 nm 6 Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-reduction techniques," in *ISSCC Dig. Tech. Papers*, Feb. 2008, pp. 278–279.

[15] H.-W. Lee *et al.*, "A 1.0-ns/1.0-V delay-locked loop with racing mode and countered CAS latency controller for DRAM interfaces," *IEEE J. Solid-State Circuits*, vol. 47, no. 6, pp. 1436–1447, Jun. 2012.

[16] C. Kim, I.-C. Hwang, and S.-M. Kang, "A low-power small-area ±7.28-ps-jitter 1-GHz DLL-based clock generator," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1414–1420, Nov. 2002.

[17] S. Ok, K. Chung, J. Koo, and C. Kim, "An anti-harmonic, programmable DLL-based frequency multiplier for dynamic frequency scaling," *IEEE Trans. VLSI Syst.*, vol. 18, no. 7, pp. 1130–1134, Jul. 2010.

[18] T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, "A 12-Gb/s 11-mW half-rate sampled 5-tap decision feedback equalizer with current-integrating summers in 45-nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1298–1305, Apr. 2009.

[19] Y.-M. Ying and S.-I. Liu, "A 20 Gb/s digitally adaptive equalizer/DFE with blind sampling," in *ISSCC Dig. Tech. Papers*, Feb. 2011, pp. 444–446.

[20] Y. Lu and E. Alon, "A 66 Gb/s 46 mW 3-tap decision-feedback equalizer in 65 nm CMOS," in *ISSCC Dig. Tech. Papers*, Feb. 2013, pp. 30–31.

[21] J.-D. Han, W.-Y. Shin, W.-S. Choi, J.-H. Chun, S. Kim, and D.-K. Jeong, "A 5-Gb/s digitally controlled 3-tap DFE receiver for serial communications," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2010, pp. 1–4.

[22] K. Hu *et al.*, "0.16–0.25 pJ/bit, 8 Gb/s near-threshold serial link receiver with super-harmonic injection-locking," *IEEE J. Solid-State Circuits*, vol. 47, no. 8, pp. 1842–1853, Aug. 2012.

[23] P. Dudek, S. Szczepanski, and J. V. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," *IEEE J. Solid-State Circuits*, vol. 35, no. 2, pp. 240–247, Feb. 2000.

