# An Area & Time Efficient Design of Overloaded CDMA Architecture Using Han-Carlson Adder

P.Sravya M.Tech(VL&ES) Godavari Institute of Engineering and Technology(Rajahmundry)

E.Jagadeeswara Rao Associate Professor Godavari Institute of Engineering and Technology(Rajahmundry)

# ABSTRACT

On-chip interconnects are the performance bottleneck in modern systems-on-chip. Code-division multiple access (CDMA) has been proposed to implement on-chip crossbars due to its fixed latency, reduced arbitration overhead, and higher bandwidth. In this paper, we advance the overloaded CDMA interconnect (OCI) to enhance the capacity of CDMA network-on-chip (NoC) crossbars by increasing the number of usable spreading codes. Serial OCI and parallel OCI (P-OCI) architecture variants are presented to adhere to different area, delay, and power requirements. Compared with the conventional CDMA crossbar, on a Xilinx Spartan-3E FPGA kit, the serial OCI crossbar achieves 100% higher bandwidth, 31% less resource utilization, and 45% power saving, while the parallel OCI crossbar achieves N times higher bandwidth compared with the serial OCI crossbar at the expense of increased area and power consumption. To further increase the speed of the OCI crossbar, we use a Han-Carlson adder in place of the parallel adder architecture. Compared with the existing system, the Han-Carlson adder reduces area by 38% and increases speed by 49%.

Index Terms—Code-division multiple access (CDMA) interconnect, CDMA router, network-on-chip (NoC), NoC physical layer, overloaded CDMA crossbar, carry-select adder, Han-Carlson adder.

# I. INTRODUCTION

Increasing communication overhead degrades the speedup achieved by parallel computing [1]. Therefore, developing efficient high-performance on-chip interconnects is of paramount importance for parallel and high-performance computing technologies. Code-division multiple access (CDMA) is a medium sharing technique that leverages the code space to enable simultaneous medium access. In CDMA channels, each transmit–receive (TX-RX) pair is assigned a unique bipolar spreading code, and the data spread from all transmitters are summed in an additive communication channel. The spreading codes in classical CDMA systems are orthogonal (the cross correlation between orthogonal codes is zero), which enables the CDMA receiver to properly decode the received sum via a correlator decoder. Classical CDMA systems rely on Walsh–Hadamard orthogonal codes [23] to enable medium sharing. CDMA has been proposed as an on-chip interconnect sharing technique for both bus and NoC interconnect architectures. The advantages of using CDMA for on-chip interconnects include reduced power consumption [3], fixed communication latency [2], and reduced system complexity [12].

The main resource sharing techniques adopted by existing NoC crossbars are time-division multiple access (TDMA), where the physical link is time shared between the interconnected processing elements (PEs) [9], and space-division multiple access (SDMA), where a dedicated link is established between every pair of interconnected PEs [10]. The physical layer of an NoC router also contains buffering and storage devices [7]. A CDMA switch has less wiring complexity than an SDMA crossbar and less arbitration overhead than a TDMA switch, and thus provides a good compromise between the two. However, only basic features of the CDMA technology have been explored in the on-chip interconnect literature [11]. NoCs provide a scalable solution for large SoCs, but they exhibit increased power consumption and large resource overheads. The NoC layering model splits the transaction into four layers: 1) application; 2) transport; 3) network; and 4) physical [8]. A crossbar is the basic building block of the NoC physical layer [8]. A crossbar switch is a shared communication medium that adopts a multiple access technique to enable physical packet exchange.

This paper presents an overloaded CDMA interconnect (OCI) crossbar architecture that increases the CDMA router capacity by 100% at marginal cost. Crossbar overloading exploits special properties of the orthogonal spreading code set in use, namely, Walsh–Hadamard codes, to add a set of non-orthogonal spreading codes that can be uniquely identified on the receiver side.

The contributions of this paper are as follows:

- 1) Introduce two novel approaches that can be deployed in CDMA NoC crossbars to increase the router capacity and, consequently, bandwidth by 100% [14] at marginal cost.
- 2) Present the OCI mathematical foundations, spreading code generation procedures, and OCI-based router architectures.
- 3) Develop and evaluate the OCI-based routers built on a Xilinx Artix-7 AC701 evaluation kit and using a 65-nm ASIC technology for several synthetic traffic patterns, and compare their latency, bandwidth, and power consumption with the basic CDMA and SDMA switching topologies.

# II. RELATED WORK

Using CDMA as a medium access scheme in crossbar switches provides desirable properties such as fixed transfer latency and low arbitration overhead. Nikolic et al. [16] have proposed a scalable CDMA-based peripheral bus to reduce the number of parallel transfer lines and point-to-point (PTP) buses and to avoid the overhead of TDMA arbiters. This approach reduces the pin count when used at the interface of various peripherals to different PEs, since the data from the peripherals are added and transmitted over fewer lines. The increase in transfer latency due to data spreading is acceptable since peripherals usually operate at lower frequencies than the master PEs. A master–slave bus wrapper has been presented in [17] and [18], where the data are bundled and spread using orthogonal CDMA codes to reduce the number of parallel transfer lines. The control signals are not encoded, to facilitate interconnection to other TDMA buses.

The CDMA-based switch allows simultaneous packet transmission due to code-space multiplexing. This approach reduces the hop count in multicasting schemes and allows packets to reach the destination PEs[10] simultaneously, which is preferred in real-time applications. The assignment of spreading codes[21] to TX-RX pairs is dynamic based on the request from each node. Two architectures have been introduced in the CDMA-based network: a serial CDMA network, where each data chip in the spreading code is sent in one clock cycle, and a parallel CDMA network, where all data chips are sent in the same cycle. The CDMA-based serial and parallel networks have been compared with a conventional CDMA network, a mesh-based NoC[20],[26], and a TDMA bus. For the same network area, the bandwidth of the parallel CDMA network is higher than the throughput of the mesh-based NoC and the TDMA bus due to the simultaneous medium access nature of CDMA.

# III. EXISTING METHOD

#### A. Overloaded CDMA in Wireless Communications.

Direct sequence spread spectrum CDMA (DSSS-CDMA) is a leading approach for medium sharing in wireless communications, where a set of orthogonal spreading codes, each composed of a stream of N chips [24], is multiplied by the transmitted data bits such that every data bit is spread over N cycles. A unique spreading code is assigned to each TX-RX pair sharing the communication channel. The data streams of the users sharing the channel are spread and simultaneously transmitted to an additive communication channel. Despreading is achieved by applying the correlation operation to the received sum, where each receiver extracts its data by correlating the sum with its assigned spreading code. Only digital on-chip interconnects are considered here, and random effects arising in analog communication channels such as noise, fading, and MAI can be efficiently mitigated using error detection and correction techniques [27]. Therefore, such random effects are neglected in this paper.
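The spreading and despreading steps described above can be illustrated with a minimal Python sketch. It assumes bipolar (+1/-1) Walsh–Hadamard codes built with the Sylvester construction and an ideal, noise-free additive channel; the function names `walsh_bipolar`, `spread_and_sum`, and `despread` are illustrative and do not come from the paper.

```python
def walsh_bipolar(n):
    """Return n bipolar (+1/-1) Walsh-Hadamard codes of length n (n a power of two)."""
    codes = [[1]]
    while len(codes) < n:
        codes = ([row + row for row in codes]
                 + [row + [-c for c in row] for row in codes])
    return codes


def spread_and_sum(symbols, codes):
    """Spread each bipolar symbol by its code and add all users chip by chip."""
    length = len(codes[0])
    return [sum(s * code[i] for s, code in zip(symbols, codes))
            for i in range(length)]


def despread(channel, code):
    """Correlate the received sum with one code and keep the sign of the result."""
    correlation = sum(s * c for s, c in zip(channel, code))
    return 1 if correlation > 0 else -1


codes = walsh_bipolar(8)
symbols = [1, -1, -1, 1, 1, -1, 1, -1]      # one bipolar symbol per TX-RX pair
channel = spread_and_sum(symbols, codes)    # additive channel over N = 8 chips
recovered = [despread(channel, code) for code in codes]
```

Because the codes are mutually orthogonal, each correlation equals N times the symbol of the matching transmitter, so every receiver recovers its own symbol from the shared sum.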



Fig. 1. (a) CDMA NoC router architecture. (b) Classical CDMA crossbar.

#### **B.** Classical CDMA Crossbar Switch

Fig. 1(a) illustrates the high-level architecture of a CDMA-based NoC[22] router. The physical layer of the router is based on the classical CDMA switch.

The switch is composed of a number of XOR encoders, a channel adder, and accumulator-based decoders. In the encoder, an N-chip binary orthogonal code, generated from a Walsh spreading code set, is XORed with the transmitted data bit and sent out serially, so that a single bit is spread over a duration of N clock cycles. Therefore, the crossbar transaction frequency $f_t$ and the operating clock frequency $f_c$ are related as $f_t = f_c/N$. The channel sum is

$$S(i) = \sum_{j=1}^{M} d(j) \oplus C_o(j, i) \tag{1}$$

where S(i) is an m-bit binary number representing the channel sum at the ith clock cycle, the crossbar width is $m = \lceil \log_2 M \rceil$, d(j) is the data bit from the jth encoder, $C_o(j, i)$ is the ith chip of the jth orthogonal spreading code, and $\oplus$ is the XOR operation. In the ordinary CDMA crossbar, the adder has M = N - 1 input bits and $m = \lceil \log_2 M \rceil = \log_2 N$ output bits.
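As a hedged illustration of Eq. (1), the following Python sketch models the classical crossbar behaviourally: XOR encoders, a channel adder, and an accumulator decoder. It assumes binary (0/1) Walsh codes with the all-zero code excluded (so M = N - 1) and a decoder that adds the channel sum when its despreading chip is 0 and subtracts it when the chip is 1. The function names and the sign-based decision rule are assumptions made for this sketch, not details taken from the paper.

```python
def walsh_binary(n):
    """Return the n x n Walsh-Hadamard code set in binary (0/1) form; n is a power of two."""
    codes = [[0]]
    while len(codes) < n:
        codes = ([row + row for row in codes]
                 + [row + [1 - c for c in row] for row in codes])
    return codes


def crossbar_sums(data_bits, codes):
    """Channel sum S(i): each data bit is XORed with its code chip, then all are added."""
    length = len(codes[0])
    return [sum(d ^ code[i] for d, code in zip(data_bits, codes))
            for i in range(length)]


def accumulator_decode(channel, code):
    """Up-down accumulation: add S(i) where the chip is 0, subtract where it is 1."""
    acc = 0
    for s, chip in zip(channel, code):
        acc += s if chip == 0 else -s
    return 1 if acc > 0 else 0


N = 8
codes = walsh_binary(N)[1:]              # drop the all-zero code: M = N - 1 ports
data = [1, 0, 1, 1, 0, 0, 1]             # one data bit per transmit port
channel = crossbar_sums(data, codes)     # N channel sums, one per clock cycle
decoded = [accumulator_decode(channel, code) for code in codes]
```

Under these assumptions the accumulated value for each port is +N/2 when the transmitted bit is 1 and -N/2 when it is 0, and every channel sum stays below N, so it fits in $\log_2 N$ adder wires.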

The main difference between the overloaded and classical CDMA routers is that M > N - 1 for the former due to channel overloading.

#### **OCI Crossbar High-Level Architecture**

The main objective of this paper is to increase the number of ports sharing the ordinary CDMA crossbar presented in [15] while keeping the system complexity unchanged, using simple encoding circuitry and relying on the accumulator decoder with minimal changes. To achieve this goal, some modifications to the classical CDMA crossbar are advanced. Fig. 2 depicts the high-level architecture of the OCI crossbar for a single-bit interconnection. The same architecture is replicated for a multibit CDMA router. M TX-RX ports share the CDMA router, where the spread data from the transmit ports are added using an arithmetic binary adder with M binary inputs and an m-bit output, where $m = \lceil \log_2 M \rceil$.

#### **OCI Code Design**

The Walsh–Hadamard spreading code family has a featured property that enables CDMA interconnect overloading. The difference between any consecutive channel sums of data spread by the orthogonal spreading codes for an odd number of TX-RX pairs M is always even, regardless of the spread data. This property means that for the N – 1 TX-RX pairs using the Walsh orthogonal codes, one can encode additional N – 1 data bits in consecutive differences between the N chips composing the orthogonal code. Thus, exploiting this property enables adding 100% non orthogonal spreading codes, which can double the capacity of the ordinary CDMA crossbar.
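The parity property above can be checked numerically. The sketch below assumes that all N - 1 nonzero binary Walsh codes are in use, that N is at least 4, and that data are spread by XOR as in Eq. (1). The overloading step (adding one extra bit to each of chips 1 to N - 1 and reading it back from the parity difference against chip 0) is a simplified illustration of the idea, not necessarily the exact encoder and decoder used in the paper; all function names are hypothetical.

```python
from itertools import product


def walsh_binary(n):
    """Return the n x n Walsh-Hadamard code set in binary (0/1) form; n is a power of two."""
    codes = [[0]]
    while len(codes) < n:
        codes = ([row + row for row in codes]
                 + [row + [1 - c for c in row] for row in codes])
    return codes


def channel_sums(data_bits, codes):
    """Sum of XOR-spread chips for every chip position."""
    length = len(codes[0])
    return [sum(d ^ code[i] for d, code in zip(data_bits, codes))
            for i in range(length)]


def sums_share_parity(n):
    """True if, for every data pattern, all n channel sums have the same parity."""
    codes = walsh_binary(n)[1:]
    return all(len({s % 2 for s in channel_sums(data, codes)}) == 1
               for data in product((0, 1), repeat=n - 1))


def overload(sums, extra_bits):
    """Add one extra bit to each of chips 1..N-1; chip 0 stays as the parity reference."""
    return [sums[0]] + [s + e for s, e in zip(sums[1:], extra_bits)]


def recover_extra(overloaded):
    """Read each extra bit back from the parity of the difference to chip 0."""
    return [(s - overloaded[0]) % 2 for s in overloaded[1:]]


codes = walsh_binary(8)[1:]
sums = channel_sums((1, 0, 1, 1, 0, 0, 1), codes)
extra = [1, 0, 0, 1, 1, 0, 1]            # N - 1 additional (non-orthogonal) data bits
recovered = recover_extra(overload(sums, extra))
```

Since the differences between the orthogonal channel sums are always even, any odd difference can only come from an added bit, which is what makes the N - 1 extra bits identifiable.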



Fig. 2. High-level architecture and building blocks of the OCI crossbar.

#### **OCI Crossbar Building Blocks**

Two variants are realized for each OCI crossbar, reference and pipelined architectures. The pipelined architecture is implemented to increase the crossbar operating frequency, and consequently, bandwidth by adding nonfunctional pipelining registers to reduce the crossbar critical path. The OCI crossbar shown in Fig. 2 is basically composed of three main building blocks: 1) the encoder wrappers[23]; 2) the decoder wrappers; and 3) the crossbar adder blocks, which are described in the following.

#### 1) Crossbar Controller:

At the beginning of every crossbar transaction, the controller assigns spreading codes to the different encoders. The assignment of orthogonal despreading codes to the receive ports is fixed, i.e., it does not change between crossbar transactions. Therefore, for a transmit port to initiate communication with the receive port it addresses, its encoder must be assigned a spreading code that matches the specified decoder. If two different ports request to address the same decoder, the controller grants access to one and suspends the other according to a predefined arbitration scheme. This code assignment scheme is called the receiver-based protocol.

#### 2) Hybrid Encoder:

The encoder is hybrid: it can encode both orthogonal and non-orthogonal data. A transmitted data bit is XORed with the spreading code to produce orthogonal spread data, or ANDed with it to produce non-orthogonal spread data. A multiplexer chooses between the orthogonal and non-orthogonal outputs according to the code type assigned to the encoder, as depicted in Fig. 3(a). The encoder is replicated N times for the P-OCI crossbar.
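A minimal behavioural sketch of the hybrid encoder follows. It models the XOR path, the AND path, and the selecting multiplexer for a single chip. The one-hot non-orthogonal code used in the example is an assumption for illustration, and the names `hybrid_encode` and `spread` are hypothetical.

```python
def hybrid_encode(data_bit, code_chip, orthogonal):
    """Return one spread chip: XOR path for orthogonal codes, AND path otherwise."""
    xor_path = data_bit ^ code_chip
    and_path = data_bit & code_chip
    return xor_path if orthogonal else and_path   # multiplexer on the code type


def spread(data_bit, code, orthogonal):
    """Spread one data bit over all chips of the assigned code."""
    return [hybrid_encode(data_bit, chip, orthogonal) for chip in code]


walsh_code = [0, 1, 0, 1]      # an orthogonal Walsh code of length N = 4
tdma_code = [0, 0, 1, 0]       # assumed one-hot non-orthogonal code (chip slot 2)

orth_chips = spread(1, walsh_code, orthogonal=True)     # [1, 0, 1, 0]
tdma_chips = spread(1, tdma_code, orthogonal=False)     # [0, 0, 1, 0]
```

With the AND path, a data bit of 0 sends no "1" chip at all, and a data bit of 1 sends a "1" only in the slot selected by the code.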





#### 3) Crossbar Adder:

Fig. 3(b). T-OCI pipelined crossbar tree adder; the adder is replicated N times for the P-OCI crossbar.

For a spreading code set of length N, the number of crossbar TX-RX ports is equal to M = 2(N - 1). In the T-OCI crossbar, sending a "1" chip to the adder is mutually exclusive between non orthogonal transmit ports according to the T-OCI encoding scheme.

This indicates that among the 2(N - 1) inputs to the adder, at least N - 2 are guaranteed to be zeros, while the maximum number of "1" chips is N. Therefore, a multiplexer is instantiated to select only a single input of the non-orthogonal TDMA-encoded data bits and discard the remaining bits that are guaranteed to be "0." The crossbar adder is implemented as a carry-select adder (CSA) [28], a particular adder implementation that computes the (n + 1)-bit sum of two n-bit numbers.
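For reference, the carry-select principle can be sketched in Python as follows. Each block computes its sum twice, once for a carry-in of 0 and once for a carry-in of 1, and the real incoming carry then selects one of the two results. This is a bit-level behavioural model with an assumed block size of 4; it is not the RTL used in the paper, and the function names are hypothetical.

```python
def to_bits(value, width):
    """Little-endian list of the low `width` bits of value."""
    return [(value >> i) & 1 for i in range(width)]


def ripple_add(a_bits, b_bits, carry):
    """Ripple-carry addition of two equal-length bit lists with a given carry-in."""
    sum_bits = []
    for x, y in zip(a_bits, b_bits):
        sum_bits.append(x ^ y ^ carry)
        carry = (x & y) | (x & carry) | (y & carry)
    return sum_bits, carry


def carry_select_add(a, b, width, block=4):
    """Return the (width + 1)-bit sum of two width-bit unsigned integers."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    result_bits, carry = [], 0
    for start in range(0, width, block):
        xs = a_bits[start:start + block]
        ys = b_bits[start:start + block]
        sum0, carry0 = ripple_add(xs, ys, 0)   # speculative result for carry-in 0
        sum1, carry1 = ripple_add(xs, ys, 1)   # speculative result for carry-in 1
        if carry:                              # select with the real carry-in
            result_bits += sum1
            carry = carry1
        else:
            result_bits += sum0
            carry = carry0
    result_bits.append(carry)                  # carry-out becomes the top sum bit
    return sum(bit << i for i, bit in enumerate(result_bits))


example = carry_select_add(0b10110111, 0b01101101, width=8)
```

The two speculative sums of a block do not depend on the lower blocks, which is why hardware can compute them in parallel and only the selection waits for the carry.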

#### 4) Custom Decoder:

There are four decoder types for the different CDMA decoding techniques: the orthogonal T-OCI and P-OCI decoders and the overloaded T-OCI and P-OCI decoders. The orthogonal T-OCI decoder is an accumulator implementation of the correlator receiver. N - 1 accumulator decoders are instantiated in all CDMA crossbar types for orthogonal data despreading. Instead of implementing two different accumulators (the zero and one accumulators), an up–down accumulator is implemented, and the accumulated result is the difference between the two accumulators of the conventional CDMA decoder, as shown in Fig. 3(c).



Fig. 3(c). P-OCI orthogonal decoder.

| Notation           | Description                                     |
|--------------------|-------------------------------------------------|
| N                  | Orthogonal spreading code length                |
| M                  | Number of interconnected ports                  |
| m                  | Number of crossbar adder wires                  |
| S                  | Sum of CDMA chips carried by the channel        |
| d<sub>c</sub>      | Data bit encoded by an orthogonal CDMA code     |
| d<sub>T</sub>      | Data bit encoded by a non-orthogonal TDMA code  |
| C<sub>o</sub>(j)   | The jth chip of orthogonal CDMA code            |
| T(j)               | The jth chip of non-orthogonal TDMA code        |
| $C_n$              | TDMA MAI code (non-orthogonal spread data)      |

Table 1: Definition of Notations

# IV. PROPOSED METHOD

To improve the performance of the OCI crossbar, the proposed extension uses a Han-Carlson adder [27] in place of the parallel adder architecture, which increases the speed of the OCI crossbar. The Han-Carlson topology is a parallel-prefix adder, so prefix addition is reviewed first.

#### A. Prefix Addition

The binary addition problem can be formulated as follows: given an n-bit augend $A = a_{n-1} a_{n-2} \ldots a_0$ and an n-bit addend $B = b_{n-1} b_{n-2} \ldots b_0$, generate the n-bit sum $S = s_{n-1} s_{n-2} \ldots s_0$. Let $c_i$ denote the carry out of the ith bit. The sum bit $s_i$ and the carry $c_i$ can be computed as follows:

$$s_i = a_i \oplus b_i \oplus c_{i-1} \tag{2}$$

$$c_i = a_i b_i + a_i c_{i-1} + b_i c_{i-1} \tag{3}$$

In prefix addition, the sum is computed in three stages: pre-processing, prefix-processing, and post-processing. In the pre-processing stage, the generate $g_i$ and propagate $p_i$ signals are computed as:

$$g_i = a_i \cdot b_i \tag{4}$$

$$p_i = a_i \oplus b_i \tag{5}$$

The condition $g_i = 1$ means that a carry is generated at bit i, while the condition $p_i = 1$ means that a carry is propagated through bit i. The concept of generate and propagate can be extended to a block of contiguous bits, from bit k to bit i (with k < i), as follows:

$$g[i:k] = \begin{cases} g_i & \text{if } i = k \\ g[i:j] + p[i:j]\, g[l:k] & \text{otherwise} \end{cases} \tag{6}$$

$$p[i:k] = \begin{cases} p_i & \text{if } i = k \\ p[i:j]\, p[l:k] & \text{otherwise} \end{cases} \tag{7}$$

where $i \ge l \ge j \ge k$.

The condition $g[i:k] = 1$ means that a carry is generated within the block from bit k to bit i, while the condition $p[i:k] = 1$ means that a carry is propagated through the block. Thus, for any bit i, the carry $c_i$ can be expressed as:

$$c_i = g[i:0] + p[i:0]\, c_{-1} \tag{8}$$

where $c_{-1}$ is the input carry of the n-bit adder. In the following, for simplicity, we assume that $c_{-1} = 0$, so that Eq. (8) reduces to $c_i = g[i:0]$.

The block generate and propagate terms are computed in the prefix-processing stage of the adder. To that end, the $(g[i:k], p[i:k])$ pairs are expressed with the help of the prefix operator $\bullet$, defined as follows:

$$(g[i:k], p[i:k]) = (g[i:j], p[i:j]) \bullet (g[l:k], p[l:k]) = (g[i:j] + p[i:j]\, g[l:k],\; p[i:j]\, p[l:k]) \tag{9}$$
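The three stages can be put together in a short behavioural model. The Python sketch below follows the usual description of the Han-Carlson topology: a first level that merges each odd bit with its even neighbour, Kogge-Stone-style levels restricted to the odd bit positions, and a final level that completes the even positions. It assumes a zero input carry, as in the text, and combines adjacent blocks with the prefix operator of Eq. (9). The function names are hypothetical, and the model is a sketch rather than the hardware implemented in the paper.

```python
def prefix_op(upper, lower):
    """Prefix operator of Eq. (9): merge an upper (g, p) block with the block below it."""
    g_up, p_up = upper
    g_lo, p_lo = lower
    return (g_up | (p_up & g_lo), p_up & p_lo)


def han_carlson_add(a, b, width):
    """Return (sum mod 2**width, carry_out) for unsigned a and b, with zero carry-in."""
    # Pre-processing: bit-level generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    gp = list(zip(g, p))

    # First level: every odd bit absorbs its even neighbour.
    for i in range(1, width, 2):
        gp[i] = prefix_op(gp[i], gp[i - 1])

    # Kogge-Stone levels on odd positions only, with distances 2, 4, 8, ...
    distance = 2
    while distance < width:
        previous = list(gp)                    # all nodes of a level work in parallel
        for i in range(distance + 1, width, 2):
            gp[i] = prefix_op(previous[i], previous[i - distance])
        distance *= 2

    # Final level: even bits take the completed prefix of the odd bit below.
    for i in range(2, width, 2):
        gp[i] = prefix_op(gp[i], gp[i - 1])

    # Post-processing: c_i = g[i:0] and s_i = p_i XOR c_(i-1), with c_(-1) = 0.
    carries = [block_g for block_g, _ in gp]
    total = 0
    for i in range(width):
        carry_in = carries[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, carries[width - 1]


sum_bits, carry_out = han_carlson_add(0b11001010, 0b01110110, width=8)
```

Restricting the Kogge-Stone levels to the odd positions roughly halves the number of prefix cells compared with a full Kogge-Stone tree, at the cost of one extra final level.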



# V. RESULTS

The adders compared in this section are the carry-select adder (CSA) and the Han-Carlson adder (HCA). Using the Han-Carlson adder in the proposed system reduces both delay and area relative to the existing CSA-based system, as shown in Table 2: the delay is reduced by almost 50%, and the number of 4-input look-up tables (LUTs) decreases, so the area decreases.

| Metric              | CSA       | HCA      |
|---------------------|-----------|----------|
| Delay               | 15.622 ns | 7.577 ns |
| No. of slices       | 25 (2%)   | 8 (0%)   |
| No. of 4-input LUTs | 43 (2%)   | 16 (0%)  |
| No. of bonded IOBs  | 50 (75%)  | 50 (75%) |

Table 2: Comparison of the Han-Carlson speculative adder and the carry-select adder
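The relative changes implied by Table 2 can be recomputed directly from its entries. The short Python sketch below does only this arithmetic; the dictionary keys and the helper name `reduction_percent` are illustrative.

```python
# (CSA value, HCA value) pairs copied from Table 2.
table2 = {
    "delay_ns": (15.622, 7.577),
    "slices": (25, 8),
    "luts_4_input": (43, 16),
}


def reduction_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


reductions = {name: reduction_percent(csa, hca)
              for name, (csa, hca) in table2.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower with the HCA")
```

For the delay row this gives a reduction of about 51.5%, in line with the "almost 50%" figure quoted in the text.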



Fig 5: Overall comparison of the existing and proposed systems.





Fig 6: simulation and synthesis results



# VI. CONCLUSION

In this paper, we introduced the concept of area- and time-efficient OCI crossbars as the physical layer enabler of NoC routers, using high-speed, low-area adders in the proposed system. Two crossbar architectures that leverage the overloaded CDMA concept, namely, T-OCI and P-OCI, are advanced to increase the CDMA crossbar capacity by 100% and $2N \times 100\%$, respectively, where N is the spreading code length. We exploited featured properties of the Walsh spreading code family employed in the classical CDMA crossbar to increase the number of router ports sharing the crossbar without altering the simple accumulator decoder architecture of the conventional CDMA crossbar. The T-/P-OCI crossbars with the Han-Carlson adder are implemented and validated using the Xilinx ISE Design Suite. Compared with the existing system, the Han-Carlson adder reduces area by 38% and increases speed by 49%.

# REFERENCES

[1] K. Asanovic *et al.*, "The landscape of parallel computing research: A view from berkeley," Dept. EECS, Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2006-183, 2006.

[2] P. Bogdan, "Mathematical modeling and control of multifractal workloads for data-center-on-a-chip optimization," in *Proc. 9th Int. Symp. Netw.-Chip*, New York, NY, USA, 2015, pp. 21:1–21:8.

[3] Z. Qian, P. Bogdan, G. Wei, C.-Y. Tsui, and R. Marculescu, "A traffic-aware adaptive routing algorithm on a highly reconfigurable network-on-chip architecture," in *Proc. 8th IEEE/ACM/IFIP Int. Conf. Hardw./Softw. Codesign, Syst. Synth.*, New York, NY, USA, Oct. 2012, pp. 161–170.

[4] Y. Xue and P. Bogdan, "User cooperation network coding approach for NoC performance improvement," in *Proc. 9th Int. Symp. Netw.-Chip*, New York, NY, USA, Sep. 2015, pp. 17:1–17:8.

[5] T. Majumder, X. Li, P. Bogdan, and P. Pande, "NoC-enabled multicore architectures for stochastic analysis of biomolecular reactions," in *Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE)*, San Jose, CA, USA, Mar. 2015, pp. 1102–1107.

[6] S. J. Hollis, C. Jackson, P. Bogdan, and R. Marculescu, "Exploiting emergence in on-chip interconnects," *IEEE Trans. Comput.*, vol. 63, no. 3, pp. 570–582, Mar. 2014.

[7] S. Kumar *et al.*, "A network on chip architecture and design methodology," in *Proc. IEEE Comput. Soc. Annu. Symp. (VLSI)*, Apr. 2002, pp. 105–112.

[8] T. Bjerregaard and S. Mahadevan, "A survey of research and practices of network-on-chip," ACM Comput. Surv., vol. 38, no. 1, 2006, Art. no. 1.

[9] Y. Xue, Z. Qian, G. Wei, P. Bogdan, C. Y. Tsui, and R. Marculescu, "An efficient network-on-chip (NoC) based multicore platform for hierarchical parallel genetic algorithms," in *Proc. 8th IEEE/ACM Int. Symp. Netw.-Chip (NoCS)*, Sep. 2014, pp. 17–24.

[10] D. Kim, K. Lee, S.-J. Lee, and H.-J. Yoo, "A reconfigurable crossbar switch with adaptive bandwidth control for networks-on-chip," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2005, pp. 2369–2372.

[11] R. H. Bell, C. Y. Kang, L. John, and E. E. Swartzlander, "CDMA as a multiprocessor interconnect strategy," in *Proc. Conf. Rec. 35th Asilomar Conf. Signals, Syst. Comput.*, vol. 2, Nov. 2001, pp. 1246–1250.

[12] B. C. C. Lai, P. Schaumont, and I. Verbauwhede, "CT-bus: A heterogeneous CDMA/TDMA bus for future SOC," in *Proc. Conf. Rec. 35th Asilomar Conf. Signals, Syst. Comput.*, vol. 2, Nov. 2004, pp. 1868–1872.

[13] S. A. Hosseini, O. Javidbakht, P. Pad, and F. Marvasti, "A review on synchronous CDMA systems: Optimum overloaded codes, channel capacity, and power control," *EURASIP J. Wireless Commun. Netw.*, vol. 1, pp. 1–22, Dec. 2011.

[14] K. E. Ahmed and M. M. Farag, "Overloaded CDMA bus topology for MPSoC interconnect," in *Proc. Int. Conf. ReConFigurable Comput. FPGAs (ReConFig)*, Dec. 2014, pp. 1–7.

[15] K. E. Ahmed and M. M. Farag, "Enhanced overloaded CDMA interconnect (OCI) bus architecture for on-chip communication," in *Proc. IEEE 23rd Annu. Symp. High-Perform. Interconnects (HOTI)*, Aug. 2015, pp. 78–87.

[16] T. Nikolic, G. Djordjevic, and M. Stojcev, "Simultaneous data transfers over peripheral bus using CDMA technique," in *Proc. 26th Int. Conf. Microelectron. (MIEL)*, May 2008, pp. 437–440.

[17] T. Nikolic, M. Stojcev, and G. Djordjevic, "CDMA bus-based on-chip interconnect infrastructure," *Microelectron. Rel.*, vol. 49, no. 4, pp. 448–459, Apr. 2009.

[18] T. Nikolic, M. Stojcev, and Z. Stamenkovic, "Wrapper design for a CDMA bus in SOC," in *Proc. IEEE 13th Int. Symp. Design Diagnostics Electron. Circuits Syst. (DDECS)*, Apr. 2010, pp. 243–248.

[19] J. Kim, I. Verbauwhede, and M.-C. F. Chang, "Design of an interconnect architecture and signaling technology for parallelism in communication," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 8, pp. 881–894, Aug. 2007.

[20] X. Wang, T. Ahonen, and J. Nurmi, "Applying CDMA technique to network-on-chip," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 10, pp. 1091–1100, Oct. 2007.

[21] W. Lee and G. E. Sobelman, "Mesh-star hybrid NoC architecture with CDMA switch," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2009, pp. 1349–1352.

[22] B. Halak, T. Ma, and X. Wei, "A dynamic CDMA network for multicore systems," *Microelectron. J.*, vol. 45, no. 4, pp. 424–434, Apr. 2014.

[23] J. Wang, Z. Lu and Y. Li, "A New CDMA Encoding/Decoding Method for on-Chip Communication Network," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 4, pp. 1607–1611, Apr. 2016.

[24] H.-H. Chen, The Next Generation CDMA Technologies. Hoboken, NJ, USA: Wiley, 2007.

[25] J. Postman and P. Chiang, "A survey addressing on-chip interconnect: Energy and reliability considerations," *ISRN Electron.*, vol. 2012, pp. 1–9, 2012.

[26] S. Mubeen and S. Kumar, "Designing efficient source routing for mesh topology network on chip platforms," in *Proc. 13th Euromicro Conf. Digit. Syst. Design, Archit., Methods Tools (DSD)*, Sep. 2010, pp. 181–188.

[27] T. Han and D. Carlson, "Fast area-efficient VLSI adders," in *Proc. 8th Symp. Comput. Arithmetic*, Sep. 1987, pp. 49–56.

[28] O. J. Bedrij, "Carry-select adder," *IRE Trans. Electron. Comput.*, no. 3, pp. 340–346, 1962.