# VLSI ARCHITECTURE FOR CONTRAST ENHANCEMENT USING PIXEL BASED IMAGE FUSION

## <sup>1</sup>N. Monika, <sup>2</sup>B. K. N. Srinivasarao, <sup>3</sup>M. Vijayakumar <sup>1</sup>M.Tech (DECS), <sup>2</sup>Professor, <sup>3</sup>Assistant Professor <sup>1</sup>Department of ECE, Gudlavalleru Engineering College, Gudlavalleru, India-521356

*Abstract*: In the past few years, image fusion has become a very popular topic in image processing. With the rapid adoption of digital imaging in remote sensing and satellite applications, there is often a need to store large amounts of image data and process it very quickly. The standard fusion methods perform well spatially but introduce distortions. To overcome this problem we focus on one of the most powerful tools for image processing, the discrete wavelet transform (DWT). In this paper we use an efficient DWT-based image fusion built on the lifting scheme. Processing high-resolution (HD and UHD) images requires highly complex systems and takes considerable time; to avoid such latency we use hardware processing. The VLSI architecture for contrast enhancement using pixel-based image fusion is designed for low-contrast source images and is simulated using the Verilog hardware description language.

*Index terms*: Image Fusion, Discrete Wavelet Transform (DWT), 2-D DWT, lifting based DWT, VLSI Architecture, contrast enhancement, strip-based scanning, Gamma Correction, lifting based IDWT, 2-D IDWT.

## **I. INTRODUCTION**

Image fusion with multiple image sources has nowadays become very popular in many fields, such as medical imaging, computer vision, painting, remote sensing and many other applications. The human visual system can identify edges, colors, contrast and other features in an image, but some devices support only a limited set of colors and produce images that are not pleasing to the human eye. To obtain a sharp and crisp image, image fusion is used: two or more source images are combined into a single image that carries more detail than the individual sources. Real-time applications require a VLSI implementation of an efficient 2-D DWT and inverse 2-D DWT processor that occupies little area, is memory efficient and operates at a high frequency. The lifting-based DWT scheme has lower arithmetic complexity, needs less memory and can be implemented in parallel. Huang et al. [7] introduced a flipping structure that provides a critical path of $T_m + T_a$ (one multiplier delay plus one adder delay). What remains is the need to improve memory efficiency. Various parallel architectures for 2-D DWT have been proposed; the on-chip memory required is 3N+24P for an image of size $N \times N$ processed with P parallel processing units (PUs). The architecture designed for the 2-D IDWT is similar to that of the 2-D DWT, but the operations take place in the reverse direction.

## **II. DESIGN OF ARCHITECTURE**

The proposed architecture for contrast enhancement of the fused image, obtained by pixel-based image fusion in the wavelet domain, is shown in Fig.1.





## 2.1 Proposed architecture for 2-D DWT and 2-D inverse DWT:

**2-D DWT:** The architecture consists of two parallel 2-D DWT spatial processors, i.e. P=2, each of which takes one low-contrast source image of the same type. Each 2-D DWT module decomposes its input frame into four sub-bands: LL1, HL1, LH1 and HH1 for the first image, and LL2, HL2, LH2 and HH2 for the second. These sub-bands are passed to the maximum fusion rule module, whose outputs LL3, HL3, LH3 and HH3 form the inputs to the 2-D IDWT.
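The maximum fusion rule can be illustrated with a short software sketch. It assumes that the rule keeps, for every coefficient position, the larger of the two corresponding coefficients; the paper does not spell out whether signed values or magnitudes are compared, so the signed comparison used here is an assumption, and the function names `fuse_max` and `fuse_subbands` are hypothetical.

```python
def fuse_max(band_a, band_b):
    """Element-wise maximum of two equally sized 2-D coefficient arrays.

    Assumption: signed coefficients are compared directly.
    """
    same_rows = len(band_a) == len(band_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(band_a, band_b)):
        raise ValueError("sub-bands must have the same shape")
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(band_a, band_b)]


def fuse_subbands(first, second):
    """Fuse two decompositions given as dicts keyed by 'LL', 'HL', 'LH', 'HH'."""
    return {name: fuse_max(first[name], second[name])
            for name in ("LL", "HL", "LH", "HH")}


# Tiny 1x2 sub-bands standing in for (LL1, HL1, LH1, HH1) and (LL2, HL2, LH2, HH2).
image1 = {"LL": [[1, 5]], "HL": [[-2, 0]], "LH": [[3, 3]], "HH": [[0, -1]]}
image2 = {"LL": [[4, 2]], "HL": [[-1, -3]], "LH": [[3, 4]], "HH": [[-5, 2]]}
fused = fuse_subbands(image1, image2)  # plays the role of LL3, HL3, LH3, HH3
```

In hardware the same rule reduces to one comparator and one multiplexer per coefficient pair, which is why it adds little to the critical path.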

#### A. Architecture for spatial processor:

The 2-D DWT is implemented as a memory-efficient, lifting-based spatial processor consisting of a row processor and a column processor. Y. Hu et al. [5] proposed an architecture that uses strip-based scanning to enable a trade-off between external and internal memory. The flipping model is used for the processing element (PE), and each processing unit (PU) is built from five PEs. The PU performs the lifting-based (9/7) 1-D DWT. The modified PU has a critical path delay (CPD) of $T_a$ (one adder delay). The width of the strip equals the number of inputs to the spatial processor, i.e. 2P+1. PUs with different numbers of pipeline stages are shown in Fig.2. The proposed architecture uses two parallel PUs, i.e. P = 2. As soon as the intermediate results of the row processor are available, the column processor starts to process them.

1) Row processor (RP): For strip-based scanning, the image of size $N \times N$ is extended by one column to $N \times (N+1)$ using symmetric extension. Strip-based scanning proceeds row-wise through the row processor (RP) and column-wise through the column processor (CP). In this design we consider P=2. In the first clock cycle the pixels X(0,0) to X(0,2P) arrive simultaneously.

The scan then moves to the next strip, i.e. pixels X(0,2P) to X(0,4P) of the next row are taken, and the same procedure continues for the whole image. According to Eq. (1), PE\_alpha of PU-1 produces the output H1[n] together with the partial result X'[2n] required by PE\_beta. To reduce the CPD, the number of pipeline stages is increased, giving a CPD of $T_a$; the intermediate results are stored in the memories Memory\_alpha, Memory\_beta and Memory\_gama respectively. Scaling is performed in each PU, and the scaled outputs are fed to the transposing unit.

2) Column processor (CP): The CP is designed with two PUs to match the throughput of the RP. The transpose register produces pairs of H and L coefficients in alternating order, and each pair is fed to one PU of the CP. The CP generates the four sub-bands at its output in an interleaved pattern. These outputs are passed to the re-arrange unit, which delivers them in sub-band order, i.e. *LL*, *LH*, *HL*, *HH*, simultaneously.
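The strip-based scan used by the row processor can be sketched in software as follows. This is only an illustration under stated assumptions: strips are 2P+1 columns wide, adjacent strips share one boundary column, each strip is traversed row by row before the next strip starts (the ordering used in strip-based scanning as we understand it from [5]), and N is a multiple of 2P. The names `strip_columns` and `scan_order` are hypothetical.

```python
def strip_columns(n_cols, p):
    """Return the inclusive (first, last) column indices of every strip.

    The image is assumed to be extended to n_cols + 1 columns, so the last
    strip may end at column index n_cols.
    """
    if n_cols % (2 * p) != 0:
        raise ValueError("this sketch assumes N is a multiple of 2P")
    return [(start, start + 2 * p) for start in range(0, n_cols, 2 * p)]


def scan_order(n_rows, n_cols, p):
    """Yield, per clock cycle, the 2P+1 pixel coordinates read together."""
    for first, last in strip_columns(n_cols, p):
        for row in range(n_rows):  # assumed: finish a strip before moving on
            yield [(row, col) for col in range(first, last + 1)]


cycles = list(scan_order(n_rows=8, n_cols=8, p=2))
first_cycle = cycles[0]  # pixels X(0,0) .. X(0,2P)
```

For an 8 × 8 image with P = 2 the sketch produces 16 read cycles, which agrees with the $N^2/2P$ computation time listed for the proposed architecture in Table 2.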

The design equations of 2-D DWT are as shown below

$$H_1[n] \leftarrow a' * X[2n-1] + \{X[2n] + X[2n-2]\} \quad \ldots P1 \tag{1}$$



**Fig.2.** (a) Data flow graph of processing unit (b) Processing unit with five pipeline stages (c) Processing unit with nine pipeline stages.

$$H_2[n] \leftarrow c' * H_1[n] + \{L_1[n] + L_1[n-1]\} \quad \ldots P2 \tag{3}$$

$$H[n] \leftarrow K0 * \{H_2[n]\} \tag{5}$$

$$L[n] \leftarrow K1 * \{L_2[n]\} \tag{6}$$

where $a'=1/\alpha$, $b'=1/\alpha\beta$, $c'=1/\beta\gamma$, $d'=1/\gamma\delta$, $K0=\alpha\beta\gamma/\zeta$ and $K1=\alpha\beta\gamma\delta\zeta$. Here $\alpha$, $\beta$, $\gamma$ and $\delta$ are the lifting coefficients and $\zeta$ is the scaling coefficient. Their values are $\alpha = -1.586134342$, $\beta = -0.052980118$, $\gamma = 0.8829110762$, $\delta = 0.4435068522$ and $\zeta = 1.149604398$.
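For readers who want to check the arithmetic in software, the following is a minimal sketch of the conventional (non-flipped) 9/7 lifting steps and their inverse. It is not the flipped hardware datapath of Eqs. (1) to (6): the flipping structure rescales the intermediate signals by the factors a', b', c' and d', but produces the same H and L outputs. The constants are the commonly tabulated 9/7 lifting values, and the boundary handling (mirroring the nearest neighbour) and the even-length input are assumptions of this sketch.

```python
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.8829110762
DELTA = 0.4435068522
ZETA = 1.149604398


def _predict(s, d, c):
    """d[n] += c * (s[n] + s[n+1]), mirroring s at the right border."""
    m = len(d)
    return [d[n] + c * (s[n] + s[min(n + 1, m - 1)]) for n in range(m)]


def _update(s, d, c):
    """s[n] += c * (d[n-1] + d[n]), mirroring d at the left border."""
    return [s[n] + c * (d[max(n - 1, 0)] + d[n]) for n in range(len(s))]


def dwt97(x):
    """One-level 1-D forward transform of an even-length sequence."""
    s, d = list(x[0::2]), list(x[1::2])
    d = _predict(s, d, ALPHA)
    s = _update(s, d, BETA)
    d = _predict(s, d, GAMMA)
    s = _update(s, d, DELTA)
    return [ZETA * v for v in s], [v / ZETA for v in d]  # (L, H)


def idwt97(low, high):
    """Inverse transform: undo the lifting steps in reverse order."""
    s = [v / ZETA for v in low]
    d = [v * ZETA for v in high]
    s = _update(s, d, -DELTA)
    d = _predict(s, d, -GAMMA)
    s = _update(s, d, -BETA)
    d = _predict(s, d, -ALPHA)
    x = [0.0] * (len(s) + len(d))
    x[0::2], x[1::2] = s, d
    return x


signal = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0]
low, high = dwt97(signal)
restored = idwt97(low, high)  # equals `signal` up to rounding error
```

Because every lifting step changes one half of the samples using only the other half, the inverse simply subtracts what the forward step added, which is the property the 2-D IDWT architecture relies on.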

**2-D Inverse DWT:** The architecture designed for the 2-D inverse DWT is similar to that of the 2-D DWT, but the operations take place in the reverse direction. The 2-D IDWT requires an on-chip memory of 4N+86 words for a single-level decomposition. Processing proceeds column-wise first and then row-wise.

#### A. Architecture for spatial processor (SP):

The flipping model is used here to shorten the critical path. The PU architecture reduces the CPD to $T_a$ (one adder delay). The outputs of the row processor are the pixels of the reconstructed image; these output pixels are stored in external memory. The 2-D IDWT uses a pipelined structure with several stages. The modified equations for the flipping-based IDWT, following Huang et al. [7], are shown below:

$$\text{1st stage:}\quad \begin{cases} \delta' L(n) = 1.9843\,L(n) \\ \delta'' L(n) = 0.2812\,L(n) \\ L'(n) = (\delta' L(n) - \delta'' L(n)) - (H(n) + H(n-1)) \end{cases} \tag{7}$$

$$\text{2nd stage:}\quad \begin{cases} \gamma' H(n) = 0.0625\,H(n) \\ \gamma'' H(n) = 2.5\,H(n) \end{cases} \tag{8}$$

$$\text{3rd stage:}\quad \begin{cases} \beta' L'(n) = 10.625\,L'(n) \\ \beta'' L'(n) = 32\,L'(n) \end{cases} \tag{9}$$

$$\text{4th stage:}\quad \begin{cases} \alpha' H'(n) = 12\,H'(n) \\ X'(2n-1) = 0.03125\,(\alpha' H'(n) - (X'(2n) + X'(2n-2))) \end{cases} \tag{10}$$
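The constants in the stage equations are, exactly or approximately, short sums of powers of two (for example 2.5 = 2 + 1/2 and 0.03125 = 1/32), which is consistent with the multiplier-free design reported in the comparison tables. The sketch below shows how such constant multiplications can be carried out with shifts and adds on fixed-point integers. The decompositions, the choice of 8 fractional bits and all function names are assumptions made for illustration; the paper does not state the exact shift-add networks used in its PEs.

```python
FRAC_BITS = 8  # assumed number of fractional bits


def to_fixed(value):
    """Convert a real number to a fixed-point integer."""
    return int(round(value * (1 << FRAC_BITS)))


def to_float(fixed):
    """Convert a fixed-point integer back to a real number."""
    return fixed / (1 << FRAC_BITS)


def times_2_5(x):       # 2.5 = 2 + 1/2
    return (x << 1) + (x >> 1)


def times_10_625(x):    # 10.625 = 8 + 2 + 1/2 + 1/8
    return (x << 3) + (x << 1) + (x >> 1) + (x >> 3)


def times_12(x):        # 12 = 8 + 4
    return (x << 3) + (x << 2)


def times_1_984375(x):  # 2 - 1/64, close to the constant 1.9843
    return (x << 1) - (x >> 6)


def times_0_28125(x):   # 1/4 + 1/32, close to the constant 0.2812
    return (x >> 2) + (x >> 5)


def times_0_03125(x):   # 1/32
    return x >> 5


sample = to_fixed(16.0)
scaled = to_float(times_2_5(sample))  # 16 * 2.5
```

Right shifts discard low-order bits, so results for arbitrary inputs are truncated to the chosen fixed-point precision; the number of fractional bits therefore sets the accuracy of the reconstructed image.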

*1. Column processor:* The column processor is designed with two processing units (PUs). It receives the 2-D DWT output coefficients (the LL, LH, HL and HH sub-bands) as inputs. In the first clock cycle it takes LL(n,m), LH(n,m), HL(n,m) and HH(n,m) simultaneously, and the same procedure is repeated every clock cycle down to the bottom row. Each PU has five pipeline stages, each implemented by one processing element (PE): PE\_shift, PE\_invdelta, PE\_invgama, PE\_invbeta and PE\_alpha, as shown in Fig.3(a).

*2. Row processor:* The general structure of the row processor is shown in Fig.3(b). In each clock cycle the CP sends the RP the inputs XL(n-1,m) and XL(n,m) from the low-frequency sub-band and XH(n-1,m) and XH(n,m) from the high-frequency sub-band. When the input XH(n-1,m) from the CP is processed, the intermediate results XL'(n-1,m), XH'(n-1,m) and X'(n-1,2m), produced by PE\_invgama, PE\_invbeta and PE\_invalpha of PU 1, are stored in the memories Mem\_delta1, Mem\_invgama1, Mem\_invbeta1 and Mem\_invalpha1 respectively. The same procedure applies to the other inputs, which are handled by PU 2.

**Image fusion:** The fused sub-bands produced by the fusion rule module are applied to the inverse 2-D DWT module, which reconstructs the final fused image. For contrast enhancement, gamma correction ($\gamma$) is then applied to the final fused image.
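The gamma correction step can be sketched with a look-up table, a form that also maps naturally onto a small ROM in hardware. The power-law form out = max · (in / max)^γ, the 8-bit pixel range and the names `gamma_lut` and `gamma_correct` are assumptions of this sketch; the paper does not give the exact formula or the word length used.

```python
def gamma_lut(gamma, max_value=255):
    """Build a look-up table for out = max * (in / max) ** gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(max_value * (v / max_value) ** gamma)
            for v in range(max_value + 1)]


def gamma_correct(image, gamma, max_value=255):
    """Apply gamma correction to a 2-D list of integer pixel values."""
    lut = gamma_lut(gamma, max_value)
    return [[lut[pixel] for pixel in row] for row in image]


fused_image = [[0, 64, 255]]
brighter = gamma_correct(fused_image, 0.5)   # gamma < 1 lifts dark pixels
unchanged = gamma_correct(fused_image, 1.0)  # gamma = 1 leaves pixels as-is
```

With this convention, values of γ below 1 brighten a low-contrast fused image and values above 1 darken it, so the table can be regenerated for each γ that is tried.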



**Fig.3.** (a) Structure of column processor (b) Structure of row processor.

## **III. IMPLEMENTATION RESULTS:**

Two low contrast source images are taken and DWT based image fusion is applied. The contrast enhancement of the final fused image is obtained for different gamma correction values.

| Input source<br>image 1 | Input source<br>image 2 | Fused output<br>image | Gamma (γ)<br>corrected image |
|-------------------------|-------------------------|-----------------------|------------------------------|
| GEC Building            |                         |                       |                              |
| LENA                    |                         |                       |                              |
| PLANES                  |                         |                       |                              |

**Table 1** VLSI architecture implementation results.

## **IV. COMPARISON TABLES:**

| Parameters          | Zhang     | Mohanty             | Darji     | Yusong             | Proposed     |
|---------------------|-----------|---------------------|-----------|--------------------|--------------|
| Multipliers         | 10        | 9P                  | 10        | 10P                | 0            |
| Adders              | 16        | 16P                 | 16        | 16P                | 34P          |
| Internal memory     | 4N+37     | 15P+5.5N            | 4N        | 24P+3N             | 60P+3N       |
| Critical path delay | $T_m$     | $T_m + 2T_a$        | $T_m$     | $T_m + T_a$        | $T_a$        |
| Computation time    | $N^2/2$   | $N^2/(2P)$          | $N^2/2$   | $N^2/(2P)$         | $N^2/(2P)$   |
| Throughput          | $2/T_m$   | $2P/(T_m + 2T_a)$   | $2/T_m$   | $2P/(T_m + T_a)$   | $2P/T_a$     |

**Table 2** Comparison of proposed 2-D DWT architecture with existing architectures:

**Table 3** Comparison of proposed 2-D inverse DWT architecture with existing architectures:

| Parameters          | Darji          | P.K.Nath        | Proposed        |
|---------------------|----------------|-----------------|-----------------|
| Memory requirement  | 4N+5 registers | 5N              | 4N+86 registers |
| Throughput/cycle    | 2              | 2               | 4               |
| Computation time    | $N^2/2+N+3$    | $N^2/2+2.5N+26$ | $N^2/4+18$      |
| Multipliers         | 10             | Nil             | Nil             |
| Critical path delay | $T_m$          | $T_a$           | $T_a$           |
| Adders              | 16             | 58              | 76              |
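As a rough worked example of the computation-time entries for the proposed architecture, assume an image with $N = 512$ (a size chosen here only for illustration, not taken from the reported results) and $P = 2$. The forward and inverse transforms then need approximately

$$\frac{N^2}{2P} = \frac{512^2}{2 \cdot 2} = 65536 \text{ cycles}, \qquad \frac{N^2}{4} + 18 = \frac{512^2}{4} + 18 = 65554 \text{ cycles}$$

respectively, so under these assumptions the inverse transform adds only 18 cycles to the cycle count of the forward transform.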

#### V. CONCLUSION:

Specialized hardware improves speed and reduces complexity. The 2-D DWT architecture operates at 186.03 MHz and uses 734 slice LUTs, while the 2-D inverse DWT operates at 209.102 MHz and uses 3462 slice LUTs. The 2-D DWT and 2-D IDWT spatial processors used in this architecture are more efficient than the other existing architectures: they reduce internal memory, latency, critical path delay and the complexity of the control unit.

#### **REFERENCES:**

[1] Sunpreet Sharma, Ju Jia Zou and Gu Fang, "Contrast Enhancement Using Pixel Based Image Fusion in Wavelet Domain," 978-1-5090-5256-1/16/\$31.00 © 2016 IEEE.

[2] B. K. N. Srinivasarao and I. Chakrabarti, "High performance VLSI architecture for 3-D discrete wavelet transform," in Proc. of IEEE 2016 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1-4, 25-27 April 2016, Hsinchu, Taiwan.

[3] A. Darji, S. Shukla, S. N. Merchant and A. N. Chandorkar, "Hardware Efficient VLSI Architecture for 3-D Discrete Wavelet Transform," Proc. of 27<sup>th</sup> Int. Conf. on VLSI Design and 13<sup>th</sup> Int. Conf. on Embedded Systems, pp. 348-352, 5-9 Jan. 2014.

[4] B. K. Mohanty and P. K. Meher, "Memory-Efficient Architecture for 3-D DWT Using Overlapped Grouping of Frames," IEEE Transactions on Signal Processing, vol. 59, no. 11, pp. 5605-5616, Nov. 2011.

[5] Y. Hu and C. C. Jong, "A Memory-Efficient Scalable Architecture for Lifting-Based Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems-II: Express Briefs, vol. 60, no. 8, pp. 502-506, Aug. 2013.

[6] P. K. Nath and S. Banerjee, "A high speed, memory efficient line based VLSI architecture for the dual mode inverse discrete wavelet transform of JPEG2000 decoder," Microprocessors and Microsystems, vol. 40, pp. 181-188, 2015.

[7] C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform," IEEE Trans. Signal Process., vol. 53, no. 4, pp. 1575-1586, Apr. 2005.