# 7.7 Gbps Encoder Design for IEEE 802.11ac QC-LDPC Codes

Yong-Min Jung<sup>1</sup>, Chul-Ho Chung<sup>1</sup>, Yun-Ho Jung<sup>2</sup>, and Jae-Seok Kim<sup>1</sup>

Abstract—This paper proposes a high-throughput encoding process and encoder architecture for quasi-cyclic low-density parity-check (QC-LDPC) codes in the IEEE 802.11ac standard. To achieve high throughput with low complexity, both the encoding process and the encoder architecture are based on partially parallel processing. Forward and backward accumulations are performed in one clock cycle to increase the encoding throughput. A low-complexity cyclic shifter is also proposed to minimize the hardware overhead of combinational logic in the encoder architecture. The proposed encoder is rate compatible and supports the various code rates and codeword block lengths of IEEE 802.11ac systems. The proposed encoder is implemented in 130-nm CMOS technology. For the (1944, 1620) irregular code, a throughput of 7.7 Gbps is achieved at a clock frequency of 100 MHz. The gate count of the proposed encoder core is about 96 K.

*Index Terms*—Accumulation, high-throughput, IEEE 802.11ac, partially parallel process, QC-LDPC codes

## **I. INTRODUCTION**

Quasi-cyclic low-density parity-check (QC-LDPC) codes, which are defined by a sparse parity check matrix, have received much attention as forward error correction codes due to their excellent error correction performance [1]. Some wireless communication standards, such as IEEE 802.11ac and IEEE 802.16e, adopt QC-LDPC codes as an error correction code [2, 3]. These standards support very high data rates of hundreds of Mbps or more. In particular, the IEEE 802.11ac system supports a data rate of about 7 Gbps. These standards also support various code rates and codeword block lengths. Therefore, a QC-LDPC encoder that is rate compatible and provides high throughput is required.

To achieve high throughput, the encoding process has to be completed in a small number of clock cycles. Encoders for QC-LDPC codes have been presented in many studies [4-13]. The LDPC encoders presented in [4, 5, 7-11] can support various code rates and codeword block lengths. However, these encoders cannot support data rates above 1 Gbps. The LDPC encoder presented in [12] can provide 3.34 Gbps throughput. In [13], four types of rotate-left-accumulator circuits were considered for an efficient QC-LDPC encoder. However, an entire encoding process for a high-throughput encoder was not presented. The LDPC encoders introduced in [14, 15] are designed for the LDPC codes of the IEEE 802.11n standard. Although the LDPC encoder in [14] can perform encoding in a small number of clock cycles, its fully parallel architecture causes high hardware complexity. Since the LDPC encoder in [15] requires many clock cycles for encoding, it is hard to achieve Gbps throughput.

The parity check matrix of a QC-LDPC code is composed of $Z \times Z$ square sub-matrices. Each sub-matrix is either a cyclically shifted identity matrix or a zero matrix. During the encoding process, the information bit sequence is multiplied by the parity check matrix. Since the parity check matrix is divided into several $Z \times Z$ square sub-matrices, each $Z \times 1$ sub-sequence of the information bits is multiplied by a $Z \times Z$ square sub-matrix. Due to the cyclic shift property of the sub-matrix, the multiplication can be implemented by a cyclic shifter. A parallel architecture for achieving high throughput requires many cyclic shifters. Thus, a low-complexity design of the cyclic shifter is an important issue in the design of a high-throughput QC-LDPC encoder.

Manuscript received Dec. 14, 2013; accepted Jun. 14, 2014

<sup>1</sup> School of Electrical & Electronics Engineering, Yonsei University, Seoul, 120-749, Korea

<sup>2</sup> School of Electronics, Telecommunication and Computer Engineering, Korea Aerospace University, Goyang-si, 412-791, Korea

E-mail : jaekim@yonsei.ac.kr

In this paper, we propose a high-throughput QC-LDPC encoder that adopts the column-direction partially parallel processing algorithm proposed in our previous works [16, 17]. Based on this partially parallel processing, we propose a high-throughput QC-LDPC encoder architecture. The goal of the encoder design is to provide very high throughput of up to 7 Gbps with low complexity in IEEE 802.11ac systems. To achieve high throughput, the proposed architecture is designed to complete the encoding in a small number of clock cycles. To reduce the hardware overhead caused by the parallel processing, a low-complexity parallel cyclic shifter is proposed. The proposed encoder is also rate compatible and supports various code rates and codeword block lengths. Implementation results demonstrate that the proposed rate-compatible QC-LDPC encoder can provide throughput exceeding 7 Gbps in IEEE 802.11ac systems.

The remainder of this paper is organized as follows. In Section II, the QC-LDPC codes employed in the IEEE 802.11ac standard are briefly introduced, and a linear encoding process is presented. Section III proposes a high-throughput QC-LDPC encoding process. Based on the proposed encoding process, a high-throughput QC-LDPC encoder architecture is proposed in Section IV. Section V discusses the implementation results. Finally, Section VI presents the conclusion.

## **II. BACKGROUND**

#### 1. QC-LDPC Codes in IEEE 802.11ac Standard

The parity check matrix of QC-LDPC codes can be described by a base parity check matrix. Fig. 1 shows an example of the base parity check matrix defined in the IEEE 802.11ac standard [2]. Each number in the base parity check matrix indicates the right cyclic shift value of the $Z \times Z$ identity sub-matrix. A dash '-' indicates the zero sub-matrix.

| ←  |    |    |    |    |    |    |    |    | $\mathbf{H}_1$ |    |    |    |    |    |    |    |    |    | →  | ← | $\mathbf{H}_2$ |   | → |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|---|---|---|---|
| 13 | 48 | 80 | 66 | 4  | 74 | 7  | 30 | 76 | 52 | 37 | 60 | -  | 49 | 73 | 31 | 74 | 73 | 23 | -  | 1 | 0 | - | - |
| 69 | 63 | 74 | 56 | 64 | 77 | 57 | 65 | 6  | 16 | 51 | -  | 64 | -  | 68 | 9  | 48 | 62 | 54 | 27 | - | 0 | 0 | - |
| 51 | 15 | 0  | 80 | 24 | 25 | 42 | 54 | 44 | 71 | 71 | 9  | 67 | 35 | -  | 58 | -  | 29 | 1  | 53 | 0 | - | 0 | 0 |
| 16 | 29 | 36 | 41 | 44 | 56 | 59 | 37 | 50 | 24 | -  | 65 | 4  | 65 | 52 | -  | 4  | -  | 73 | 52 | 1 | - | - | 0 |

Fig. 1. Base parity check matrix of QC-LDPC codes.

Table 1. IEEE 802.11ac QC-LDPC code parameters

| Code rates             | 1/2, 2/3, 3/4, 5/6 |
|------------------------|--------------------|
| Codeword block lengths | 648, 1296, 1944    |
| Sub-matrix size Z      | 27, 54, 81         |

Table 1 shows the QC-LDPC code parameters of the IEEE 802.11ac standard. The QC-LDPC encoder has to support 4 code rates (1/2, 2/3, 3/4 and 5/6) and 3 codeword block lengths (648, 1296 and 1944). To support the 3 codeword block lengths, the sub-matrix size *Z* is defined as 27, 54 or 81. In the IEEE 802.11ac standard, 12 base parity check matrices are defined to support the 4 code rates and 3 codeword block lengths.
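As a minimal sketch of how a base-matrix entry acts on an information sub-sequence, the Python snippet below expands one entry into its $Z \times Z$ sub-matrix and checks that the matrix product equals a plain rotation of the bits. The helper names `shifted_identity` and `shift_multiply` are hypothetical, and the snippet assumes that a right cyclic shift by s moves the 1 of row r to column (r + s) mod Z; under the opposite convention only the rotation direction changes.

```python
import numpy as np

def shifted_identity(Z, s):
    """Z x Z identity with its columns cyclically shifted right by s.

    A negative s stands for the '-' entry, i.e. the zero sub-matrix.
    """
    if s < 0:
        return np.zeros((Z, Z), dtype=np.uint8)
    return np.roll(np.eye(Z, dtype=np.uint8), s, axis=1)

def shift_multiply(m, s):
    """Same product without building the matrix: a cyclic rotation of m."""
    if s < 0:
        return np.zeros_like(m)
    return np.roll(m, -s)

Z = 27
rng = np.random.default_rng(0)
m = rng.integers(0, 2, size=Z, dtype=np.uint8)   # one Z-bit information sub-sequence

for s in (-1, 0, 1, 13, 26):
    dense = (shifted_identity(Z, s) @ m) % 2     # explicit GF(2) matrix-vector product
    print(s, np.array_equal(dense, shift_multiply(m, s)))
```

The rotation needs no multipliers or adders, which is why the sub-matrix products can be mapped onto cyclic shifters.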

#### 2. Linear Encoding Process

The base parity check matrix can be partitioned into two sub-matrices as shown in Fig. 1. Let $\mathbf{H} = [\mathbf{H}_1 \ \mathbf{H}_2]$ be the partitioned base parity check matrix, where $\mathbf{H}_1$ is an $(N-M) \times M$ sub-matrix and $\mathbf{H}_2$ is an $(N-M) \times (N-M)$ sub-matrix. Let $\mathbf{c} = [\mathbf{m} \ \mathbf{p}]$ be a codeword block, where $\mathbf{m}$ and $\mathbf{p}$ denote the information bit sequence and the parity bit sequence, respectively. Since a valid codeword satisfies the parity check equation, the parity bit sequence $\mathbf{p}$ can be derived as follows [18],

$$\mathbf{H} \cdot \mathbf{c}^{\mathrm{T}} = \mathbf{H}_{1} \cdot \mathbf{m}^{\mathrm{T}} + \mathbf{H}_{2} \cdot \mathbf{p}^{\mathrm{T}} = 0, \qquad (1)$$
$$\mathbf{p}^{\mathrm{T}} = \mathbf{H}_{2}^{-1} \cdot \mathbf{H}_{1} \cdot \mathbf{m}^{\mathrm{T}}. \qquad (2)$$

Since  $\mathbf{H}_1$  is a sparse matrix, and  $\mathbf{H}_2^{-1}$  has a regular pattern, the matrix-vector multiplications of (2) have linear complexity.

## **III. PROPOSED HIGH THROUGHPUT QC-LDPC ENCODING PROCESS**

An encoding process can be divided into two steps based on (2). The first step computes $\mathbf{H}_1 \cdot \mathbf{m}^T$. The second step multiplies $\mathbf{H}_2^{-1}$ by the result of the first step. The result of the first step can be expressed as follows,

$$\mathbf{H}_{1} \cdot \mathbf{m}^{\mathrm{T}} = [\boldsymbol{\lambda}_{0}, \boldsymbol{\lambda}_{1}, \dots, \boldsymbol{\lambda}_{N-M-1}]^{\mathrm{T}}, \qquad (3)$$

where $\lambda_i = \mathbf{H}_1(i) \cdot \mathbf{m}^{\mathrm{T}}$ and $\mathbf{H}_1(i)$ denotes the *i*th row of $\mathbf{H}_1$. $\lambda_i$ can be decomposed as follows,

$$\boldsymbol{\lambda}_{i} = \sum_{j=0}^{M-1} \mathbf{H}_{1}(i,j) \cdot \mathbf{m}^{\mathrm{T}}(j), \qquad (4)$$

where $\mathbf{H}_1(i, j)$ represents the $Z \times Z$ sub-matrix at the *i*th row and the *j*th column of $\mathbf{H}_1$, $\mathbf{m}(j)$ denotes the *j*th $Z \times 1$ sub-sequence of the information bits, and M is the number of columns of $\mathbf{H}_1$. To compute $\lambda_i$, (4) can be implemented with either serial or parallel processing. Since serial processing requires many clock cycles to compute $\lambda_i$, it cannot achieve high-throughput encoding [15]. Although fully parallel processing can support high-throughput encoding, it causes large hardware complexity because the size of $\mathbf{m}$ is very large [14].

In this paper, to support high throughput with low hardware complexity, we propose a column-direction partially parallel encoding process for computing $\lambda_i$ [16, 17]. Let $\lambda_i(j)$ be the partial sum

$$\boldsymbol{\lambda}_{i}(j) = \sum_{l=0}^{j} \mathbf{H}_{1}(i,l) \cdot \mathbf{m}^{\mathrm{T}}(l). \qquad (5)$$

Then $\lambda_i$ of (4) equals $\lambda_i(M-1)$, and $\lambda_i(j)$ can be computed recursively as follows, with $\lambda_i(-1) = 0$,

$$\boldsymbol{\lambda}_{i}(j) = \boldsymbol{\lambda}_{i}(j-1) + \mathbf{H}_{1}(i,j) \cdot \mathbf{m}^{\mathrm{T}}(j). \qquad (6)$$

Since there is no dependency between $\lambda_i$ and $\lambda_{i-1}$, all $\lambda_i$'s for $0 \le i \le (N-M-1)$ can be computed simultaneously. In other words, all sub-matrices located in the same column of $\mathbf{H}_1$ are multiplied by one $Z \times 1$ sub-sequence of the information bits at the same time. According to (6), all $\lambda_i$'s can be obtained by accumulating $\lambda_i(j)$ over M clock cycles, because each product $\mathbf{H}_1(i, j) \cdot \mathbf{m}^{\mathrm{T}}(j)$ is computed in one clock cycle.
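The column-direction accumulation of (6) can be sketched in software as follows. This is an illustrative model rather than the authors' implementation: `compute_lambdas`, `H1_toy` and `m_toy` are hypothetical names, the toy base matrix is invented for the example, -1 encodes the '-' entry, and the rotation direction follows the assumption that a right shift by s places the 1 of row r in column (r + s) mod Z.

```python
import numpy as np

def compute_lambdas(H1_base, m_blocks):
    """Accumulate lambda_i(j) over the M block columns of H1, as in (6)."""
    rows, M = len(H1_base), len(H1_base[0])
    Z = len(m_blocks[0])
    lam = np.zeros((rows, Z), dtype=np.uint8)    # lambda_i(-1) = 0
    for j in range(M):                           # one iteration models one clock cycle
        for i in range(rows):                    # in hardware these updates run in parallel
            s = H1_base[i][j]
            if s >= 0:                           # a negative entry is a zero sub-matrix
                lam[i] ^= np.roll(m_blocks[j], -s)   # cyclic shift, then GF(2) addition
    return lam

rng = np.random.default_rng(1)
H1_toy = [[1, -1, 3],
          [0, 2, -1]]                            # hypothetical 2 x 3 base matrix
m_toy = rng.integers(0, 2, size=(3, 5), dtype=np.uint8)   # M = 3 sub-sequences, Z = 5
lam_toy = compute_lambdas(H1_toy, m_toy)
print(lam_toy)
```

The outer loop runs M times regardless of the number of block rows, which mirrors the M-cycle cost of the first step stated above.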

Since $\mathbf{H}_2^{-1}$ of the base parity check matrices defined in the IEEE 802.11ac standard has a regular pattern, the parity bit sequence $\mathbf{p}$ can be easily computed from the $\lambda_i$'s. The sequence $\mathbf{p}$ can be expressed as $\mathbf{p} = [\mathbf{p}(0), \mathbf{p}(1), \dots, \mathbf{p}(N-M-1)]$, where $\mathbf{p}(i)$ is the *i*th sub-sequence of the parity bits with a size of *Z*. The first sub-sequence $\mathbf{p}(0)$ can be obtained by summing all $\lambda_i$'s as follows,

$$\mathbf{p}(0) = \sum_{i=0}^{N-M-1} \boldsymbol{\lambda}_i. \qquad (7)$$

From (7), the second sub-sequence  $\mathbf{p}(1)$  and the last sub-sequence  $\mathbf{p}(N-M-1)$  can be easily obtained as follows,

$$\mathbf{p}(1) = \mathbf{p}_1(0) + \lambda_0, \qquad (8)$$
$$\mathbf{p}(N-M-1) = \mathbf{p}_1(0) + \lambda_{N-M-1}, \qquad (9)$$

where  $\mathbf{p}_1(0)$  is  $\mathbf{p}(0)$  with a single right cyclic shift. The rest of the parity sub-sequences can be obtained by a forward accumulation and a backward accumulation [19] as follows,

$$\mathbf{p}(i) = \mathbf{p}(i-1) + \lambda_{i-1}, \text{ for } 2 \le i \le (N-M)/2, \quad (10)$$
  
$$\mathbf{p}(i) = \mathbf{p}(i+1) + \lambda_i, \text{ for } (N-M)/2 < i \le N-M-2. \quad (11)$$

Since there is no dependency between the forward and backward accumulations, both are computed in parallel. In the proposed encoding, the second step can be performed in one clock cycle because all the results of the first step become available at the same time. Based on the proposed encoding process, the encoding of IEEE 802.11ac QC-LDPC codes can be completed in *M* + 1 clock cycles.
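The second step, (7) to (11), can be checked numerically with the sketch below. It builds a base $\mathbf{H}_2$ with the same structure as the right-hand part of Fig. 1, computes the parity sub-sequences from random $\lambda_i$'s, and verifies that $\mathbf{H}_2 \cdot \mathbf{p}^{\mathrm{T}} = [\lambda_0, \dots, \lambda_{N-M-1}]^{\mathrm{T}}$ over GF(2), which is (1) rearranged. The function names are hypothetical, and $\mathbf{p}_1(0)$ is modelled as the product of the shift-1 sub-matrix with $\mathbf{p}(0)$ under the assumed convention that a right shift by s places the 1 of row r in column (r + s) mod Z.

```python
import numpy as np

def h2_base(R):
    """Base H2 shaped like Fig. 1: first column (1, ..., 0, ..., 1) plus a dual diagonal."""
    B = -np.ones((R, R), dtype=int)              # -1 marks a zero sub-matrix
    B[0, 0], B[R // 2, 0], B[R - 1, 0] = 1, 0, 1
    for k in range(1, R):                        # column k holds shift-0 blocks in rows k-1 and k
        B[k - 1, k] = B[k, k] = 0
    return B

def expand(base, Z):
    """Expand a base matrix into its binary matrix (assumed shift convention)."""
    I = np.eye(Z, dtype=np.uint8)
    return np.block([[np.roll(I, s, axis=1) if s >= 0 else np.zeros_like(I)
                      for s in row] for row in base])

def parity_from_lambdas(lam):
    """Second encoding step; lam has shape (N-M, Z) with entries in {0, 1}."""
    R = lam.shape[0]                             # R = N - M block rows
    p = np.zeros_like(lam)
    p[0] = np.bitwise_xor.reduce(lam, axis=0)    # (7)
    p1_0 = np.roll(p[0], -1)                     # p_1(0): shift-1 sub-matrix applied to p(0)
    p[1] = p1_0 ^ lam[0]                         # (8)
    p[R - 1] = p1_0 ^ lam[R - 1]                 # (9)
    for i in range(2, R // 2 + 1):               # forward accumulation (10)
        p[i] = p[i - 1] ^ lam[i - 1]
    for i in range(R - 2, R // 2, -1):           # backward accumulation (11)
        p[i] = p[i + 1] ^ lam[i]
    return p

rng = np.random.default_rng(2)
R, Z = 4, 27                                     # 4 block rows as in Fig. 1, smallest Z
lam = rng.integers(0, 2, size=(R, Z), dtype=np.uint8)
p = parity_from_lambdas(lam)
H2 = expand(h2_base(R), Z).astype(int)
print(np.array_equal((H2 @ p.ravel()) % 2, lam.ravel()))
```

The two loops touch disjoint index ranges, so they can be evaluated independently, which matches the parallel forward and backward accumulation described in the text.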

## **IV. PROPOSED HIGH THROUGHPUT QC-LDPC ENCODER ARCHITECTURE**

#### 1. Overall Architecture of Encoder

In the IEEE 802.11ac standard, the QC-LDPC encoder not only generates a codeword from the information bits, but also adds pad bits to the information bits and performs puncturing and repetition of the codeword [2]. Fig. 2 shows the overall architecture of the proposed QC-LDPC encoder, which includes the *encoder core*, the *padding stage* and the *puncturing and repeating stage* [17]. The pad bits are added to the information bits before the QC-LDPC *encoder core*, and puncturing and repetition of the codeword are performed after the *encoder core*. The padded information bits and the punctured or repeated codeword are temporarily saved in buffers. The input width of the proposed encoder architecture is set to Z bits. The output width varies depending on the modulation scheme, and several output streams are produced for multiple spatial streams. Control signals for the overall encoder architecture are generated by the controller. The major roles of the control signals are to enable each step and to provide each step with the code rate and block length. In the proposed encoder architecture, the buffer sizes are optimized to minimize the hardware overhead.

Fig. 2. Overall encoder architecture for IEEE 802.11ac QC-LDPC codes.

#### 2. Proposed High Throughput Encoder Core

As shown in Fig. 2, the QC-LDPC encoder core can be divided into two steps. The first step consists of *N-M* cyclic shifters (CSs) and accumulation logic comprising adders and flip-flops (FFs). Each adder can be implemented with XOR gates. From the *buffer* of the *padding stage*, the encoder core reads one *Z*-bit information sub-sequence $\mathbf{m}(j)$ per clock cycle. Since the sub-matrix $\mathbf{H}_1(i, j)$ is either a cyclically shifted identity matrix or a zero matrix, $\mathbf{H}_1(i, j) \cdot \mathbf{m}^{\mathrm{T}}(j)$ can be implemented by a cyclic shifter. Thus, $\lambda_i(j)$ of (6) is obtained by accumulating the cyclic shift results of $\mathbf{m}(j)$. Since there is no dependency between $\lambda_i$ and $\lambda_{i-1}$, all $\lambda_i$'s can be obtained simultaneously by the (*N*-*M*)-way partially parallel processing shown in Fig. 2. Thus, *N*-*M* CSs, adders and FFs are required. The final results of the first step, $\lambda_i(M-1)$'s, are obtained after *M* sub-sequences of the information bits have been read. Therefore, the first step requires *M* clock cycles.

The second step consists of the vector adders and the forward and backward (FW/BW) accumulators. According to (7), the parity sub-sequence $\mathbf{p}(0)$ can be obtained by the vector addition of all $\lambda_i$'s. As shown in Fig. 2, $\mathbf{p}(1)$ and $\mathbf{p}(N-M-1)$ can be computed as $\mathbf{p}_1(0) + \lambda_0$ and $\mathbf{p}_1(0) + \lambda_{N-M-1}$, respectively. The other parity sub-sequences can be obtained by the forward and backward accumulators according to (10) and (11), respectively. Fig. 3 shows the proposed forward and backward accumulator architecture. Although there are dependencies among the parity sub-sequences $\mathbf{p}(i)$, the critical path of the second step is short because all operations are just *N*-*M* vector additions. Therefore, all parity bits can be obtained in just one clock cycle. The total number of clock cycles for encoding is *M*+1. The maximum value of *M* is 20, which occurs when the code rate is 5/6, so the proposed encoder requires at most 21 clock cycles. With partially parallel processing in the first step and one-clock-cycle accumulation in the second step, the QC-LDPC encoder can generate a codeword with low complexity in a small number of clock cycles.
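One way to see why the accumulations fit in a single clock cycle is to unroll (8) to (11). The closed forms below are derived here from those equations and are not expressions given in the original text:

$$\mathbf{p}(i) = \mathbf{p}_1(0) + \sum_{k=0}^{i-1} \boldsymbol{\lambda}_k, \quad 1 \le i \le (N-M)/2,$$
$$\mathbf{p}(i) = \mathbf{p}_1(0) + \sum_{k=i}^{N-M-1} \boldsymbol{\lambda}_k, \quad (N-M)/2 < i \le N-M-1.$$

Each parity sub-sequence is therefore the XOR of $\mathbf{p}_1(0)$ with at most $(N-M)/2$ of the $\lambda_i$'s, so both accumulation chains are purely combinational and their depth grows with $(N-M)/2$ rather than with $N-M$.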



Fig. 3. Forward and backward accumulators.

#### 3. Proposed Low Complexity Cyclic Shifter

The cyclic shift value ranges from 0 to Z-1. A conventional CS based on a barrel shifter is decomposed into $\lceil \log_2 Z \rceil$ steps. The number of bits required to store a cyclic shift value defined in the base parity check matrix is also $\lceil \log_2 Z \rceil$.
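The step structure of a conventional barrel-shifter-based CS can be modelled as below. This is a behavioural sketch only: `barrel_cyclic_shift` is a hypothetical name, each stage is modelled with `np.roll` rather than with multiplexers, and the rotation direction is an assumption that would have to match the sub-matrix convention of the code.

```python
import numpy as np

def barrel_cyclic_shift(v, s):
    """Rotate v by s positions using ceil(log2 Z) stages of 1, 2, 4, ... positions."""
    Z = len(v)
    assert 0 <= s < Z
    stages = (Z - 1).bit_length()        # equals ceil(log2 Z) for Z >= 2
    out = v
    for k in range(stages):              # stage k+1 rotates by 2**k when bit k of s is set
        if (s >> k) & 1:
            out = np.roll(out, -(1 << k))
    return out, stages

for Z in (27, 54, 81):
    v = np.arange(Z)                     # position labels make the rotation easy to inspect
    out, stages = barrel_cyclic_shift(v, Z - 1)
    print(Z, stages, np.array_equal(out, np.roll(v, -(Z - 1))))
```

The stage count is also the number of bits of the stored shift value, which is the quantity the proposed CS aims to reduce.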

Fig. 4 shows the proposed low-complexity and rate-compatible CS. The proposed CS is based on a barrel shifter, consists of several steps, and supports cyclic shifts for the various sizes of Z by using multiplexers. At the kth step of the barrel shifter, a cyclic shift of $2^{(k-1)}$ bits is performed. From the *j*th sub-sequence of information bits $\mathbf{m}(j)$ with a size of Z, another sub-sequence $\mathbf{m}_{\alpha}(j)$ with an $\alpha$-bit cyclic shift is obtained. If the required cyclic shift value is larger than $\alpha$, $\mathbf{m}_{\alpha}(j)$ is used instead of $\mathbf{m}(j)$ in (6), and only the remaining shift, i.e., the original value minus $\alpha$, is applied. Otherwise, $\mathbf{m}(j)$ is used. With a suitable $\alpha$, the number of steps in the proposed CS can be reduced, and the number of bits required to store the cyclic shift value in memory can also be reduced. For IEEE 802.11ac QC-LDPC codes, if $\alpha$ is set to 22, the maximum cyclic shift values to be performed by the CS are reduced to 22, 31 and 58 instead of 26, 53 and 80 for Z = 27, 54 and 81, respectively. By setting $\alpha$ to 22, the number of steps of the proposed CS is reduced from $\lceil \log_2 80 \rceil$ to $\lceil \log_2 58 \rceil$, i.e., from 7 to 6. Thus, the proposed CS requires fewer cyclic shift steps than the conventional CS, which makes it more efficient in the parallel architecture. Moreover, the number of bits required to store the cyclic shift value can be reduced by storing the difference between the original cyclic shift value and $\alpha$.

Fig. 4. Proposed low complexity cyclic shifter.
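The effect of the pre-shifted sequence $\mathbf{m}_{\alpha}(j)$ can be reproduced with a few lines of Python. The sketch assumes the selection rule stated in the text (use $\mathbf{m}_{\alpha}(j)$ when the shift exceeds $\alpha$); the names `split_shift`, `max_residual` and `shift_with_alpha` are hypothetical, and how the selection itself is encoded in memory is not described in the text and is not modelled here.

```python
import numpy as np

ALPHA = 22   # value of alpha used in the paper for IEEE 802.11ac

def split_shift(s, alpha=ALPHA):
    """Return (use_preshifted, residual shift) for an original shift value s."""
    return (True, s - alpha) if s > alpha else (False, s)

def max_residual(Z, alpha=ALPHA):
    """Largest shift the CS still has to perform for sub-matrix size Z."""
    return max(split_shift(s, alpha)[1] for s in range(Z))

def shift_with_alpha(m, s, alpha=ALPHA):
    """Cyclic shift by s realised as an optional fixed alpha-shift plus a residual shift."""
    m_alpha = np.roll(m, -alpha)                 # m_alpha(j): fixed alpha-position rotation
    use_pre, r = split_shift(s, alpha)
    return np.roll(m_alpha if use_pre else m, -r)

for Z in (27, 54, 81):
    worst = max_residual(Z)
    print(Z, worst, worst.bit_length())          # size, max residual shift, bits to store it
```

Under these assumptions the residual shifts stay within 22, 31 and 58, and the stored values need 5, 5 and 6 bits, in line with the figures quoted in the text and in Table 3.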

## **V. IMPLEMENTATION RESULTS**

Table 2 compares the throughput of the existing and the proposed QC-LDPC encoders. The existing QC-LDPC encoders [4, 5] are rate compatible. However, these encoders cannot achieve Gbps throughput at a 100 MHz clock frequency. Although the QC-LDPC encoder in [6] can provide 860 Mbps throughput, it is not rate compatible. The QC-LDPC encoder presented in [12] provides rate compatibility and a high throughput of up to 3.34 Gbps at a 186 MHz clock frequency. In [14, 15], the required number of clock cycles for encoding is reported instead of the throughput. The QC-LDPC encoder based on the fully parallel architecture [14] requires 24 clock cycles to complete encoding. The QC-LDPC encoder based on the serial architecture [15] needs at least 73 clock cycles. The proposed QC-LDPC encoder can perform encoding for 4 code rates and 3 codeword block lengths. It completes encoding in 21 clock cycles and provides 7.7 Gbps throughput at a 100 MHz clock frequency. To demonstrate that the proposed encoder provides high encoding throughput with low complexity, we compared the gate count of the proposed encoder with those of the existing encoders [4, 6]. Although the proposed encoder requires up to about nine times as many gates as the existing encoders, its throughput is up to about twenty times higher, as shown in Table 2. Therefore, compared with the existing QC-LDPC encoders, the proposed encoder requires fewer clock cycles and achieves the highest throughput with low complexity. The proposed encoder is also rate compatible.

Table 2. Throughput comparisons of QC-LDPC encoders

|                    | [4]        | [5]        | [6]          | [12]               | [14]               | [15]               | This work          |
|--------------------|------------|------------|--------------|--------------------|--------------------|--------------------|--------------------|
| Rate compatibility | Compatible | Compatible | Incompatible | Compatible         | Compatible         | Compatible         | Compatible         |
| Code rates         | 1/2, 7/8   | 1/2, 7/8   | 7/8          | 1/2, 2/3, 3/4, 5/6 | 1/2, 2/3, 3/4, 5/6 | 1/2, 2/3, 3/4, 5/6 | 1/2, 2/3, 3/4, 5/6 |
| Block lengths      | 4096, 8192 | 2304, 8064 | 8158         | 576:96:2304        | 648, 1296, 1944    | 648, 1296, 1944    | 648, 1296, 1944    |
| Frequency          | 100 MHz    | 100 MHz    | 54 MHz       | 186 MHz            | -                  | -                  | 100 MHz            |
| Clock cycles       | -          | -          | -            | -                  | 24                 | 73-83              | 21                 |
| Gate counts        | 10.7 K     | -          | 31 K         | -                  | -                  | -                  | 96 K               |
| Max. throughput    | 360 Mbps   | 600 Mbps   | 860 Mbps     | 3.34 Gbps          | -                  | -                  | 7.7 Gbps           |

Table 3. Complexity comparison of CSs

|                                 | Conventional CS | Proposed CS  |
|---------------------------------|-----------------|--------------|
| Sub-matrix size                 | 27 / 54 / 81    | 27 / 54 / 81 |
| Gate count                      | 1,581           | 1,363        |
| Memory bits per CS value (bits) | 5 / 6 / 7       | 5 / 5 / 6    |

Table 4. Implementation results of the proposed encoder

| CMOS technology         | 130-nm               |
|-------------------------|----------------------|
| Clock frequency         | 100 MHz              |
| Rate-compatibility      | Compatible           |
| Throughput              | 7.7 Gbps             |
| Area                    | 2.863 mm<sup>2</sup> |
| Total gate count        | 273 K                |
| QC-LDPC core gate count | 96 K                 |

Table 3 compares the complexity of the conventional CS and the proposed CS. For IEEE 802.11ac QC-LDPC codes, the complexity of the proposed CS is lower than that of the conventional CS when $\alpha$ is 22. In the partially parallel architecture, the proposed CS is therefore more efficient than the conventional CS. The proposed CS also needs fewer memory bits to store the cyclic shift value than the conventional CS.

Table 4 shows the implementation results of the proposed QC-LDPC encoder for the IEEE 802.11ac standard. A 130-nm CMOS technology is used. The proposed rate-compatible QC-LDPC encoder supports up to 7.7 Gbps throughput at a 100 MHz clock frequency. The area and total gate count of the overall encoder are 2.863 mm<sup>2</sup> and 273 K, respectively. The gate count of the encoder core is 96 K.
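The reported throughput can be related to the cycle count with a short calculation. The sketch below assumes that throughput counts information bits, that one codeword occupies *M* + 1 clock cycles with no overlap between codewords, and that the base matrix has 24 block columns so that *M* = 12, 16, 18 and 20 for the four code rates; only *M* = 20 for rate 5/6 is stated in the text, and the function name `info_throughput_bps` is hypothetical.

```python
def info_throughput_bps(M, Z, f_clk_hz):
    """Information bits per second when M*Z information bits take M + 1 clock cycles."""
    return M * Z * f_clk_hz / (M + 1)

# Assumed number of information block columns per code rate (24 block columns in total).
M_BY_RATE = {"1/2": 12, "2/3": 16, "3/4": 18, "5/6": 20}

for rate, M in M_BY_RATE.items():
    gbps = info_throughput_bps(M, Z=81, f_clk_hz=100e6) / 1e9
    print(f"rate {rate}: {gbps:.2f} Gbps")
```

For the (1944, 1620) code this gives 1620 information bits in 21 cycles at 100 MHz, i.e., about 7.7 Gbps, which is consistent with the figure in Table 4.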

## **VI. CONCLUSIONS**

In this paper, we proposed a high-throughput and rate-compatible QC-LDPC encoding process and encoder architecture. By using column-direction partially parallel processing together with forward and backward accumulations carried out in a single clock cycle, 7.7 Gbps throughput was achieved with a gate count of 96 K. To reduce the hardware complexity, a low-complexity parallel CS was proposed. The proposed encoder is also rate compatible and supports 4 code rates and 3 codeword block lengths. Therefore, the proposed encoder is well suited to encoding the QC-LDPC codes of the IEEE 802.11ac standard.

## **ACKNOWLEDGMENTS**

This work was supported by the IT R&D program of MOTIE/KEIT [10035389, Research on high speed and low power wireless communication SoC for high resolution video information mining]. CAD tools were supported by IDEC.

## **REFERENCES**

- [1] R. G. Gallager, "Low-density parity-check codes," *Information Theory, IRE Transaction*, Vol. IT-8, No. 1, pp. 21-28, Jan., 1962.
- [2] IEEE 802.11acTM/D2.0, "Draft Standard for Information Technology Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications, Amendment 4: Enhancements for Very High Throughput for Operation in Bands below 6GHz," Jan., 2012.
- [3] IEEE Std 802.16eTM-2005, "IEEE Standard for Local and metropolitan area networks Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems," Feb., 2006.
- [4] H. Zhong, and T. Zhang, "Block-LDPC: a practical LDPC coding system design approach," *Circuits and Systems I, IEEE Transactions on*, Vol. 52, No. 4, pp. 766-775, Apr., 2005.
- [5] H. Zhang, J. Zhu, H. Shi, and D. Wang, "Layered Approx-Regular LDPC: Code Construction and Encoder/Decoder Design," *Circuits and Systems I, IEEE Transactions on*, Vol. 55, No. 2, pp. 572-585, Mar., 2008.
- [6] L. H. Miles, J. W. Gambles, and G. K. Maki, "An 860-Mb/s (8158, 7136) low-density parity-check encoder," *Solid-State Circuits, IEEE Journal of*, Vol. 41, No. 8, pp. 1686-1691, Aug., 2006.
- [7] Z. Ma, Y. Li, and X. Wang, "A Quasi-Parallel Encoder of Quasi-Cyclic LDPC Codes in IEEE 802.16e," *Information Science and Engineering*, 2009, ICISE 2009, 1st International Conference on, 26-28, pp. 2492-2495, Dec., 2009.
- [8] Z. Wen, Y. Y. Huang, and S. K. Xiong, "Implementation of DTTB LDPC encoder based on FPGA," *Information Science and Engineering*, 2010, ICISE 2010, 2nd International Conference on, 4-6, pp. 2126-2128, Dec., 2010.
- [9] T. Adiono, A. Prasetiadi, and A. Salbiyono, "Efficient encoding for hardware implementation of IRA LDPC on 802.16 standard," *Intelligent Signal Processing and Communication Systems*, 2010, ISPACS 2010, International Symposium on, 6-8, pp. 1-4, Dec., 2010.
- [10] A. Mahdi, N. Kanistras, and V. Paliouras, "An encoding scheme and encoder architecture for rate-compatible QC-LDPC codes," *Signal Processing Systems, 2011, SiPS 2011, IEEE Workshop on*, 4-7, pp. 328-333, Oct., 2011.

- [11] Z. Zeng, "A High-Efficiency LDPC Encoder with Optimized Backtracking Algorithm," *Communications and Mobile Computing, 2011, CMC 2011, 3rd International Conference on,* 18-20, pp. 341-344, Apr., 2011.
- [12] J. Kim, H. Yoo, and M. Lee, "Efficient encoding architecture for IEEE 802.16e LDPC codes," *Fundamentals, IEICE Transactions on*, Vol. E91-A, No. 12, pp. 3607-3611, Dec., 2008.
- [13] P. Zhang, C. Liu, and L. Jiang, "Efficient encoding of QC-LDPC codes based on rotate-left-accumulator circuits," *Electronics Letters*, Vol. 49, No. 13, pp. 810-812, Jun., 2013.
- [14] Z. Cai, J. Hao, P. H. Tan, S. Sun, and P. S. Chin, "Efficient encoding of IEEE 802.11n LDPC," *Electronics Letters*, Vol. 42, No. 25, pp. 1471-1472, Dec., 2006.
- [15] J. M. Perez, and V. Fernandez, "Low-cost encoding of IEEE 802.11n," *Electronics Letters*, Vol. 44, No. 4, pp. 307-308, Feb., 2008.
- [16] Y. Jung, Y. Jung, and J. Kim, "Memory-efficient and high-speed LDPC encoder," *Electronics Letters*, Vol. 46, No. 14, pp. 1035-1036, Jul., 2010.
- [17] Y. Jung, C. Chung, Y. Jung, and J. Kim, "7.7 Gbps encoder design for IEEE 802.11n/ac QC-LDPC codes," *International SoC Design Conference*, 2012, ISOCC 2012, 4-7, pp. 215-218, Nov., 2012.
- [18] M. Yang, W. E. Ryan, and Y. Li, "Design of efficient encodable moderate-length high-rate irregular LDPC codes," *Communications, IEEE Transactions on*, Vol. 52, No. 4, pp. 564-571, Apr., 2004.
- [19] C. Y. Lin, C. C. Wei, and M. K. Ku, "Efficient encoding for dual-diagonal structured LDPC codes based on parity bit prediction and correction," *Circuits and Systems, 2008, APCCAS 2008, IEEE Asia Pacific Conference on*, 30, pp. 1648-1651, Nov., 2008.



**Yong Min Jung** received the B.S. (summa cum laude), M.S. and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2007, 2009, and 2014, respectively. He is currently a senior engineer in the Mobile Communication Division, Samsung Electronics Co. Ltd., Suwon, Korea. He received the best paper award at the 2012 International SoC Design Conference. His research interests include error correction encoding/decoding algorithms, wireless communication system algorithms, and mobile and video communication algorithms, together with their SoC/VLSI implementation.



**Chul Ho Chung** received the B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2003 and 2009, respectively, and is currently pursuing his Ph.D. degree. His research interests include algorithms and SoC implementation of the MAC layer for wireless multimedia communication systems such as WPAN and WLAN, and resource management for mobile networks.



**Yun Ho Jung** received the B.S., M.S., and Ph.D. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 1998, 2000, and 2005, respectively. From 2005 to 2007, he was a senior engineer in the Wireless Device Solution Team, Communication Research Center, Telecommunication Network Division, Samsung Electronics Co. Ltd., Suwon, Korea. From 2007 to 2008, he was a research professor at the Institute of TMS Information Technology, Yonsei University, Seoul, Korea. He is currently an associate professor in the School of Electronics, Telecommunication, and Computer Engineering, Korea Aerospace University, Goyang, Korea. His research interests include signal processing algorithms and SoC/VLSI implementation for wireless communication systems and image processing systems.



**Jae Seok Kim** received a B.S. degree in electronic engineering from Yonsei University, Seoul, Korea in 1977, an M.S. degree in electrical and electronic engineering from KAIST, Daejon, Korea in 1979, and a Ph.D. degree in electronic engineering from RPI, NY, USA in 1988. From 1988 to 1993, he was a member of the technical staff at AT&T Bell Labs, USA. He was Director of the VLSI Architecture Design Lab of ETRI from 1993 to 1996. He was Director of the IT SoC research center from 2001 to 2009. He is currently a professor in the electrical and electronic engineering department at Yonsei University, Seoul, Korea and a Director of System IC 2015, national project, Korea. His current research interests include communication VLSI design, high performance digital signal processor VLSI design, multimedia VLSI design and CAD S/W.