### Compression-Friendly Low Power Test Application Based on Scan Slices Reusing

Wang Weizheng<sup>1,2,\*</sup>, Wang JinCheng<sup>1</sup>, Cai Shuo<sup>2</sup>, Su Wei<sup>3</sup>, and Xiang Lingyun<sup>2</sup>

Abstract—This paper presents a compression-friendly low power test scheme for the EDT environment. The proposed approach exploits scan slice reuse to reduce switching activity during shifting in test schemes based on a linear decompressor. To keep the resulting control data from degrading encoding efficiency, a counter is used to generate the control signals. Experimental results on the larger ISCAS'89 and ITC'99 benchmark circuits show that the proposed test application scheme can significantly improve the encoding efficiency of the linear decompressor.

*Index Terms*—Test compression, low power, linear decompression, embedded deterministic test

### **I. INTRODUCTION**

Today's very large scale integrated (VLSI) circuits are immensely complex, and VLSI testing is becoming increasingly difficult. Power consumption during structural scan-based testing is much higher than during normal operation, because such testing attempts to activate as many nodes as possible with the fewest test patterns. This manifests as higher junction temperature and increased peak power, which can result in overheating or IR drop. These effects can ultimately cause a device malfunction and thus yield loss, chip reliability degradation, shorter product lifetime, or permanent device damage [1].

In recent years, numerous methods aimed at reducing test power have been proposed. These solutions for deterministic test or BIST applications include partitioning and modification of scan chains [2-4], test pattern reordering [5], scan transition blocking [6], gating of scan flip-flops [8], low power test pattern generators [9], test scheduling [10], and power-aware ATPG or post-ATPG X-bit filling [12, 13] that keeps the power dissipation below a safe threshold.

Test data compression techniques, which can effectively reduce test cost, have also been researched extensively. These techniques include nonlinear code-based [14], broadcast-based [15], scan forest architecture-based [16, 17] and linear decompression-based [18] schemes. The linear decompression-based scheme is widely used in industrial designs because of its high compression ratio.

Recently, compression-compatible X-filling schemes that aim to reduce shift or capture power in the EDT environment were proposed in [19, 20]. The techniques in [21-23] use the available encoding capacity to decrease switching activity during scan loading while preserving test quality.

The approach proposed in this paper aims to reduce the switching activity during shifting for compression schemes based on a linear decompressor and, at the same time, to improve encoding efficiency. Scan slice reuse can effectively reduce the number of transitions during scan operation. The required control data is generated by a counter, which reduces the encoding requirements of the control signals. As a result, the scheme provides a high test compression ratio while keeping switching activity low.

Manuscript received Sep. 28, 2015; accepted Mar. 14, 2016

<sup>1</sup> Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, China

<sup>2</sup> College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China

<sup>3</sup> Modern Education Technology Center, Nanjing Audit University, Jiangsu, China

E-mail : greaquer w@yeah.net

Fig. 1. An example of scan slices overlapping.

### **II. BACKGROUND AND MOTIVATION**

In test cubes generated by ATPG with dynamic compaction, the ratio of specified bits is only between 1% and 5% at the beginning of the ATPG process. After a small number of test patterns have been generated, the fill rate may fall below 1% [18]. In a conventional BIST or deterministic test environment, the X bits (also called don't-care bits) are filled with random values. This may lead to excessive switching activity during scan shift or capture and, in turn, to yield loss or performance degradation.

The test application scheme based on scan slice overlapping [7] significantly reduces the switching activity. A scan slice is defined as the set of values shifted into all the scan chains in one scan cycle. Two consecutive scan slices are said to be overlapping if each pair of corresponding bits in them is compatible (that is, both are 1, both are 0, or at least one of them is an X bit). See the example in Fig. 1(a). The test cube consists of 16 scan slices, each containing 6 bits. The first and second slices are {X0X01X} and {XXXXXX}, and all pairs of corresponding bits in them are compatible. If the second, fourth and fifth bits of the second slice are assigned 0, 0 and 1 respectively, both slices become {X0X01X} and overlap. Similarly, the third slice overlaps with the first and second slices.
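The compatibility rule can be made concrete with a small sketch. The Python below is an illustration only and is not taken from the original scheme: the string encoding of slices (characters `0`, `1` and `X`) and the function names `bits_compatible`, `slices_overlap` and `merge_slices` are assumptions introduced here.

```python
def bits_compatible(a: str, b: str) -> bool:
    """Two bits are compatible if they are equal or at least one is X."""
    return a == b or a == "X" or b == "X"


def slices_overlap(s1: str, s2: str) -> bool:
    """Two slices overlap if every pair of corresponding bits is compatible."""
    if len(s1) != len(s2):
        raise ValueError("slices must hold one bit per scan chain")
    return all(bits_compatible(a, b) for a, b in zip(s1, s2))


def merge_slices(s1: str, s2: str) -> str:
    """Fill X bits so that both slices become the same slice."""
    if not slices_overlap(s1, s2):
        raise ValueError("slices are not compatible")
    return "".join(b if a == "X" else a for a, b in zip(s1, s2))


# First two slices of the Fig. 1(a) example quoted in the text.
first, second = "X0X01X", "XXXXXX"
print(slices_overlap(first, second))  # True
print(merge_slices(first, second))    # X0X01X
```

Merging keeps every specified bit and leaves a position as X only when both slices have X there, which matches the assignment described for the second slice.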

The fourth to eighth slices can overlap, and so can the ninth to sixteenth slices. A run of consecutive overlapping scan slices is defined as an overlapping block. An abundance of X bits gives consecutive scan slices in a test pattern a high probability of overlapping. For consecutive overlapping scan slices, only the first slice needs to be generated by the linear decompressor, while the other slices can be obtained by reuse. Consequently, one control bit is necessary for each slice to indicate whether the slice is reused. Scan slice overlapping may reduce the number of specified bits encoded by the linear decompressor. However, the added control data usually has a greater, negative impact on the encoding efficiency of the linear decompressor. The work in [11] extends [7] and improves test data compression by using scan cell reordering, but the subsequent routing has a detrimental impact on area and timing. From a new perspective, this paper aims to improve the compression ratio at negligible cost while keeping the advantages of the low power test application scheme.

The proposed approach has some similarity to the low power scheme based on fixed-length block encoding in [24]. Both approaches use the compatibility of test bits in consecutive scan cells to improve test compression and reduce test power. However, they differ significantly. The proposed scheme reduces shift-in power based on compatible scan slices and introduces a new technique for improving test compression. The solution in [24] is based on scan block compatibility within each scan chain and targets both shift and capture power reduction. A scan block is defined as a fixed-length run of consecutive scan cells in the same scan chain.

### **III. COMPRESSION-FRIENDLY LOW POWER TEST APPLICATION SCHEME**

#### 1. Proposed Test Architecture

The scan-based hardware implementing the proposed scheme is shown in Fig. 2. The compressed test stimuli and compacted test responses are stored in the ATE. Compressed test patterns and control information are delivered via input channels to the decompressor, which consists of a ring generator and a phase shifter. The decompressor feeds the scan chains and the counter simultaneously. Compared to the STUMPS structure, one 2-to-1 MUX is placed at the input of each scan chain. The counter generates the 0 or 1 control data that makes each MUX select data from either the decompressor or the previous slice. When the first slice of an overlapping block is loaded, the counter sets the control signal to 1 so that test data from the decompressor is shifted into the scan chains. The control signal is then set to 0 so that the last value loaded into each scan chain is loaded repeatedly, which implements scan slice reuse (the data from the decompressor is ignored).

Fig. 2. The test structure of the proposed scheme.

Suppose d is the number of overlapping slices in the current overlapping block. When the first slice of an overlapping block is loaded, the counter is also loaded with d-1. The counter state, which represents a binary number, then decreases by 1 at each clock pulse until all the flip-flops in the counter are 0. The counter entering the all-0's state implies that the next overlapping block needs to be loaded. When the counter state is all 0's, the counter outputs a 1; otherwise, it outputs a 0.
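The counter behaviour described above can be traced cycle by cycle. The following Python is a behavioural sketch under stated assumptions, not a model of the actual hardware: the function name `mux_control_sequence` is hypothetical, and the block sizes 3, 5 and 8 are those of the Fig. 1(a) example (slices 1-3, 4-8 and 9-16).

```python
def mux_control_sequence(block_sizes, k):
    """Per-cycle counter output (MUX select) for a run of overlapping blocks.

    1 = take the slice from the decompressor, 0 = reuse the previous slice.
    """
    max_block = 2 ** k  # a k-flip-flop counter can preload at most 2**k - 1
    outputs = []
    for d in block_sizes:
        if not 1 <= d <= max_block:
            raise ValueError(f"block size {d} does not fit a {k}-bit counter")
        # All-0's state: output 1, load the first slice, preload d-1.
        outputs.append(1)
        state = d - 1
        # Count down; every nonzero state outputs 0 and reuses the slice.
        while state != 0:
            outputs.append(0)
            state -= 1
    return outputs


print(mux_control_sequence([3, 5, 8], k=3))
# [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

Each 1 marks a slice that the decompressor must supply; the 0 entries are the cycles in which the decompressor output is ignored.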

#### 2. The Test Compression Flow of the Proposed Test Scheme

The number of flip-flops in the counter determines the maximum size of an overlapping block. If the counter contains k flip-flops, the size N of an overlapping block in the scheme satisfies:

$$1 \le N \le 2^k \tag{1}$$

Let $T_i$ (*i*=1, 2, ...) denote the destination overlapping blocks, initialized as $T_i = \Phi$ (*i*=1, 2, ...). Given a test pattern *V* and the maximum block size $N_{\max}$ allowed by the counter ($N_{\max} = 2^k$ by (1)), the overlapping block partition flow proceeds as follows:

1) Determine all the scan slices of *V* according to the scan chain structure.

2) Initialize *i*=1. Add the first slice to the *i*th overlapping block $T_i$.

3) Select the next slice *s* in scan-loading order. Check whether *s* is compatible with $T_i$. If it is, go to step (4); otherwise go to step (5).

4) If $N_i$ (the size of $T_i$) is less than $N_{\max}$, add *s* to $T_i$, update the slices in $T_i$, and go to step (6). Otherwise, go to step (5).

5) Set *i*=*i*+1. Add *s* to $T_i$.

6) Repeat from step (3) until every slice has been assigned to an overlapping block.

7) Output the obtained overlapping blocks $T_i$ (*i*=1, 2, ..., *m*) and their sizes $N_i$ (*i*=1, 2, ..., *m*).

When the slices in $T_i$ are updated, they are all replaced by the combination of their specified bits. For example, suppose $T_i$ contains the 3 slices {1XX1X0}, {XXXXXX} and {X00100}. After the update, the specified-bit combination {100100} takes the place of all 3 slices.
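The partition flow and the update rule above can be sketched in a few lines of Python. This is one possible reading, not the authors' implementation: slices are assumed to be strings over `0`, `1` and `X`, each block is represented by its merged slice and its size, a new slice is compared against the merged slice of the current block, and the name `partition_slices` is hypothetical.

```python
def partition_slices(slices, k):
    """Greedily group consecutive compatible slices into overlapping blocks.

    Returns a list of (merged_slice, size) pairs; no block exceeds 2**k slices.
    """
    max_block = 2 ** k
    blocks = []
    for s in slices:
        if blocks:
            merged, size = blocks[-1]
            fits = size < max_block
            compatible = all(
                a == b or a == "X" or b == "X" for a, b in zip(merged, s)
            )
            if fits and compatible:
                # Update step: keep every specified bit seen so far.
                updated = "".join(b if a == "X" else a for a, b in zip(merged, s))
                blocks[-1] = (updated, size + 1)
                continue
        # First slice, incompatible slice, or full block: open a new block.
        blocks.append((s, 1))
    return blocks


example = ["1XX1X0", "XXXXXX", "X00100"]
print(partition_slices(example, k=2))  # [('100100', 3)]
print(partition_slices(example, k=1))  # [('1XX1X0', 2), ('X00100', 1)]
```

With k=2 the three example slices form a single block whose merged slice is {100100}; with k=1 the block is capped at two slices, so the third slice opens a new block.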

To compress a test pattern, one first divides the pattern into overlapping blocks with the above partition algorithm. The compression flow of the proposed test scheme then proceeds as follows. Initially, the counter is in the all-0's state and outputs a 1. In this case, the control signals of all MUXs are 1's and the first slice must be encoded via the decompressor. Meanwhile, the counter enters pre-loading mode, and the size of the current overlapping block minus 1, i.e. $N_1$-1 (as a binary number), must also be encoded by the decompressor. Next, the counter enters subtract-count mode, in which the binary state is decremented by 1 in each clock cycle. Until it reaches the all-0's state, the counter outputs a 0, and the slices are generated by reusing the first slice and need not be encoded by the decompressor. During this time, data from the decompressor is ignored by both the scan chains and the counter. When the counter reaches the all-0's state, the second overlapping block is encoded just as the first one was. The process is repeated until all the overlapping blocks are encoded.

Fig. 3. An example of the proposed scan slices reusing.
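As a rough illustration of what the decompressor has to encode for each overlapping block under this flow, the sketch below lists the counter preload value and the number of bits to encode (specified bits of the merged slice plus k preload bits). The input blocks are hypothetical, the `(merged_slice, size)` representation is an assumption, and `encoding_requirements` is a name introduced here.

```python
def encoding_requirements(blocks, k):
    """For each (merged_slice, size) block, return (slice, preload, bits_to_encode)."""
    rows = []
    for merged, size in blocks:
        if not 1 <= size <= 2 ** k:
            raise ValueError(f"block size {size} does not fit a {k}-bit counter")
        preload = format(size - 1, f"0{k}b")           # N_i - 1 as a k-bit number
        care_bits = sum(bit != "X" for bit in merged)  # specified bits only
        rows.append((merged, preload, care_bits + k))
    return rows


# Hypothetical blocks: three merged slices reused as {100100}, then one lone slice.
rows = encoding_requirements([("100100", 3), ("0XXXXX", 1)], k=2)
for merged, preload, cost in rows:
    print(merged, preload, cost)
# 100100 10 8
# 0XXXXX 00 3
print(sum(cost for _, _, cost in rows))  # 11
```

Only the first slice of each block contributes specified bits; the reused slices cost nothing beyond the k preload bits already counted for their block.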

#### 3. Determination of the Number of Flip-Flops in the Counter

The number of flip-flops in the counter depends on the size of the majority of overlapping blocks. If the specified ratio of the test set is very high, scan slices are unlikely to overlap and most potential overlapping blocks are small. In that case, a counter with few flip-flops should be selected.

Take the test cube in Fig. 1(a) as an example. Ignoring the constraint imposed by the counter, there are 3 overlapping blocks. If a counter with 3 flip-flops is used, encoding the control signal of each block needs 3 bits of data, i.e. 010, 100, and 111 respectively, so 9 control bits in total must be generated by the decompressor. Compared with directly storing one control bit for each of the 16 slices, this saves 7 bits. If a counter with 4 flip-flops is used, encoding the control signal of each block needs 4 bits of data, i.e. 0010, 0100, and 0111 respectively, so 12 control bits in total must be generated by the decompressor. The specified bits in the scan slices that must be generated by the decompressor are shown in Fig. 1(b). However, if a counter with 2 flip-flops is adopted, blocks larger than 4 must be partitioned further, as shown in Fig. 3, and 10 control bits in total must be generated by the decompressor. Furthermore, the number of specified bits that must be generated by the decompressor increases, as shown in Fig. 3(b). In this example, it is most reasonable to select a counter with 3 flip-flops.

Table 1. Circuit and test cubes profile for benchmark circuits

| Circuit | Gate  | Inputs | FF   | #Test | Specified<br>Ratio (%) |
|---------|-------|--------|------|-------|------------------------|
| S13207  | 7951  | 700    | 638  | 336   | 4.47                   |
| S35932  | 16065 | 1763   | 1728 | 73    | 5.48                   |
| S38417  | 22179 | 1664   | 1636 | 1128  | 3.90                   |
| S38584  | 19253 | 1464   | 1426 | 865   | 4.16                   |
| B17     | 30777 | 1452   | 1415 | 1788  | 4.26                   |
| B22     | 29162 | 767    | 735  | 2199  | 3.59                   |
| Ave.    |       |        |      |       | 4.31                   |
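The control-bit counts quoted for the Fig. 1(a) example (9, 12 and 10 bits for counters with 3, 4 and 2 flip-flops) can be reproduced with a short calculation. The sketch assumes that a block larger than 2^k slices is simply cut into consecutive pieces of at most 2^k slices, which matches the totals in the text but ignores any re-merging a real partition run might find; `control_bits` is a hypothetical name.

```python
def control_bits(block_sizes, k):
    """Total counter-preload bits when every block must fit a k-flip-flop counter."""
    max_block = 2 ** k
    # Ceiling division: number of pieces each unconstrained block is cut into.
    pieces = sum(-(-size // max_block) for size in block_sizes)
    return pieces * k


fig1_blocks = [3, 5, 8]  # slices 1-3, 4-8 and 9-16 of the 16-slice test cube
for k in (2, 3, 4):
    print(k, control_bits(fig1_blocks, k))
# 2 10
# 3 9
# 4 12
print(sum(fig1_blocks) - control_bits(fig1_blocks, 3))  # 7 bits saved vs. one bit per slice
```

The count alone favours k=3 here; the text additionally notes that k=2 increases the number of specified bits because the split blocks need extra first slices.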

### **IV. EXPERIMENTAL RESULTS**

To verify the efficiency of the proposed test scheme, experiments were performed on several large ISCAS'89 and ITC'99 benchmark circuits. The targeted fault model is the stuck-at fault.

The profiles of the circuits and test sets are summarized in Table 1. Columns 2-6 list the total number of gates, the number of inputs, the number of scan cells, the number of test patterns, and the ratio of specified bits in the test cubes, respectively.

Table 2 shows the simulation results on test compression and compares the test scheme introduced in [7] with the proposed compression-friendly low power test scheme. Note that the results listed in column 3 were obtained by applying the test scheme introduced in [7] to our test set. Column 2 presents the scan chain configuration. Column 3 gives the number of specified bits that must be encoded by the decompressor in the test scheme of [7]. Columns 4, 6 and 8 list the number of specified bits that must be encoded by the decompressor in the proposed compression-friendly low power test scheme with a 2-bit, 3-bit and 4-bit counter, respectively. Columns 5, 7 and 9 give the corresponding reduction in specified bits relative to [7].
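The reduction columns appear to be the relative decrease in specified bits with respect to the scheme in [7]. The check below, written as a sketch with the hypothetical helper `reduction_percent`, reproduces the S13207 entries of Table 2 under that assumption.

```python
def reduction_percent(baseline: int, proposed: int) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# S13207 row of Table 2: baseline [7], then the proposed scheme with k = 2, 3, 4.
baseline = 15465
for k, bits in ((2, 12770), (3, 11436), (4, 11055)):
    print(k, round(reduction_percent(baseline, bits), 2))
# 2 17.43
# 3 26.05
# 4 28.52
```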

As shown in Table 2, compared with the results of the compression-aware low power test scheme in [7], the specified bits that must be encoded by the decompressor for the proposed scheme with a 2-bit, 3-bit and 4-bit counter are cut down by 15.79%, 20.18% and 15.68% on average, respectively. The proposed test approach thus significantly reduces the encoding requirement of test patterns, and can in turn significantly improve encoding efficiency for test compression schemes based on a linear decompressor. At the same time, the proposed approach follows the scan slice reusing technique and thus keeps the advantage of low test power.

Table 2. Simulation results on test data

| Circuit | Scan<br>architecture | Scan slice<br>overlapping [7] | Proposed scheme<br>with <i>k</i>=2 |       | Proposed scheme<br>with <i>k</i>=3 |       | Proposed scheme<br>with <i>k</i>=4 |       |
|---------|----------------------|-------------------------------|------------------------------------|-------|------------------------------------|-------|------------------------------------|-------|
|         |                      | Specified bits                | Specified bits                     | Red.% | Specified bits                     | Red.% | Specified bits                     | Red.% |
| S13207  | 30×22                | 15465                         | 12770                              | 17.43 | 11436                              | 26.05 | 11055                              | 28.52 |
| S35932  | 30×58                | 7644                          | 6285                               | 17.78 | 6363                               | 16.76 | 6851                               | 10.37 |
| S38417  | 30×55                | 108506                        | 90798                              | 16.32 | 91254                              | 15.90 | 97829                              | 9.84  |
| S38584  | 30×48                | 88837                         | 73255                              | 17.54 | 67525                              | 23.99 | 69454                              | 21.82 |
| B17     | 30×48                | 172751                        | 149943                             | 13.20 | 139427                             | 19.29 | 152176                             | 11.91 |
| B22     | 30×25                | 90531                         | 79233                              | 12.48 | 73240                              | 19.10 | 80011                              | 11.62 |
| Average |                      |                               |                                    | 15.79 |                                    | 20.18 |                                    | 15.68 |

Table 3. Simulation results on test power

| Circuit | Original power |       | Proposed scheme<br>with <i>k</i>=2 |               | Proposed scheme<br>with <i>k</i>=3 |               | Proposed scheme<br>with <i>k</i>=4 |               | Scan slice<br>overlapping [7] | Low-power<br>compression [24] |               |
|---------|----------------|-------|------------------------------------|---------------|------------------------------------|---------------|------------------------------------|---------------|-------------------------------|-------------------------------|---------------|
|         | Average        | Peak  | Ave.<br>Red.%                      | Peak<br>Red.% | Ave.<br>Red.%                      | Peak<br>Red.% | Ave.<br>Red.%                      | Peak<br>Red.% | Ave.<br>Red.%                 | Ave.<br>Red.%                 | Peak<br>Red.% |
| S13207  | 4126.22        | 4648  | 75.04                              | 53.38         | 87.14                              | 63.25         | 89.64                              | 70.51         | 90.42                         | 50.62                         | 34.84         |
| S35932  | 25473.21       | 27406 | 53.27                              | 44.91         | 71.65                              | 40.43         | 73.95                              | 44.34         | 53.23                         | N/A                           | N/A           |
| S38417  | 23016.44       | 25038 | 68.05                              | 50.34         | 75.49                              | 52.17         | 78.52                              | 54.39         | 70.13                         | 70.86                         | 44.68         |
| S38584  | 17650.10       | 19199 | 74.51                              | 67.16         | 84.73                              | 73.89         | 88.47                              | 72.49         | 79.92                         | 48.08                         | 33.68         |
| B17     | 17482.83       | 19211 | 73.22                              | 63.52         | 85.98                              | 71.19         | 87.83                              | 70.93         | N/A                           | 85.93                         | 69.79         |
| B22     | 4761.69        | 5476  | 66.95                              | 48.19         | 79.62                              | 55.50         | 83.55                              | 62.10         | N/A                           | 84.83                         | 62.86         |
| Average |                |       | 68.50                              | 54.58         | 80.77                              | 59.41         | 83.66                              | 62.46         | 73.43                         | 68.06                         | 49.17         |

Table 3 presents the experimental results for test power reduction. The second and third columns list the original shift power of the random X-fill scheme, including average and peak shift power. The weighted transition metric (WTM) is used to evaluate shift power in the experiments. Columns 4-9 give the average and peak shift power reduction of the proposed compression-friendly low power test scheme with a 2-bit, 3-bit and 4-bit counter, respectively, compared to the random X-fill scheme. As can be seen, the average power reduction increases with the size of the counter. To illustrate the power reduction capability of the proposed scheme, the last columns provide a comparison with two shift power reduction schemes [7, 24]. Note that the scheme in [7] reports only the average power. The proposed technique achieves power reduction comparable to that of the scheme in [7]. When compared with the scheme in [24], the proposed technique with k=3 and k=4 yields greater power reduction for all benchmark circuits except B22.
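The text does not spell out how WTM is computed. A commonly used formulation weights each transition between adjacent bits of a scan-in vector by the number of shift cycles in which that transition travels through the chain, i.e. the sum over j of (l - j) times (t_j XOR t_{j+1}) for a chain of length l. The sketch below follows that formulation as an assumption; the weighting direction depends on which end of the string is taken as the first bit shifted in, and the name `wtm` is hypothetical.

```python
def wtm(scan_vector: str) -> int:
    """Weighted transition metric of one fully specified scan-in vector.

    scan_vector[0] is assumed to be the first bit shifted into the chain.
    """
    length = len(scan_vector)
    total = 0
    for j in range(1, length):
        # A transition between bit j and bit j+1 (1-indexed) has weight length - j.
        if scan_vector[j - 1] != scan_vector[j]:
            total += length - j
    return total


print(wtm("0101"))  # 6  (weights 3 + 2 + 1)
print(wtm("1000"))  # 3  (a single early transition)
print(wtm("1111"))  # 0  (a reused value causes no transitions)
```

Under this metric, loading the same value into a chain over consecutive cycles, as slice reuse does, contributes no weighted transitions, which is consistent with the reductions reported relative to random X-fill.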

The additional hardware overhead consists of a counter with k T flip-flops and one 2-to-1 MUX per scan chain. In addition, the method requires one tester channel for the first multiplexer in the counter. The total area overhead is negligible relative to the circuit size, and the added hardware does not introduce any negative impact on the circuit design.

### **V. CONCLUSIONS**

The aggressive shrinking of feature size and the rapid increase in integration density in the electronics industry will place higher demands on VLSI test procedures. By using a counter to generate control signals, the proposed technique provides greater flexibility in test compression based on a linear decompressor. Meanwhile, it follows the scan slice reusing technique and thus keeps the advantage of low test power. It has no detrimental effect in terms of additional area or performance overhead. Experimental results on several larger benchmark circuits demonstrate the efficiency of the proposed technique. In addition, the proposed approach can be combined with other approaches, such as scan cell reordering, to reach higher efficiency.

### ACKNOWLEDGMENTS

This research was supported by the National Natural Science Foundation of China (NSFC) (Grant No. 61303042 and 61202439) and by the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 14C0028).

### REFERENCES

- [1] Milewski S, Mrugalski G, Rajski J, et al. "Low Power Test Compression with Programmable Broadcast-Based Control," *Proc. of IEEE Asian Test Symposium*, 2014:174-179.
- [2] S. Almukhaizim and O. Sinanoglu, "Peak power reduction through dynamic partitioning of scan chains," *Proc. of IEEE ITC*, paper 9.2, 2008.
- [3] S. M. Saeed and O. Sinanoglu. "Design for Testability Support for Launch and Capture Power Reduction in Launch-Off-Shift and Launch-Off-Capture Testing," *IEEE Transactions on Very Large Scale Integration Systems*, 2014, 22(3): 516-521.
- [4] Q. Xu, D. Hu, and D. Xiang, "Pattern-directed circuit virtual partitioning for test power reduction," *Proc. of IEEE ITC*, paper 25.2, 2007.
- [5] V. P. Dabholkar, S. Chakravarty, I. Pomeranz, and S.M. Reddy, "Techniques for minimizing power dissipation in scan and combinational circuits during test application," IEEE Trans. CAD, vol. 17, pp. 1325-1333, Dec. 1998.
- [6] X. Lin, J. Rajski, "Test Power Reduction by Blocking Scan Cell Outputs," *IEEE Proc. ATS*, pp.329-336, 2008.
- [7] Li J, Han Y, Li X. "Deterministic And Low Power BIST Based On Scan Slice Overlapping," *IEEE International Symposium on Circuits and Systems*, Vol. 6, pp. 5670-5673, 2005.
- [8] M. Elm, H. J. Wunderlich, M. E. Imhof, et al. "Scan chain clustering for test power reduction," *Proc of the Design Automation Conference*, pp. 828-833, 2008.
- [9] Abdallatif S. Abu-Issa, Steven F. Quigley. "Bit-Swapping LFSR and Scan-Chain Ordering: A Novel Technique for Peak- and Average-Power Reduction in Scan-Based BIST," *IEEE Trans on Computer-Aided Design of Integrated Circuits and Systems*, 2009, 28 (5):755–759.

- [10] U. Ingelsson, S. Goel, E. Larsson, E. Marinissen. "Abort-on-Fail Test Scheduling for Modular SOCs without and with Preemption," *IEEE Transactions on Computers*, in press, 2015, DOI: 10.1109/TC.2015.2409840.
- [11] Zhou B, Xiao L.Y., Ye Y. Z., et al. "Optimization of test power and data volume in BIST scheme based on scan slice overlapping," *Journal of Electronic Testing: Theory and Applications*, vol. 27, no. 1, pp. 43-56, 2011.
- [12] A. Dutta, S. Kundu and S. Chattopadhyay, "Thermal Aware Don't Care Filling to Reduce Peak Temperature and Thermal Variance during Testing", Proc. of Asian Test Symposium, pp. 25-30, 2013.
- [13] Satya A. Trinadh, Sobhan Bahu Ch., Shiv Govind Singh, et al. "DP-fill: A Dynamic Programming approach to X-filling for minimizing peak test power in scan tests," *Proc. of Design, Automation & Test in Europe Conference & Exhibition*, pp. 836-841, 2015.
- [14] U. S. Mehta, K. S. Dasgupta, N. M. Devashrayee, "Modified Selective Huffman Coding for Optimization of Test Data Compression, Test Application Time and Area Overhead," *Journal of Electronic Testing*, vol.26, no. 6, pp. 679-688, 2010.
- [15] Wang S, Wei W. "Cost Efficient Methods to Improve Performance of Broadcast Scan," *IEEE Proc. ATS*, 2008: 163-169.
- [16] D. Xiang, D. Hu, Q. Xu, and A. Orailoglu, "Low-power scan testing for test data compression using a routing-driven scan architecture," *IEEE Trans. on Computer-Aided Design*, vol. 28, no. 7, pp. 1101-1105, July 2009.
- [17] D. Xiang, J. Li, K. Chakrabarty, and X. Lin, "Test Compaction for Small Delay Defects Using an Effective Path Selection Scheme," ACM Trans. on Design Automation of Electronic Systems, vol. 18, no. 3, July 2013.
- [18] J. Rajski, J. Tyszer, M. Kassab, N. Mukherjee, "Embedded deterministic test," *IEEE Trans. Comput. Aided Des. Integrated Circuits Syst.*, vol. 23, no. 5, pp. 776–792, 2004.

- [19] M.-F. Wu, J.-L. Huang, X. Wen, and K. Miyase, "Reducing power supply noise in linear decompressor based test data compression environment for at-speed scan testing," *Proc. ITC*, paper 13.1, 2008.
- [20] X. Liu and Q. Xu, "On simultaneous shift- and capture-power reduction in linear decompressor-based test compression environment," *Proc. ITC*, paper 9.3, 2009.
- [21] Czysz D, Mrugalski G, Mukherjee N, et al. "Deterministic Clustering of Incompatible Test Cubes for Higher Power-Aware EDT Compression," *IEEE Trans on Computer-Aided Design of Integrated Circuits and Systems*, 2011, 30(8): 1225-1238.
- [22] Czysz D, Rajski J, Tyszer J. "Low power test application with selective compaction in VLSI designs," *Proc. ITC*, Paper PTF.2, 2012.
- [23] A. Kumar, M. Kassab, E. Moghaddam, et al. "Isometric Test Data Compression," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 34, no. 11, pp. 1847-1859, 2015.
- [24] W. Z. Wang, J. S. Kuang, Z. Q. You. "Achieving low capture and shift power in linear decompressor-based test compression environment", *Microelectronics Journal*, Vol. 43, No. 1, pp: 143-140, 2012.



**Weizheng Wang** received the B.S. degree in applied mathematics from Hunan University in 2005 and the Ph.D. degree in computer application technology from Hunan University in 2011. He is currently a lecturer in the Department of Computer & Communication Engineering, Changsha University of Science and Technology. His research interests include built-in self-test, design for testability, low-power testing, and test generation.

**JinCheng Wang** is a graduate student in the Department of Computer & Communication Engineering, Changsha University of Science and Technology. His research interests include design for testability and image processing.

**Cai Shuo** is a lecturer and researcher at Changsha University of Science and Technology, China.

**Su Wei** is a technician and researcher at the Modern Education Technology Center, Nanjing Audit University, China.

**Lingyun Xiang** is a lecturer and researcher at Changsha University of Science and Technology, China.