# Performance Evaluation Involving Multiple Parameters in Built-In-Test Systems

Hee-Jung Kang\* Wang-Jin Yoo\*\*

<sup>\*</sup> Dept. of Industrial Engineering, Kon-Kuk University

<sup>\*\*</sup> Dept. of Industrial Engineering, Han Nam University

#### Abstract

The Built-In-Test (BIT) system is an integrated subsystem that determines the health status of a primary system. A BIT consists of hardware and software that perform fault detection, diagnosis, and isolation, and that record failure information for the primary system. Clear definitions of BIT system characteristics and parameters are important to an understanding of system functions. The object of this paper is to present general definitions of the BIT diagnosis parameters and a semi-quantitative evaluation method for BIT systems. Finally, two case studies illustrating the method are included.

# 1. Introduction

The first generations of electrical and mechanical products had no inherent capability for self-analysis. They were simple enough that, even without any means of self-testing, it was usually easy to tell whether they were functioning correctly. These technologies required only visual inspection, and analytic probes were limited to system troubleshooting in the hands of trained maintenance personnel.

However, as military and commercial systems have grown in complexity, the symptoms of system failures have become less noticeable to operators. In addition, as the procedures for inspection, assessment, repair, and replacement of components became more complicated in proportion to system complexity, the skills required of maintenance personnel and the demands on testing equipment also increased. Thus, system maintenance was no longer a simple task, and the availability of trained personnel became increasingly mission-critical as maintenance grew in expense.
The obvious solution to this problem was to design in-system circuits or devices that self-test the primary system, and thus the Built-In-Test (BIT) system was introduced. This approach is now an integral part of the design of modern systems.

The term "Built-In-Test" refers to subsystems that are used to test the health status of their primary systems. In brief, BIT consists of the hardware and software integrated within a system to perform fault detection, diagnosis, and isolation, as well as to record failure information, provide failure management intelligence, and suggest possible reconfigurations.

In this sense, "hardware" includes detection circuits composed of electrical (or mechanical) sensors and processing units. These may use the system CPU, ROM, RAM, input/output units, or other needed system units, or they may be separate, parallel units. In addition, A/D converters, comparators, or display subsystems may be added, depending on the requirements of the particular application.

"Software" designates the micro-programs for the detection, diagnosis, and isolation of system faults. These programs are installed in microprocessor RAM and can also be used to control system operations. In general, BIT software development depends on Failure Mode and Effects Analysis (FMEA) and Fault Tree Analysis (FTA), and is of course based on the objectives specified by the users. The planning basis for BIT systems is a thorough analysis of all potential failure modes of a given system.

# 2. Assessment of BIT Diagnosis Parameters

The definition and specification of BIT diagnosis parameters are important top-level tasks in the conceptual definition of a BIT system design. Improper specification and assessment of these parameters can have a significant negative influence on the hardware/software design process, on BIT performance, and on the time and cost of BIT development.
The first step toward effective or improved BIT performance is a consistent set of definitions, specifications, and assessments of the BIT diagnosis parameters.

## 2.1 General Definitions of BIT Diagnosis Parameters

A number of investigations of BIT diagnosis parameters have been undertaken, nearly every one of which uses its own terminology for, and definitions of, BIT diagnostic performance. This is not an unusual situation for a new, application-driven field, and may be attributed to the number of different industrial and military organizations interested in BIT systems. Accepted standards for notation and terminology remain to be developed. For engineering applications, a standard set of BIT diagnosis parameter definitions and notation has been proposed and is shown in Table 1 [13].

Table 1. BIT Diagnosis Parameter Notations and Definitions

| NOTATION | TERM | DEFINITION |
|---|---|---|
| $F_d$ | Fault Detection Probability | The probability that BIT will detect an existing functional failure in the system. |
| $F_i$ | Fault Isolation Probability | The probability that BIT will isolate a failure that has been detected by BIT down to the specified level (usually a single LRU). |
| $F_a$ | False Alarm Probability | The probability that BIT will indicate a failure when there is no actual system malfunction. |
| $\lambda_{FA}$ | False Alarm Rate | The rate at which the BIT issues false alarms in a certain time interval, for the BIT surviving at the start of the interval. |
| $\lambda_B$ | BIT Essential Failure Rate | The rate at which BIT physical failures occur in a certain time interval, for the BIT surviving at the start of the interval. |
| $E_b$ | BIT Effectiveness | $E_b = \dfrac{\text{Total Effective Maintenance Activities (Based on BIT)}}{\text{Total Maintenance Activities (Based on BIT)}}$ |
| $T_d$ | Time for BIT Detection and Isolation | Same as the term. |

Two terms are used to describe BIT false alarm performance: the false alarm probability $F_a$ and the false alarm rate $\lambda_{FA}$. One point of confusion in many papers is that the false alarm probability $F_a$ (as defined in Table 1) is also called the "false alarm rate." Shao and Lamberson [13] clarified the difference between these two parameters.

A false alarm occurs when the BIT indicates a fault but no actual malfunction exists in the primary system. The false alarm probability is useful for maintainability engineering and is convenient for BIT system analysis. It is an average ratio, expressed as follows [13]:

$$F_a = \frac{Pr(False\ Alarm\ and\ No\ System\ Failure)}{Pr(Detect\ True\ Failures) + Pr(False\ Alarm\ and\ No\ System\ Failure)} \tag{1}$$

The false alarm rate $\lambda_{FA}$ is the rate at which the BIT issues false alarms in a certain time interval, for the BIT surviving at the start of the interval. Shao and Lamberson found that $F_a$ and $\lambda_{FA}$ are related but not the same. They proved that if $0.9 \le F_d \le 1.0$, $\lambda_i \le 0.01/hr$, and $t \le 10\,hr$, then

$$F_a \approx \frac{\lambda_{FA}}{\lambda_i} \tag{2}$$

where $\lambda_i$ is the failure rate of the line replaceable unit (LRU). The conditions for Eq. (2) are easy to meet for a modern electronic system.

Subject to selected modifications and extensions, this paper adopts the conventions in Table 1. The result is increased precision of definition and improved prediction methods for the fault detection and fault isolation probabilities.
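As a minimal sketch of Eqs. (1) and (2), the Python functions below compute the false alarm probability both ways. The function names and all numerical inputs are hypothetical and chosen only for illustration. The validity check simply encodes the conditions quoted above for Eq. (2).

```python
def false_alarm_probability(p_false_alarm, p_true_detection):
    """Eq. (1): fraction of all BIT fault indications that are false alarms."""
    total_indications = p_true_detection + p_false_alarm
    if total_indications <= 0:
        raise ValueError("no BIT indications at all, so F_a is undefined")
    return p_false_alarm / total_indications


def approximate_false_alarm_probability(lambda_fa, lambda_i, f_d, t_hours):
    """Eq. (2): F_a ~ lambda_FA / lambda_i, used only under the stated conditions."""
    if not (0.9 <= f_d <= 1.0 and 0 < lambda_i <= 0.01 and t_hours <= 10):
        raise ValueError("outside the conditions under which Eq. (2) is stated")
    return lambda_fa / lambda_i


# Hypothetical inputs, for illustration only (rates are per hour).
exact = false_alarm_probability(p_false_alarm=0.002, p_true_detection=0.018)
approx = approximate_false_alarm_probability(
    lambda_fa=0.0005, lambda_i=0.005, f_d=0.95, t_hours=8
)
print(f"Eq. (1): F_a = {exact:.3f}")
print(f"Eq. (2): F_a ~ {approx:.3f}")
```

Raising an error outside the stated range, rather than returning a value, keeps the approximation from being applied silently where it has not been shown to hold.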
## 2.2 Assessment of BIT Diagnosis Parameters

A number of benefits can be obtained from the use of a BIT. However, a BIT is another subsystem that increases the complexity of the total system, and BIT diagnosis mistakes can cause serious problems. In references [1,2,4-7,9,11-14], two kinds of assessment of BIT diagnosis parameters are presented: qualitative analysis and quantitative analysis.

Lamberson and Shao [8] listed in detail the benefits and disadvantages of a BIT for a system's testability, maintainability, reliability, and safety, and for system performance and cost effectiveness. Reference [14] described the qualitative effects of self-test on a system. However, these results are purely qualitative. This paper provides a more rigorous, semi-quantitative method for BIT diagnosis parameter assessment.

For purely quantitative assessment, consider the case where a BIT simply detects, isolates, and indicates faults but does not obstruct or influence system operation (i.e., the BIT only improves system maintainability). The BIT can then be treated as an additional subsystem in a series reliability analysis model. That is, the BIT (or several BITs) and the primary system form a series reliability model. In this case the mission reliability and basic reliability models are virtually identical [10].

## 2.3 The Need for a Bayesian Processor: False Alarm Filter

The major shortcomings of BIT are false alarms and lack of fault coverage, i.e., diagnostic problems. These shortcomings must be recognized as a very complex problem which involves many aspects. Current techniques make it possible to detect a faulty system with high reliability, say greater than 99%, but they fail to avoid false alarms. Desensitizing basic BIT circuits in order to reduce false alarms is generally a mistake, because it increases the probability of missing faults and masks intermittent faults.
Indeed, a breakthrough is needed to solve BIT diagnostic problems, especially false alarms. Before replacing an LRU that the BIT has indicated as failed, maintenance personnel will usually test it again. The question then is how to integrate the multiple test results. Further, how can prior failure data, which may also be available from field statistics, be incorporated? The Bayesian processor allows us to use every piece of information we can get and to obtain the probability of failure after the $n^{th}$ test. The BIT false alarm probability could, therefore, be greatly reduced.

## 2.4 Uncertainty Handling in BIT Diagnosis

An expert subsystem can be employed to deal with ambiguous test results, strange symptoms, undefined phenomena, and other uncertainties. When a knowledge-based or expert subsystem is incorporated into the integrated BIT diagnosis configuration, the BIT system becomes capable of dealing with uncertainties and of being updated with new expertise and knowledge. Buswell and Sesto [3] applied an expert system to assist in the design and updating of the isolation software.

# 3. BIT Performance Evaluation

A rating system is a technique for describing the characteristics of an object with discrete levels. For failure mode and effects analysis (FMEA) applications, a rating system is used for the evaluation of probabilistic risk assessment (PRA). In this investigation, a rating system has been used together with semi-quantitative methods for the evaluation of BIT performance.

In the literature [8,14], qualitative analysis is presented to assess BIT performance and the effect of BIT diagnosis on system reliability, maintainability, and availability. However, qualitative analysis cannot provide comparable results. Applying a rating rule transforms the analysis into a semi-quantitative form of assessment.
The BIT system provides numerous diagnosis parameters as well as other system characteristics. Table 2 gives the rating values $\alpha_1$ through $\alpha_5$ for the individual BIT diagnosis parameters. $\alpha$ is the coefficient of importance for the line replaceable unit (LRU) to be tested, and $\beta$ is the coefficient that represents the ability for false alarm coverage and uncertainty handling. $\alpha$ is given in Table 3, $\beta$ in Eq. (3).

Table 2. Rating Values for BIT Diagnosis Parameters

| RATING | PARAMETER | -20 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| $\alpha_1$ | $\lambda_B$ | $\lambda_B/\lambda_i > 0.5$ | $\lambda_B/\lambda_i < 0.5$ | $\lambda_B/\lambda_i < 0.1$ | $\lambda_B/\lambda_i < 0.05$ | $\lambda_B/\lambda_i < 0.01$ |
| $\alpha_2$ | $F_d$ | $< .95\alpha$ | $(.95 \sim .98)\alpha$ | $(.98 \sim .99)\alpha$ | $(.99 \sim .995)\alpha$ | $(.995 \sim .999)\alpha$ |
| $\alpha_3$ | $F_i$ | $< .95\alpha$ | $(.95 \sim .97)\alpha$ | $(.97 \sim .98)\alpha$ | $(.98 \sim .99)\alpha$ | $(.99 \sim .995)\alpha$ |
| $\alpha_4$ | $F_a$ | $> .5\beta$ | $(.3 \sim .5)\beta$ | $(.15 \sim .3)\beta$ | $(.05 \sim .15)\beta$ | $< .05\beta$ |
| $\alpha_5$ | $T_d$ | $> .5T_m$ | $(.2 \sim .5)T_m$ | $(.1 \sim .2)T_m$ | $(.05 \sim .1)T_m$ | $< .05T_m$ |

- \* $\lambda_i$ is the failure rate of the LRU to be tested.
- \* $\alpha$ is the importance factor of the LRU to be tested.
- \* $\beta$ is the ability factor of false alarm coverage.
- \* $T_m$ is the manual maintenance time.
- \* $T_d$ is the detection and isolation time.
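As a hedged illustration of how the scaled bands in Table 2 are read, assume an LRU with $\alpha = 0.9$ and a BIT with $\beta = 0.5$ (illustrative values, not prescribed by the rating rule). Multiplying each band limit by the coefficient gives the absolute limits for $F_d$ and $F_a$. Table 2 does not say how a value that falls exactly on a limit is rated, so the closed lower ends below are an assumption.

$$
\begin{aligned}
\alpha_2 = 1 &: \; 0.95(0.9) \le F_d < 0.98(0.9) \;\Longrightarrow\; 0.855 \le F_d < 0.882 \\
\alpha_2 = 3 &: \; 0.99(0.9) \le F_d < 0.995(0.9) \;\Longrightarrow\; 0.891 \le F_d < 0.8955 \\
\alpha_4 = 4 &: \; F_a < 0.05(0.5) \;\Longrightarrow\; F_a < 0.025 \\
\alpha_4 = -20 &: \; F_a > 0.5(0.5) \;\Longrightarrow\; F_a > 0.25
\end{aligned}
$$

Under these assumed values, a larger $\alpha$ raises the detection probability required for a given rating, and a smaller $\beta$ tightens the false alarm limits.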
To determine the coefficient $\beta$, we need to consider the ability of the BIT system to recover from false alarms and to handle uncertainties. Denote the ability of the Bayesian processor by $\alpha_{10}$ and the ability to handle uncertainties by $\alpha_{11}$. Then

$$\beta = \frac{2}{\alpha_{10} + \alpha_{11}} \tag{3}$$

$\alpha_{10}$ and $\alpha_{11}$ are given in Table 4.

## 3.1 Performance Values of the Individual BIT Diagnosis Parameters

Each rating value takes one of five levels: -20, 1, 2, 3, and 4. The highest value, 4, represents the highest performance. The lowest value, -20, implies that the case is unacceptable for an individual BIT. The total performance score for an individual BIT is the sum of the performance values of the individual BIT diagnosis parameters. That is,

$$A_I = \sum_{i=1}^{5} \alpha_i \tag{4}$$

The highest score for $A_I$ is 20. If any one rating takes the lowest value, -20, then $A_I$ is less than zero, because the other four ratings sum to at most 16. In that case $A_I$ is not feasible, i.e., the BIT is of no use to the system.

Table 3. Values of Importance Coefficient $\alpha$

| VALUE | DESCRIPTION |
|---|---|
| $\alpha = 0$ | Failure of the LRU does not affect the system's operation |
| $\alpha = .5$ | Failure of the LRU has a minor effect on the system's operation |
| $\alpha = .7$ | Failure of the LRU has a significant effect on the system's operation |
| $\alpha = .9$ | Failure of the LRU makes the system less functional |
| $\alpha = 1$ | Failure of the LRU causes catastrophic problems for the system |

## 3.2 Total Performance Values for a BIT Subsystem

Because the performance of an individual BIT parameter is relative to the BIT subsystem, the performance and characteristics of the BIT subsystem as a whole must also be investigated. Table 4 provides the rating values $\alpha_6$ through $\alpha_{12}$ for the BIT subsystem. Each rating value takes one of three levels: 1, 2, and 3, or Low, Medium, and High.

Table 4. Rating Values for BIT Subsystems

| NOTATION | CHARACTERISTIC | 1 | 2 | 3 |
|---|---|---|---|---|
| $\alpha_6$ | Redundancy Management and Reconfiguration Ability | None | Some Ability | Good |
| $\alpha_7$ | System Health Status Monitoring | None | Some Ability | Good |
| $\alpha_8$ | Data Transmission Equipment | None | Simple | Completed |
| $\alpha_9$ | BIT Subsystem Configuration Style | Central | Distributed | Distributed Central |
| $\alpha_{10}$ | Bayesian Processor | None | Moderate Performance | Good Performance |
| $\alpha_{11}$ | Expert Subsystem | None | Moderate Performance | Good Performance |
| $\alpha_{12}$ | Self Verification, Indication Analysis, No Response to False Indication | No Ability | Some of These Abilities | High Ability |

The total performance score for a BIT subsystem is

$$A_s = \sum_{i=6}^{12} \alpha_i \tag{5}$$

The highest score for $A_s$ is 21, while the lowest score for $A_s$ is 7.

# 4. Case Study

## 4.1 CASE I: Total Performance Scores for an Individual BIT

In this case, three individual BITs with given values of the BIT diagnosis parameters are incorporated into the system. The parameters are as defined in Tables 1 and 2, and the detailed values for the three individual BITs are given in Table 5. Table 2 yields the rating value of each diagnosis parameter for each of the three BITs. Equation (4) then gives the total performance score of each individual BIT.
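The Case I computation can be sketched in Python as follows, using the parameter values of Table 5, the bands of Table 2, and Eq. (4) read as the sum of the five ratings. The function and variable names are hypothetical. Two points that Table 2 leaves open are handled by assumption: a value exactly on a band limit falls into the band that starts at that limit, and a value above the upper limit of the rating-4 band for $F_d$ or $F_i$ is still rated 4.

```python
UNACCEPTABLE = -20


def rate_higher_is_better(value, lower_limits):
    """Rating for F_d or F_i; lower_limits are the lower ends of bands 1..4."""
    rating = UNACCEPTABLE
    for limit, band in zip(lower_limits, (1, 2, 3, 4)):
        if value >= limit:  # assumption: lower band limits are inclusive
            rating = band
    return rating


def rate_lower_is_better(value, upper_limits):
    """Rating for lambda_B/lambda_i, F_a or T_d; upper_limits close bands 4..1."""
    for limit, band in zip(upper_limits, (4, 3, 2, 1)):
        if value < limit:
            return band
    return UNACCEPTABLE


def individual_bit_score(p):
    """Return ([alpha_1..alpha_5], A_I) for one BIT, following Table 2 and Eq. (4)."""
    a, b, t_m = p["alpha"], p["beta"], p["T_m"]
    ratings = [
        rate_lower_is_better(p["lambda_B"] / p["lambda_i"], (0.01, 0.05, 0.1, 0.5)),
        rate_higher_is_better(p["F_d"], [k * a for k in (0.95, 0.98, 0.99, 0.995)]),
        rate_higher_is_better(p["F_i"], [k * a for k in (0.95, 0.97, 0.98, 0.99)]),
        rate_lower_is_better(p["F_a"], [k * b for k in (0.05, 0.15, 0.3, 0.5)]),
        rate_lower_is_better(p["T_d"], [k * t_m for k in (0.05, 0.1, 0.2, 0.5)]),
    ]
    return ratings, sum(ratings)


# Values transcribed from Table 5 (times in seconds).
COMMON = {"lambda_i": 0.15, "alpha": 0.9, "beta": 0.5, "T_m": 120}
TABLE_5 = {
    "BIT-1": {**COMMON, "lambda_B": 0.00015, "F_d": 0.90, "F_i": 0.89, "F_a": 0.01, "T_d": 10},
    "BIT-2": {**COMMON, "lambda_B": 0.01350, "F_d": 0.89, "F_i": 0.88, "F_a": 0.05, "T_d": 18},
    "BIT-3": {**COMMON, "lambda_B": 0.09000, "F_d": 0.87, "F_i": 0.86, "F_a": 0.20, "T_d": 5},
}

for name, params in TABLE_5.items():
    ratings, total = individual_bit_score(params)
    print(name, ratings, total)
```

Under these assumptions the sketch gives totals of 18, 11, and -13 for BIT-1, BIT-2, and BIT-3, the same scores reported in Table 6.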
The results for $A_I$ are shown in Table 6.

Table 5. Diagnosis Parameter Values of the Three Individual BITs

| | BIT-1 | BIT-2 | BIT-3 |
|---|---|---|---|
| $\lambda_B$ | 0.00015 | 0.01350 | 0.09000 |
| $\lambda_i$ | 0.15 | 0.15 | 0.15 |
| $\lambda_B/\lambda_i$ | 0.001 | 0.090 | 0.600 |
| $\alpha$ | 0.9 | 0.9 | 0.9 |
| $F_d$ | 0.90 | 0.89 | 0.87 |
| $F_i$ | 0.89 | 0.88 | 0.86 |
| $\beta$ | 0.5 | 0.5 | 0.5 |
| $F_a$ | 0.01 | 0.05 | 0.20 |
| $T_m$ | 120 sec | 120 sec | 120 sec |
| $T_d$ | 10 sec | 18 sec | 5 sec |

Table 6. The Results of the Total Performance Scores for the Three Individual BITs

| BITs | BIT-1 | BIT-2 | BIT-3 |
|---|---|---|---|
| $\alpha_1$ | 4 | 2 | -20 |
| $\alpha_2$ | 4 | 2 | 1 |
| $\alpha_3$ | 3 | 2 | 1 |
| $\alpha_4$ | 4 | 3 | 1 |
| $\alpha_5$ | 3 | 2 | 4 |
| $A_I = \sum_{i=1}^5 \alpha_i$ | 18 | 11 | -13 |
| Conclusion | Good Performance | Moderate Performance | Poor Performance |

Thus, BIT-1 is preferred to both BIT-2 and BIT-3, whereas BIT-3 is of no use to the system.

## 4.2 CASE II: Total Performance Scores for a BIT Subsystem

In this case, two new BIT subsystems are candidates for incorporation into a primary system. To investigate the performance and characteristics of these two BIT subsystems, we simply apply the rating values for BIT subsystems (Table 4). The characteristics of the two BIT designs are given in Table 7.

Table 7. Rating Intensity for the Two New BIT Subsystems

| | BIT/1 | BIT/2 |
|---|---|---|
| Redundancy Management and Reconfiguration Ability | Good | Good |
| System Health Status Monitoring | Some Ability | None |
| Data Transmission Equipment | Completed | Simple |
| BIT Subsystem Configuration Style | Distributed | Central |
| Bayesian Processor | Good Performance | None |
| Expert Subsystem | Moderate Performance | Good Performance |
| Self Verification, Indication Analysis, No Response to False Indication | Some of These Abilities | High Ability |

From Equation (5), we can calculate the total performance scores for the two new BIT subsystems. The results for $A_s$ are shown in Table 8. Accordingly, the BIT/1 subsystem should be applied to the primary system to obtain the best performance.

Table 8. The Results of the Total Performance Scores for the Two New BIT Subsystems

| Rating | BIT/1 | BIT/2 |
|---|---|---|
| $\alpha_6$ | 3 | 3 |
| $\alpha_7$ | 2 | 1 |
| $\alpha_8$ | 3 | 2 |
| $\alpha_9$ | 2 | 1 |
| $\alpha_{10}$ | 3 | 1 |
| $\alpha_{11}$ | 2 | 3 |
| $\alpha_{12}$ | 2 | 3 |
| $A_s = \sum\limits_{i=6}^{12} \alpha_i$ | 17 | 14 |

# 5. Conclusions

Purely qualitative or purely quantitative assessment of BIT-related systems does not help much in comparing the performance of individual BITs or BIT subsystems. Qualitative assessment cannot provide comparable, numerical results. For quantitative assessment, the BIT can be treated as an additional subsystem in a series reliability analysis model. However, this model is not always accurate. A BIT is likely to take part in system control or decision making; therefore, the effect of BIT diagnosis on system mission reliability must also be considered. A rating system is therefore created to evaluate BIT performance both qualitatively and quantitatively, the so-called semi-quantitative assessment.
This is a technique for describing the characteristics of an object with discrete levels. Rating rules make it possible to compare the performance of different BIT designs numerically. Case studies are included to provide a clear example of the application of the rating rules. Because a BIT involves a large number and variety of diagnosis parameters (Table 1), the rating rules can readily be extended to new situations as they arise.

# REFERENCES

- [1] Albert, J. et al., "Built-In-Test Verification Techniques," *Annual R & M Symposium*, pp.252-257, 1986.
- [2] Bozic, S. and Shaw, L., "Is BIT a Toy, Blessing or Annoyance," *Annual R & M Symposium*, pp.270-275, 1985.
- [3] Buswell, S.M. and Sesto, P.A., "An Expert System to Facilitate Fault Isolation," *Annual R & M Symposium*, pp.74-81, 1989.
- [4] Carroll, W.H. et al., "Diagnostics Specification-A Proposed Approach," *Annual R & M Symposium*, pp.227-231, 1981.
- [5] Daugherty, G. and Steinmetz, G., "BIT Blueprint Toward More Effective Built-In-Test," *Annual R & M Symposium*, pp.353-360, 1990.
- [6] Gleason, D., "Analysis of Built-in Test Accuracy," *Annual R & M Symposium*, pp.370-372, 1982.
- [7] Irwing, M.H., "Built in Test-Past Mistakes, Present Problems and Future Solutions," *Reliability Engineering*, 15(4), pp.245-261, 1986.
- [8] Lamberson, L.R. and Shao, J., "Built-In-Test Technology in Commercial Systems," *Proceedings of International Industrial Engineering Conference*, Toronto, Canada, pp.357-362, 1989.
- [9] Lord, D.H. and Gleason, D., "Design & Evaluation Methodology for Built-In-Test," *IEEE Trans.* R-30, pp.222-226, 1981.
- [10] MIL-STD-756B, MILITARY STANDARD, Reliability Modeling and Prediction, pp.101-1, 1981.
- [11] Palazzo, C.J. and Gleason, D., "Avionics Built-In Test Effectiveness and Life Cycle Cost," AIAA-83-2448, pp.1-6, 1983.
- [12] Rosin, A., "An Approach to the Selection of Built-In-Test Devices," *Annual R & M Symposium*, pp.346-350, 1990.
- [13] Shao, J. and Lamberson, L.R., "Impact of BIT Design Parameters on Systems RAM," *Reliability Engineering and System Safety*, Vol.23, pp.219-246, 1988.
- [14] Shao, J. and Yoo, W., "Effect of Self-Test on the Reliability of an Automatic Control System," *Proceedings of China-Japan International Symposium on Instrumentation, Measurement and Automatic Control*, Beijing, 1989.