Probabilistic Soft Error Detection Based on Anomaly Speculation

  • Yoo, Joon-Hyuk (College of Information and Communication Engineering, Daegu University)
  • Received : 2010.09.06
  • Accepted : 2011.02.13
  • Published : 2011.09.30

Abstract

Microprocessors are becoming increasingly vulnerable to soft errors as semiconductor technology continues to scale. Traditional redundant multi-threading architectures provide perfect fault tolerance by re-executing every computation. However, full re-execution significantly increases the verification workload on processor resources and causes severe performance degradation. This paper presents a proactive verification management approach that reduces the verification workload, improving performance with minimal effect on overall reliability. An anomaly-speculation-based filter checker is proposed to assign verification priority before the re-execution process starts. The technique exploits a value similarity property, defined as the frequent occurrence of partially identical values. Based on the biased distribution of the similarity distance measure, this paper further investigates how similar values can be exploited for soft error tolerance with anomaly speculation. Extensive measurements show that the majority of instructions produce values that differ from the previous result value in only a few bits. Experimental results show that the proposed scheme makes the processor 180% faster than a traditional fully fault-tolerant processor, with minimal impact on the overall soft error rate.
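The filtering idea can be illustrated with a minimal sketch. The abstract does not specify the similarity distance measure or the filter's decision rule, so everything below is an assumption made for illustration: the distance is taken to be the Hamming distance between consecutive result values, and the names `similarity_distance`, `needs_reexecution`, `filter_results`, the 64-bit width, and the threshold of 8 bits are all hypothetical.

```python
def similarity_distance(prev: int, curr: int, width: int = 64) -> int:
    """Number of differing bits between two results (assumed measure: Hamming distance)."""
    mask = (1 << width) - 1
    return bin((prev ^ curr) & mask).count("1")


def needs_reexecution(prev: int, curr: int, threshold: int = 8, width: int = 64) -> bool:
    """Speculate that a result is anomalous when it differs from the
    previous result in more than `threshold` bits (hypothetical rule)."""
    return similarity_distance(prev, curr, width) > threshold


def filter_results(results, threshold: int = 8, width: int = 64):
    """Return the indices of results that the filter would send to re-execution.

    The first result has no history, so it is flagged conservatively.
    """
    flagged = []
    prev = None
    for i, value in enumerate(results):
        if prev is None or needs_reexecution(prev, value, threshold, width):
            flagged.append(i)
        prev = value
    return flagged


if __name__ == "__main__":
    # Results of one instruction over time; the fourth value is wildly dissimilar.
    stream = [100, 101, 102, 0xFFFFFFFF00000000, 103]
    print(filter_results(stream))
```

Under these assumptions, only results that look dissimilar to their predecessor are re-executed, which is where the workload reduction would come from. The sketch also shows why such detection is probabilistic: an error that flips only a few bits stays under the threshold and is not flagged.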

Keywords

Probabilistic Soft Error Detection; Reliability; Anomaly Speculation

Acknowledgement

Supported by : Daegu University
