Fault Isolation for Linux Device Drivers

  • Received : 2017.02.23
  • Accepted : 2017.03.20
  • Published : 2017.04.28

Abstract

In this paper, we propose a fault isolation system for Linux device drivers. High-availability systems impose stringent requirements on the Linux operating system. Device drivers in particular are a major source of operating system instability and often contribute to system degradation and outages. The proposed system identifies memory-related faults that occur in a device driver and isolates the driver from the kernel. By operating at an early stage of the page fault handler in the Linux kernel, the system detects which module caused the fault and transparently isolates it from the rest of the kernel. Experiments show that the proposed system efficiently detects faults caused by a device driver and isolates both the driver and the process that accessed the driver module from the kernel.
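The detection step described in the abstract can be illustrated with a small user-space sketch. It is not the authors' implementation: the abstract does not say how the faulting module is identified, so the sketch assumes the simplest approach, which is to compare the faulting instruction address against the memory range of each loaded module. The names `module_region`, `find_faulting_module`, and `handle_fault` are hypothetical and do not correspond to kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of where a loaded driver module lives in memory. */
struct module_region {
    const char *name;     /* module name, for reporting */
    uintptr_t   base;     /* first address of the module's region */
    size_t      size;     /* length of the region in bytes */
    int         isolated; /* set to 1 once the module has been isolated */
};

/* Return the module whose region contains addr, or NULL if none does. */
struct module_region *find_faulting_module(struct module_region *table,
                                           size_t count, uintptr_t addr)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Offset comparison avoids overflow in base + size. */
        if (addr >= table[i].base && addr - table[i].base < table[i].size)
            return &table[i];
    }
    return NULL;
}

/*
 * Simulated early page-fault hook (an assumption, not kernel code).
 * Returns 1 if the fault was attributed to a module, which is then
 * marked as isolated; returns 0 if the fault should fall through to
 * the normal page fault handling path.
 */
int handle_fault(struct module_region *table, size_t count,
                 uintptr_t fault_ip)
{
    struct module_region *m = find_faulting_module(table, count, fault_ip);
    if (m == NULL)
        return 0;
    m->isolated = 1;
    return 1;
}
```

In this sketch, isolation is reduced to setting a flag. A real system would also have to stop further calls into the driver and deal with the process that triggered the fault, as the abstract notes.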
