Lessons from Developing an Annotated Corpus of Patient Histories

  • Rost, Thomas Brox (Department of Computer and Information Science, Norwegian University of Science and Technology)
  • Huseth, Ola (Department of Language and Communication Studies, Norwegian University of Science and Technology)
  • Nytro, Oystein (Department of Computer and Information Science, Norwegian University of Science and Technology)
  • Grimsmo, Anders (Department of Community Medicine and General Practice, Norwegian University of Science and Technology)
  • Published: 2008.06.30

Abstract

We have developed a tool for annotating electronic health record (EHR) data. We are currently manually annotating a corpus of Norwegian general practitioners' EHRs, mainly with linguistic information. The aim of the project is a linguistically annotated corpus of patient histories from general practice, intended for future use in medical language processing and information extraction applications. This paper outlines some of our practical experiences from developing such a corpus and, in particular, the effects of semi-automated annotation. We have also carried out preliminary part-of-speech tagging experiments based on our corpus. The results indicate that, for tagging text in the clinical domain, training the tagger on relevant data from that domain gives better results than training it on a corpus from a more general domain. We plan to extend the corpus annotations with medical information at a later stage.
