Lessons from Developing an Annotated Corpus of Patient Histories

Rost, Thomas Brox; Huseth, Ola; Nytro, Oystein; Grimsmo, Anders

doi:10.5626/JCSE.2008.2.2.162

Journal of Computing Science and Engineering

Volume 2, Issue 2 / Pages 162-179 / 2008 / 1976-4677 (pISSN) / 2093-8020 (eISSN)

Korean Institute of Information Scientists and Engineers

Rost, Thomas Brox (Department of Computer and Information Science, Norwegian University of Science and Technology)
Huseth, Ola (Department of Language and Communication Studies, Norwegian University of Science and Technology)
Nytro, Oystein (Department of Computer and Information Science, Norwegian University of Science and Technology)
Grimsmo, Anders (Department of Community Medicine and General Practice, Norwegian University of Science and Technology)

Published: 2008.06.30

https://doi.org/10.5626/JCSE.2008.2.2.162

Abstract

We have developed a tool for annotating electronic health record (EHR) data. We are currently manually annotating a corpus of Norwegian general practitioners' EHRs, mainly with linguistic information. The purpose of this project is to obtain a linguistically annotated corpus of patient histories from general practice, which will be used in future medical language processing and information extraction applications. The paper outlines some of our practical experiences from developing such a corpus and, in particular, the effects of semi-automated annotation. We have also carried out preliminary experiments with part-of-speech tagging based on our corpus. The results indicate that relevant training data from the clinical domain gives better tagging results in this domain than training the tagger on a corpus from a more general domain. We plan to expand the corpus annotations with medical information at a later stage.

