Extraction of ObjectProperty-UsageMethod Relation from Web Documents

  • Pechsiri, Chaveevan (College of Innovative Technology and Engineering, Dhurakijpundit University) ;
  • Phainoun, Sumran (College of Innovative Technology and Engineering, Dhurakijpundit University) ;
  • Piriyakul, Rapeepun (Dept. of Computer Science, Ramkhamhaeng University)
  • Received : 2017.03.08
  • Accepted : 2017.06.28
  • Published : 2017.10.31

Abstract

This paper aims to extract an ObjectProperty-UsageMethod relation from several web documents, in particular the HerbalMedicinalProperty-UsageMethod relation of the herb-plant object. The relation is a semantic relation between two related sets: a herbal-medicinal-property concept set and a usage-method concept set. Extracting it benefits people by providing knowledge of alternative treatments or solutions for health problems. The research addresses three main problems: how to determine an EDU (an elementary discourse unit, that is, a simple sentence or clause) that carries a medicinal-property or usage-method concept; how to determine the usage-method boundary; and how to determine the HerbalMedicinalProperty-UsageMethod relation between the two related sets. To solve the first and second problems, we propose using N-Word-Co on the verb phrase that carries the medicinal-property or usage-method concept, where the N-Word-Co size is determined by learning with maximum entropy, support vector machine, and naïve Bayes. To solve the third problem, we apply naïve Bayes with the N-Word-Co elements as features to determine the HerbalMedicinalProperty-UsageMethod relation. The results indicate that the approach extracts the HerbalMedicinalProperty-UsageMethod relation with high precision.
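
The third step, naïve Bayes classification over N-Word-Co features, can be illustrated with a minimal sketch. The abstract does not define N-Word-Co precisely, so the sketch assumes it means the first N co-occurring words of a verb phrase. The function names, the `p:`/`u:` feature prefixes, the class labels, and the English toy data are all hypothetical and are not taken from the paper; the classifier is a plain multinomial naïve Bayes with add-one smoothing, not necessarily the authors' exact formulation.

```python
import math
from collections import Counter, defaultdict


def n_word_co(verb_phrase_tokens, n):
    """Return the first n words of a verb phrase (assumed reading of N-Word-Co)."""
    return tuple(verb_phrase_tokens[:n])


def pair_features(property_vp, usage_vp, n):
    """Build features for a (medicinal-property EDU, usage-method EDU) pair."""
    return (["p:" + w for w in n_word_co(property_vp, n)]
            + ["u:" + w for w in n_word_co(usage_vp, n)])


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples, labels):
        for feats, label in zip(samples, labels):
            self.class_counts[label] += 1
            for f in feats:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in feats:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (property verb phrase, usage verb phrase, label).
TRAIN = [
    (["relieve", "cough"], ["boil", "leaves", "drink"], "relation"),
    (["relieve", "sore", "throat"], ["boil", "root", "drink"], "relation"),
    (["heal", "wound"], ["pound", "leaves", "apply"], "relation"),
    (["relieve", "cough"], ["grow", "in", "shade"], "no_relation"),
    (["heal", "wound"], ["sell", "at", "market"], "no_relation"),
]

N = 2  # assumed N-Word-Co size; the paper learns this value instead
model = NaiveBayes().fit(
    [pair_features(p, u, N) for p, u, _ in TRAIN],
    [label for _, _, label in TRAIN],
)

print(model.predict(pair_features(["relieve", "fever"], ["boil", "leaves", "drink"], N)))
```

In this sketch `N` is fixed by hand; the paper instead determines the N-Word-Co size by learning, which could be approximated here by evaluating several values of `N` on held-out pairs.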
