• Title/Summary/Keyword: Lexicon


Natural language processing techniques for bioinformatics

  • Tsujii, Jun-ichi
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.3-3 / 2003
  • With biomedical literature expanding so rapidly, there is an urgent need to discover and organize knowledge extracted from texts. Although factual databases contain crucial information, the overwhelming amount of new knowledge remains in textual form (e.g. MEDLINE). In addition, new terms are constantly coined as the relationships linking new genes, drugs, proteins, etc. As the size of biomedical literature is expanding, more systems are applying a variety of methods to automate the process of knowledge acquisition and management. In my talk, I focus on GENIA, a project of our group at the University of Tokyo, the objective of which is to construct an information extraction system for protein-protein interactions from MEDLINE abstracts. The talk includes (1) techniques we use for named entity recognition: (1-a) SOHMM (Self-organized HMM), (1-b) Maximum Entropy Model, (1-c) Lexicon-based Recognizer; (2) treatment of term variants and acronym finders; (3) event extraction using a full parser; (4) linguistic resources for text mining (GENIA corpus): (4-a) semantic tags, (4-b) structural annotations, (4-c) co-reference tags, (4-d) GENIA ontology. I will also talk about possible extensions of our work that link the findings of molecular biology with clinical findings, and claim that text-based or concept-based biology would be a viable alternative to systems biology, which tends to emphasize the role of simulation models in bioinformatics.


Enhanced Sign Language Transcription System via Hand Tracking and Pose Estimation

  • Kim, Jung-Ho;Kim, Najoung;Park, Hancheol;Park, Jong C.
    • Journal of Computing Science and Engineering / v.10 no.3 / pp.95-101 / 2016
  • In this study, we propose a new system for constructing parallel corpora for sign languages, which are generally under-resourced in comparison to spoken languages. In order to achieve scalability and accessibility regarding data collection and corpus construction, our system utilizes deep learning-based techniques and predicts depth information to perform pose estimation on hand information obtainable from video recordings by a single RGB camera. These estimated poses are then transcribed into expressions in SignWriting. We evaluate the accuracy of hand tracking and hand pose estimation modules of our system quantitatively, using the American Sign Language Image Dataset and the American Sign Language Lexicon Video Dataset. The evaluation results show that our transcription system has a high potential to be successfully employed in constructing a sizable sign language corpus using various types of video resources.

A Recognition Time Reduction Algorithm for Large-Vocabulary Speech Recognition (대용량 음성인식을 위한 인식기간 감축 알고리즘)

  • Koo, Jun-Mo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.10 no.3 / pp.31-36 / 1991
  • We propose an efficient pre-classification algorithm that extracts candidate words to reduce the recognition time in a large-vocabulary recognition system, and we also propose spectral and temporal smoothing of the observation probability to improve its classification performance. The proposed algorithm computes a coarse likelihood score for each word in a lexicon using the observation probabilities of speech spectra and the duration information of recognition units. With the proposed approach, we could reduce the amount of computation by 74% with only slight degradation of recognition accuracy in a 1160-word recognition system based on phoneme-level HMMs. We also observed that the proposed coarse likelihood score is a good estimator of the likelihood score computed by the Viterbi algorithm.
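
The pre-classification idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: frames are divided among a word's units in proportion to hypothetical mean durations, and per-frame observation log-probabilities are summed to rank the lexicon. The names `coarse_score`, `preselect`, and the toy units are not from the paper.

```python
def coarse_score(units, frame_logprobs, mean_durations):
    """Approximate a word's log-likelihood without a Viterbi alignment.

    Assumption: frames are divided among the word's units in proportion to
    each unit's mean duration, then observation log-probabilities are summed.
    """
    n_frames = len(frame_logprobs)
    total_duration = sum(mean_durations[u] for u in units)
    score, elapsed, start = 0.0, 0.0, 0
    for i, unit in enumerate(units):
        elapsed += mean_durations[unit]
        # The last unit takes all remaining frames to absorb rounding.
        if i == len(units) - 1:
            end = n_frames
        else:
            end = round(n_frames * elapsed / total_duration)
        score += sum(frame_logprobs[t][unit] for t in range(start, end))
        start = end
    return score


def preselect(lexicon, frame_logprobs, mean_durations, top_n):
    """Return the top_n words by coarse score, best first."""
    ranked = sorted(
        lexicon,
        key=lambda w: coarse_score(lexicon[w], frame_logprobs, mean_durations),
        reverse=True,
    )
    return ranked[:top_n]


# Toy data: four frames, three hypothetical units "a", "b", "c".
frames = [
    {"a": -1.0, "b": -5.0, "c": -9.0},
    {"a": -1.0, "b": -5.0, "c": -9.0},
    {"a": -5.0, "b": -1.0, "c": -9.0},
    {"a": -5.0, "b": -1.0, "c": -9.0},
]
durations = {"a": 1.0, "b": 1.0, "c": 1.0}
lexicon = {"ab": ["a", "b"], "ba": ["b", "a"], "cc": ["c", "c"]}

candidates = preselect(lexicon, frames, durations, top_n=2)
```

Only the words in `candidates` would then be passed to the full, more expensive search, which is where a saving in recognition time would come from.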


Developing a Korean sentiment lexicon through label propagation (레이블 전파를 통한 감정사전 제작)

  • Park, Ho-Min;Cheon, Min-Ah;Nam-Goong, Young;Choi, Min-Seok;Yoon, Ho;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2018.10a / pp.91-94 / 2018
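  • Sentiment analysis is a technique for extracting subjective information, such as the attitude or opinion of an author or speaker, from text, and it is widely used in areas such as public opinion analysis and market trend analysis. Approaches to sentiment analysis include lexicon-based methods and machine learning-based methods. This paper proposes a method for automatically constructing the Korean sentiment lexicon needed for lexicon-based sentiment analysis. The method builds a Korean sentiment lexicon automatically from an English sentiment lexicon and consists of three main steps: first, building an English-Korean dictionary from an English-Korean parallel corpus; second, generating a bilingual graph from that dictionary; and third, propagating the sentiment values of English words to Korean words. To demonstrate the validity of the proposed method, we built and evaluated a lexicon-based Korean sentiment analysis system. The results confirmed that the proposed method is reasonable, and with improvements in future work it should be possible to build a high-quality Korean sentiment lexicon efficiently.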
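
The third step, propagating sentiment values across a bilingual graph, can be sketched as follows. This is a generic label propagation loop, not the authors' implementation: the edge weights, the update rule (weighted average of neighbours, with seed words clamped), and the example words are all assumptions chosen for illustration.

```python
def propagate(edges, seeds, iterations=10):
    """Spread seed scores over an undirected weighted graph.

    edges: list of (node_a, node_b, weight) with positive weights.
    seeds: {node: score}; seed nodes keep their score at every iteration.
    """
    neighbors = {}
    for a, b, w in edges:
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # clamp known English scores
            else:
                total = sum(w for _, w in nbrs)
                updated[node] = sum(scores[n] * w for n, w in nbrs) / total
        scores = updated
    return scores


# Hypothetical bilingual graph: English seed words linked to Korean words.
edges = [
    ("en:good", "ko:좋다", 1.0),
    ("en:nice", "ko:좋다", 1.0),
    ("en:bad", "ko:나쁘다", 1.0),
    ("en:good", "ko:괜찮다", 3.0),
    ("en:bad", "ko:괜찮다", 1.0),
]
seeds = {"en:good": 1.0, "en:nice": 0.5, "en:bad": -1.0}

scores = propagate(edges, seeds)
korean_lexicon = {n: s for n, s in scores.items() if n.startswith("ko:")}
```

In this toy graph every Korean word is linked only to clamped English seeds, so the scores settle after one pass; a graph that also contains Korean-Korean edges would need the repeated iterations.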


A Review of Minimum Data Sets and Standardized Nursing Classifications (보건의료정보 자료 세트의 비교 및 간호정보 표준화에 대한 고찰)

  • Yom Young-Hee;Lee Ji-Soon;Kim Hee-Kyung;Chang Hae-Kyung;Oh Won-Ok;Choi Bo-Kyung;Park Chang-Sung;Chun Sook-Hee;Lee Jung-Ae
    • The Journal of Korean Academic Society of Nursing Education / v.5 no.1 / pp.72-85 / 1999
  • The paper presents a review of three data sets (Uniform Hospital Discharge Data Set, Nursing Minimum Data Set, and Nursing Management Minimum Data Set) and six major nursing classifications (the North American Nursing Diagnosis Association Taxonomy I, Omaha System, Nursing Interventions Classification, Nursing Intervention Lexicon and Taxonomy, Nursing Outcomes Classification, and Classification of Patient Outcome). The reviewed data sets and nursing classifications differed from each other in purpose, structure, and user. The Nursing Interventions Classification and Nursing Outcomes Classification were linked to the North American Nursing Diagnosis Association taxonomy, but the others were not. The data sets and nursing classifications need to be linked to other data sets and classifications.


Improvement and Evaluation of the Korean Large Vocabulary Continuous Speech Recognition Platform (ECHOS) (한국어 음성인식 플랫폼(ECHOS)의 개선 및 평가)

  • Kwon, Suk-Bong;Yun, Sung-Rack;Jang, Gyu-Cheol;Kim, Yong-Rae;Kim, Bong-Wan;Kim, Hoi-Rin;Yoo, Chang-Dong;Lee, Yong-Ju;Kwon, Oh-Wook
    • MALSORI / no.59 / pp.53-68 / 2006
  • We report the evaluation results of the Korean speech recognition platform called ECHOS. The platform has an object-oriented and reusable architecture so that researchers can easily evaluate their own algorithms. The platform has all the intrinsic modules needed to build a large vocabulary speech recognizer: noise reduction, end-point detection, feature extraction, hidden Markov model (HMM)-based acoustic modeling, cross-word modeling, n-gram language modeling, n-best search, word graph generation, and Korean-specific language processing. The platform supports both lexical search trees and finite-state networks. It performs word-dependent n-best search with a bigram in the forward search stage, and rescores the lattice with a trigram in the backward stage. In an 8000-word continuous speech recognition task, the platform with a lexical tree increases word errors by 40% but decreases recognition time by 50% compared to the HTK platform with a flat lexicon. ECHOS reduces recognition errors by 40% through the incorporation of cross-word modeling. With the number of Gaussian mixtures increased to 16, it yields word accuracy comparable to the previous lexical tree-based platform, Julius.
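
The difference between a flat lexicon and a lexical tree can be shown with a small sketch. This is not ECHOS code; the pronunciations and the dictionary-based node layout are hypothetical. In a flat lexicon each word keeps its own chain of phone states, while a lexical tree shares the states of common pronunciation prefixes, so the search has fewer states to evaluate.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phone sequences; words are stored at leaves."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []}
            )
        node["words"].append(word)
    return root


def count_nodes(node):
    """Count phone nodes below the given node (the root itself is excluded)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


# Hypothetical pronunciation lexicon.
lexicon = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "can": ["k", "ae", "n"],
    "dog": ["d", "ao", "g"],
}

tree = build_lexical_tree(lexicon)
flat_states = sum(len(phones) for phones in lexicon.values())
tree_states = count_nodes(tree)
```

Here the shared prefix "k ae" is stored once, so `tree_states` is smaller than `flat_states`. The trade-off is that a word's identity is unknown until a leaf is reached, which complicates applying language model scores early in the search.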


Phenomenological References : Arguments for Mentalistic Natural Language Semantics

  • Jun, Jong-Sup
    • Language and Information / v.8 no.2 / pp.113-130 / 2004
  • In a prevailing view of meaning and reference (cf. Frege 1892), words pick out entities in the physical world by virtue of meaning. Linguists and philosophers have argued over whether the meaning of a word is inside or outside language users' minds; but, in general, they have taken it for granted that words refer to entities in the physical world. Hilary Putnam (1975), based on his famous twin-earth thought experiment, argued that the meaning of a word could not be inside language users' heads. In this paper, I point out that Putnam's argument makes sense only if words refer to entities in the physical world. That is, Putnam did not provide any argument against mentalistic semantics, since he erroneously assumed that meaning, but not reference, was inside our mind in mentalistic semantics. Mentalistic semanticists, however, assume that words pick out their references inside our head (instead of a possible outside world). A number of arguments for the mentalistic position come from psychology: studies on emotion and visual perception provide numerous cases where words cannot pick out entities from the physical world, but only from inside our head. The mentalistic theory has desirable consequences for the philosophy of language in that some classical puzzles of language (e.g. Russell's (1919) well-known puzzle of excluded middle) are explained well in the proposed theory.


The Event Structure of Korean Unaccusative Verbs (한국어 비대격 동사의 사건구조)

  • 이준규;이정민
    • Proceedings of the Korean Society for Cognitive Science Conference / 2000.05a / pp.108-113 / 2000
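  • The two subclasses of intransitive verbs, unaccusative verbs and unergative verbs, have been actively discussed from various perspectives since Perlmutter's (1978) Unaccusative Hypothesis. In Korean, the two classes differ in terms of event structure, and this fact is closely related to human cognition. Assuming that event structure consists of a process event and a state, unergative verbs foreground the process event, whereas unaccusative verbs foreground the state event. Unaccusative verbs can themselves be divided into two classes: a type (unacc_type_1), such as '도착하다' ('arrive'), in which the process event is not emphasized in the linguistic expression and only the resulting state matters, and a type (unacc_type_2), such as '녹다' ('melt'), whose event structure also gives weight to the process event. In short, unaccusative verbs have an event structure that emphasizes the result state, but they behave differently depending on the degree to which the process event is perceived. Unaccusative verbs are also closely related to causative verbs. Many studies have analyzed the unaccusative/causative alternation as a case of logical polysemy. Accordingly, some analyses are centered on the causative verb and others on the unaccusative verb. This paper judges that the causative analysis is not appropriate for describing Korean, because the agent's event-causing component introduced in the causative analysis is not necessarily essential to the expression of unaccusative verbs. Finally, we represent the lexical semantic structures of the two types of unaccusative verbs following the model of 이정민·강범모·남승호 (1997), which extends and revises Pustejovsky's (1995) Generative Lexicon theory to fit Korean.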


English visual word recognition of Korean: lexical access and word length effect (한국인의 영어단어 재인과정:어휘접근과 단어길이효과)

  • 이윤형;최원일;정유진;남기춘
    • Proceedings of the Korean Society for Cognitive Science Conference / 2000.05a / pp.279-284 / 2000
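  • Word frequency and word length are known to be the main factors affecting the recognition of visually presented English words. However, while research on word frequency has been carried out systematically, research on word length has been comparatively unsystematic. Moreover, it is not yet known specifically how words are represented in the mental lexicon according to their frequency and length, or how these factors relate to each other. The aims of this study are, first, to examine how word length and frequency affect lexical access for visually presented English words, and thereby to determine whether the word length effect operates at the lexical access stage; and second, to examine how the effects of word length and frequency on lexical access differ between Americans and Koreans, and thereby to identify differences in how Koreans and Americans process English words. Experiments using a word naming task and a lexical decision task showed that both word length and frequency affected lexical access for Koreans and Americans alike. However, Koreans had relatively more difficulty in the word naming task than in the lexical decision task. These results suggest that Koreans access the English lexicon in a manner similar to Americans. Koreans, however, appear to have relatively more difficulty than Americans with the articulation process, which suggests that English education should place more emphasis on improving the ability to produce phonological codes and say words aloud than on simple vocabulary memorization.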


Multi-Topic Sentiment Analysis using LDA for Online Review (LDA를 이용한 온라인 리뷰의 다중 토픽별 감성분석 - TripAdvisor 사례를 중심으로 -)

  • Hong, Tae-Ho;Niu, Hanying;Ren, Gang;Park, Ji-Young
    • The Journal of Information Systems / v.27 no.1 / pp.89-110 / 2018
  • Purpose: Customer reviews contain a great deal of information, but finding the key information in a large volume of text is not easy. Business decision makers need a model to solve this problem. In this study we propose a multi-topic sentiment analysis approach using Latent Dirichlet Allocation (LDA) for user-generated content (UGC). Design/methodology/approach: We collected a total of 104,039 hotel reviews in seven of the world's top tourist destinations from TripAdvisor (www.tripadvisor.com) and extracted 30 hotel-related topics from all customer reviews using the LDA model. Six major dimensions (value, cleanliness, rooms, service, location, and sleep quality) were selected from the 30 extracted topics. We used the R language to analyze the data. Findings: This study contributes a lexicon-based sentiment analysis approach for the keyword-embedded sentences related to the six dimensions within a review. The performance of the proposed model was evaluated by comparing the sentiment analysis results for each topic with the real attribute ratings provided by the platform. The results show that the model performs well, with high accuracy and recall. The proposed model is expected to make it possible to analyze customers' sentiments across different topics for reviews that lack detailed attribute ratings.
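
The lexicon-based scoring of keyword-embedded sentences can be sketched roughly as below. The study itself used R and derived its keywords from LDA topics; this Python sketch only illustrates the general idea, and the keyword sets, the sentiment lexicon, and the function name `score_review` are all hypothetical.

```python
import re

# Hypothetical keywords per dimension (the study derives these from LDA topics).
DIMENSION_KEYWORDS = {
    "cleanliness": {"clean", "dirty", "spotless"},
    "service": {"staff", "service"},
    "location": {"location", "station"},
}

# Hypothetical sentiment lexicon: word -> polarity.
SENTIMENT_LEXICON = {
    "clean": 1, "spotless": 1, "dirty": -1,
    "friendly": 1, "rude": -1, "great": 1, "noisy": -1,
}


def score_review(review):
    """Sum lexicon polarities per dimension over keyword-bearing sentences."""
    totals = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        polarity = sum(SENTIMENT_LEXICON.get(t, 0) for t in tokens)
        for dimension, keywords in DIMENSION_KEYWORDS.items():
            # A sentence counts toward a dimension if it contains a keyword.
            if keywords & set(tokens):
                totals[dimension] = totals.get(dimension, 0) + polarity
    return totals


review = "The room was spotless. Staff were rude! Great location near the station."
result = score_review(review)
```

Per-dimension totals like these could then be compared with the attribute ratings a platform provides, which is how the abstract describes the evaluation.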