• Title/Summary/Keyword: part of speech

Improving the Performance of Korean Text Chunking by Machine learning Approaches based on Feature Set Selection (자질집합선택 기반의 기계학습을 통한 한국어 기본구 인식의 성능향상)

  • Hwang, Young-Sook;Chung, Hoo-jung;Park, So-Young;Kwak, Young-Jae;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications / v.29 no.9 / pp.654-668 / 2002
  • In this paper, we present an empirical study on improving Korean text chunking through machine learning and feature set selection. We focus on two issues: selecting a feature set for Korean chunking and alleviating data sparseness. To select a proper feature set, we use a heuristic that searches the space of feature sets, using the estimated performance of a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. To smooth data sparseness, we suggest using a general part-of-speech tag set together with selective lexical information, chosen in light of the characteristics of the Korean language. Experimental results showed that chunk tags and lexical information within a given context window are important features and that spacing unit information is less important than the others; these findings are independent of the machine learning technique. Furthermore, using selective lexical information not only has a smoothing effect but also reduces the feature space compared with using all lexical information. Korean text chunking based on memory-based learning and decision tree learning with the selected feature space achieved precision/recall of 90.99%/92.52% and 93.39%/93.41%, respectively.
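
The feature-set search described in this abstract can be illustrated with a greedy forward selection loop. This is a minimal sketch, not the authors' procedure: the abstract does not specify the search order or stopping rule, and `toy_evaluate` with its gain values is an invented stand-in for "estimated performance from a machine learning algorithm". Only the feature names echo the abstract.

```python
from typing import Callable, FrozenSet, Iterable


def greedy_forward_selection(
    candidates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Grow a feature set one feature at a time while the estimated score improves."""
    selected: FrozenSet[str] = frozenset()
    best_score = evaluate(selected)
    remaining = set(candidates)
    while remaining:
        # Score every one-feature extension of the current set.
        scored = [(evaluate(selected | {f}), f) for f in sorted(remaining)]
        score, feature = max(scored)
        if score <= best_score:
            break  # no extension is incrementally useful, so stop
        selected, best_score = selected | {feature}, score
        remaining.remove(feature)
    return selected


# Hypothetical per-feature gains (assumption, not data from the paper).
TOY_GAIN = {"chunk_tag": 0.30, "lexical": 0.20, "pos_tag": 0.10, "spacing_unit": 0.00}


def toy_evaluate(features: FrozenSet[str]) -> float:
    # Invented score: summed gains minus a small penalty per feature used.
    return sum(TOY_GAIN[f] for f in features) - 0.01 * len(features)


chosen = greedy_forward_selection(TOY_GAIN, toy_evaluate)
print(sorted(chosen))
```

In practice `evaluate` would train and validate a real learner on each candidate set, which is the expensive step; the loop itself stays the same.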

Statistical Information of Korean Dictionary to Construct an Enormous Electronic Dictionary (대용량 전자사전 구축을 위한 국어 대사전의 통계 정보)

  • Kim, Cheol-Su;Kim, Yang-Beom
    • The Journal of the Korea Contents Association / v.7 no.6 / pp.60-68 / 2007
  • Language information processing has various application areas, such as information retrieval, morphological analysis, spell checking, voice recognition, and character recognition. In these areas, an electronic dictionary is essential. This paper studies basic statistical information on a Korean dictionary and on the construction of an electronic dictionary. The items analyzed were the number of registered words in the Korean dictionary, the number of entries in the electronic dictionary, the number of syllables used, the number of distinct syllables, the average entry length, the distribution of parts of speech, and the number of nodes used to construct the electronic dictionary with a trie, excluding words that contain archaic words or incomplete syllables. The electronic dictionary has 361,980 entries in total, the number of syllables used is 1,289,659, the average entry length is 3.56 syllables, and the number of distinct syllables is 2,463. This information should be useful in constructing electronic dictionaries and in processing Korean information.
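
The statistics listed in this abstract (entry count, syllables used, distinct syllables, average entry length, trie node count) can be computed as in the sketch below. It assumes each entry is a string of precomposed Hangul syllables, so one character equals one syllable, and it does not count the root as a node; the paper's own counting conventions are not given in the abstract. The four sample words are illustrative only.

```python
from typing import Dict, Iterable

END = object()  # sentinel key marking that an entry ends at this node


def dictionary_stats(entries: Iterable[str]) -> Dict[str, float]:
    words = sorted(set(entries))      # count each distinct entry once
    root: dict = {}
    node_count = 0                    # the root itself is not counted
    for word in words:
        node = root
        for syllable in word:         # one character = one precomposed syllable
            if syllable not in node:
                node[syllable] = {}   # a new trie node is created here
                node_count += 1
            node = node[syllable]
        node[END] = True
    total_syllables = sum(len(w) for w in words)
    return {
        "entries": len(words),
        "syllables_used": total_syllables,
        "distinct_syllables": len(set("".join(words))),
        "average_length": total_syllables / len(words) if words else 0.0,
        "trie_nodes": node_count,
    }


stats = dictionary_stats(["사전", "사람", "사랑", "전자"])
print(stats)
```

Shared prefixes are what keep the node count below the total syllable count: the three sample words starting with 사 share a single first-level node.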

Specialists' Views Concerning the Assessment, Evaluation, and Programming System (AEPS) in Associations for Children with Disabilities in Saudi Arabia

  • Munchi, Khiryah S.;Bagadood, Nizar H.
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.91-100 / 2022
  • To support early intervention, it is necessary to develop programming system tools that enable accurate, valid, and reliable assessments and can help achieve reasonable, generalizable, and measurable goals. This study examined the Assessment, Evaluation, and Programming System (AEPS) used by associations of children with disabilities in Saudi Arabia to assess its suitability for children with intellectual disabilities. A group of 16 specialists with different professional backgrounds (including special education, physiotherapy, occupational therapy, speech therapy and psychology) from 11 associations of children with disabilities took part in semi-structured personal interviews. The study concluded that AEPS is generally suited for use with children with intellectual disabilities. However, its suitability depends on the type and severity of the child's disability. The more severe the disability, the less effective the AEPS is likely to be. On the basis of this finding the researchers formed interdisciplinary teams to organise and integrate the children's learning and assess the benefits of AEPS, including its accuracy and ability to achieve adaptive, cognitive, and social targets, enhance family engagement and learning and develop basic development skills. This study also identified obstacles associated with the use of AEPS. These include the lack of comprehensiveness and accuracy of the goal, lack of precision and non-applicability to large movements and the fact that it cannot be used with all children with intellectual disabilities. In addition, the research showed that non-cooperation within the family is a major obstacle to the implementation of the AEPS. The results of this study have several implications.

A Postprocessing Method of Korean Character Recognition by Mis-recognized Morphology Presumption (오인식 형태소 추정에 의한 한국어 문자 인식 후처리 기법)

  • Kim, Young-Hun;Lee, Young-Hwa;Lee, Sang-Jo
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.7 / pp.46-55 / 1999
  • We propose a new postprocessing method that uses morphological analysis to reduce the frequency of dictionary access while also improving the recognition rate of a character recognizer. The method estimates the morphological structure of a misrecognized word from the parts of speech obtained by analysis, and then corrects the morpheme presumed to be misrecognized. Postprocessing at the morpheme level reduces the number of candidates, because a morpheme is shorter than a word, and reduces the frequency of dictionary access, because the candidates need no morphological analysis; selecting the right candidate requires only dictionary access. Experimental results show that the proposed method reduced the frequency of dictionary access to 60% of that of postprocessing at the word level and improved the recognition rate from 94% to 97%.

A study on performance improvement of neural network using output probability of HMM (HMM의 출력확률을 이용한 신경회로망의 성능향상에 관한 연구)

  • Pyo Chang Soo;Kim Chang Keun;Hur Kang In
    • Journal of the Institute of Convergence Signal Processing / v.1 no.1 / pp.1-6 / 2000
  • In this paper, we propose a hybrid system of an HMM (Hidden Markov Model) and a neural network, and show that adding a post-processing stage that minimizes recognition errors yields a better recognition rate than using the HMM alone. After the HMM is trained on the training data, test data that took no part in the training are passed to the HMM. The output probabilities that the HMM produces for these data are used as training data for the neural network, which acts as the post-processor. Once the neural network is trained, the hybrid system is complete. The hybrid system improves the recognition rate by about 4.5% with an MLP and about 2% with an RBFN, and it addresses both the training time of conventional hybrid systems and the drop in recognition rate caused by a lack of training data in real-time speech recognition systems.

A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo;Kim, Hark-Soo
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning methods use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews contain many spacing and spelling errors. The poor performance of these underlying analyzers in turn degrades the sentiment categorization systems. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) patterns and phoneme patterns. The two kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In experiments with Korean customer reviews, the sentiment categorization system using the proposed method outperformed one using a morphological analyzer as the feature extractor.
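
The idea behind the phoneme patterns, keeping only the leading consonant and vowel of each syllable, can be sketched with the standard Unicode arithmetic for precomposed Hangul syllables. The function names are hypothetical and the example words are illustrative; the abstract does not describe how the authors build or match their patterns, so this shows only the decomposition step.

```python
from typing import List, Optional, Tuple

HANGUL_BASE = 0xAC00        # code point of the first precomposed syllable, 가
SYLLABLE_COUNT = 11172      # 19 leads * 21 vowels * 28 tails (incl. "no tail")
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"


def lead_vowel(ch: str) -> Optional[Tuple[str, str]]:
    """Return (leading consonant, vowel) of a precomposed syllable, else None."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        return None
    lead, rest = divmod(offset, 21 * 28)   # 588 syllables share each lead
    vowel = rest // 28                     # 28 tail variants share each vowel
    return LEADS[lead], VOWELS[vowel]


def phoneme_pattern(word: str) -> List[Tuple[str, str]]:
    """Lead/vowel pairs for every Hangul syllable; the final consonant is dropped."""
    pairs = (lead_vowel(ch) for ch in word)
    return [p for p in pairs if p is not None]


# A spelling variant that drops a final consonant maps to the same pattern.
print(phoneme_pattern("좋아요"))
print(phoneme_pattern("좋아요") == phoneme_pattern("조아요"))
```

Because the final consonant is discarded, variants that differ only there collapse to one key, which is the robustness property the abstract attributes to leading consonants and vowels.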

A Study of Digital Library Service Records and User Privacy (디지털도서관서비스기록과 이용자프라이버시에 관한 연구)

  • Noh, Young-Hee
    • Journal of the Korean Society for information Management / v.29 no.3 / pp.187-214 / 2012
  • Libraries are founded to ensure the intellectual freedom of citizens, and citizens have the right to confidentiality regarding their needs, information access, and information use. Protecting users' privacy is critical to safeguarding their freedom of speech, freedom of thought, and freedom of assembly. Libraries and librarians should take their users' privacy seriously because protecting it is part of their most important mission and allows users to truly enjoy their intellectual freedom. This study extensively investigated and analyzed the ways in which privacy may be invaded in libraries. As a result, potential invasions of privacy in libraries were grouped into three categories: violations arising from the enforcement operations of national or law enforcement agencies; violations arising in routine library services such as circulation, reference, and online searching; and violations arising from the outsourcing of library services.

Syntactic and Semantic Disambiguation for Interpretation of Numerals in the Information Retrieval (정보 검색을 위한 숫자의 해석에 관한 구문적.의미적 판별 기법)

  • Moon, Yoo-Jin
    • Journal of the Korea Society of Computer and Information / v.14 no.8 / pp.65-71 / 2009
  • Natural language processing is necessary to efficiently filter the tremendous amount of information produced in information retrieval on the World Wide Web. This paper proposes an algorithm for interpreting the meaning of numerals in text. The algorithm uses context-free grammars with a chart parsing technique, interprets affixes attached to the numerals, and is designed to disambiguate their meanings systematically with the support of n-gram based words. It is also designed to use POS (part-of-speech) taggers, to automatically recognize restriction conditions on trigram words, and to disambiguate the meaning of the numerals gradually. Experiments with the proposed numeral interpretation system showed that the frequency-proportional method recognized the numerals with 86.3% accuracy and the condition-proportional method with 82.8% accuracy.

Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec (Doc2Vec과 Word2Vec을 활용한 Convolutional Neural Network 기반 한국어 신문 기사 분류)

  • Kim, Dowoo;Koo, Myoung-Wan
    • Journal of KIISE / v.44 no.7 / pp.742-747 / 2017
  • In this paper, we propose a novel approach that improves the performance of a Convolutional Neural Network (CNN) word embedding model built on word2vec so that it performs like doc2vec in a document classification task. The Word Piece Model (WPM) was shown empirically to outperform other tokenization methods, such as phrase units and a part-of-speech tagger (classification rate: 79.5%). We further conducted an experiment classifying Korean news articles into ten categories, feeding word and document vectors generated with WPM to the baseline and the proposed model. In this experiment, the proposed model showed a higher classification rate (89.88%) than its counterpart (86.89%), achieving a 22.80% improvement. This research demonstrates that applying doc2vec in document classification yields more effective results because doc2vec generates similar document vector representations for documents belonging to the same category.
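
The abstract does not say how the 22.80% improvement was computed. One reading that roughly reproduces the figure from the two reported classification rates is a relative reduction in error rate; the sketch below checks that arithmetic. This interpretation is an assumption, and the variable names are illustrative.

```python
baseline_acc = 86.89   # classification rate of the counterpart model (%)
proposed_acc = 89.88   # classification rate of the proposed model (%)

# Error rates are the complements of the classification rates.
baseline_err = 100 - baseline_acc
proposed_err = 100 - proposed_acc

# Assumed reading: share of the baseline's errors that the proposed model removes.
relative_error_reduction = (baseline_err - proposed_err) / baseline_err * 100

# Alternative reading: relative gain in the classification rate itself.
relative_accuracy_gain = (proposed_acc - baseline_acc) / baseline_acc * 100

print(round(relative_error_reduction, 1))  # close to the reported 22.80
print(round(relative_accuracy_gain, 1))    # much smaller, so a less likely reading
```

The absolute gap between the two rates is 2.99 percentage points, so the reported figure cannot be an absolute difference either.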

Creative Talent for Fusion-Positive Collective Intelligence-based Collaborative Learning Content Research ; Focusing on the tvN Connective Lecture Show 'Creation Club 199' (창의 융합인재 양성을 위한 집단지성기반 협력학습 콘텐츠 연구: tvN의 커넥티브(connective) 강연쇼 '창조클럽 199'를 중심으로)

  • Iem, Yun-Seo
    • The Journal of the Korea Contents Association / v.15 no.2 / pp.529-541 / 2015
  • A collective intelligence-based collaborative learning model is regarded as ideal in higher education, yet it remains at the theoretical level without consensus. For collaborative learning to be based on collective intelligence, the competence of the various members should be mobilized as much as possible, with each member actively participating in and contributing to the joint goals. Although no practical model of collective intelligence-based collaborative learning has yet been developed in education, such a model is considered key to cultivating creative convergence talent. For that reason, the evolution of modern lecture content needs to be checked and watched from time to time. As part of this study, we examine the content of 'Creation Club 199', a connective lecture show planned by tvN. Its intention is well aligned with the needs of the times, but what matters more in practice is 'how' collective intelligence is implemented together with the participants.