• Title/Summary/Keyword: language acquisition (언어획득)

A Compound Term Retrieval Model Using Statistical Information (통계적 정보를 이용한 복합명사 검색 모델)

  • 박영찬;최기선
    • Korean Journal of Cognitive Science
    • /
    • v.6 no.3
    • /
    • pp.65-81
    • /
    • 1995
  • Compound nouns, as compositions of multiple nouns, exhibit diverse occurrence patterns in texts and have varying degrees of meaning coherence. The problem of compound nouns in information retrieval is to find a method to represent and identify the compositional patterns of their constituent words. This paper explains how co-occurrence patterns are related to the meaning of each compound noun, and how information about such relations, which can be mechanically acquired from texts, is used in ranking the candidate documents for a given query. The main theme of the paper is that compound nouns can be categorized according to the occurrence patterns of their simple nouns, and that these occurrence patterns can be formalized by statistical analysis without a large dictionary or complex compositional rules. The suggested model achieved about a 7.75% improvement over the best precision of the other methods at each recall level on a Korean test collection.

Comparison of Performance for Korean E-mail Filtering using Bayesian Classifier (한글 전자메일에 대한 베이지언 필터의 성능비교)

  • Lee, Chang-Beom;Kim, Ji-Soo;Kim, Soo-Hyung;Park, Hyuk-Ro
    • Annual Conference on Human and Language Technology
    • /
    • 2004.10d
    • /
    • pp.214-219
    • /
    • 2004
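  • E-mail is a convenient and efficient means of communication used by a great many people. However, because e-mail addresses are easy to obtain and this fact is abused, the problem of unwanted mail, that is, spam, is becoming serious. Spam filters that automatically classify such mail mainly target English, and they mostly rely on statistical learning rather than rule-based filtering. This paper evaluates three classification algorithms based on Bayes' theorem on Korean e-mail, measuring how well they classify spam, in particular pornographic mail. The experiments show that the method using only the spam probability of each word outperforms both the naive Bayes algorithm and the method using the m-estimate. In particular, the method using only word spam probabilities filtered better than the other methods while keeping the false positive rate at 0%. For feature selection, the error rate was lowest when nouns, or nouns and adjectives, were used.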

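As a rough illustration of one of the compared approaches, the following is a minimal sketch of a naive Bayes classifier with m-estimate smoothing. It is not the authors' implementation: the toy training messages, the uniform prior `1/|V|`, and the value `m=1.0` are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy training data: (tokens, label) pairs.
TRAIN = [
    (["free", "adult", "click"], "spam"),
    (["free", "offer", "click"], "spam"),
    (["meeting", "schedule", "report"], "ham"),
    (["report", "deadline", "meeting"], "ham"),
]

def train(docs):
    """Collect per-class token counts and per-class document counts."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    return word_counts, doc_counts

def m_estimate(count, total, prior, m):
    """m-estimate of a probability: (count + m * prior) / (total + m)."""
    return (count + m * prior) / (total + m)

def classify(tokens, word_counts, doc_counts, m=1.0):
    """Return the label with the higher log posterior under naive Bayes."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    prior = 1.0 / len(vocab)  # assumed uniform prior over the vocabulary
    n_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for tok in tokens:
            score += math.log(m_estimate(word_counts[label][tok], total, prior, m))
        scores[label] = score
    return max(scores, key=scores.get)

WORD_COUNTS, DOC_COUNTS = train(TRAIN)
print(classify(["free", "click"], WORD_COUNTS, DOC_COUNTS))
print(classify(["meeting", "report"], WORD_COUNTS, DOC_COUNTS))
```

Because the m-estimate never returns zero, unseen tokens lower a class score without eliminating the class outright.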

Verb concept clustering using Independent Component Analysis and Box-Cox transformation (독립성분분석과 Box-Cox 변환을 이용한 동사 개념 클러스터링)

  • Chagnaa, Altangerel;Lee, Chang-Beom;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology
    • /
    • 2006.10e
    • /
    • pp.164-170
    • /
    • 2006
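  • This paper proposes a method for conceptual clustering of Korean verbs. The techniques used are independent component analysis (ICA), the Box-Cox transformation, and correlation analysis. ICA is an analysis method that extracts latent components on the basis of statistical independence. However, ICA assumes that the distribution of the mixtures (verbs) follows a normal (Gaussian) distribution, so the distribution of the verbs needs to be made closer to normal. To this end, this paper uses the Box-Cox transformation to bring the verb distribution closer to a normal distribution. In addition, ICA itself cannot determine the appropriate number of components to extract. This paper therefore determines the number of independent components from the cumulative contribution ratio of the eigenvalues obtained by principal component analysis. Finally, using the correlation coefficients between the extracted independent component vectors and the verb vectors, verbs closely related to an independent component (concept) are grouped into one cluster. In clustering experiments on Korean verbs, applying the Box-Cox transformation gave better performance.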

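Two of the steps named in this abstract can be sketched in a few lines: the Box-Cox transformation and the choice of the number of components from the cumulative contribution ratio of eigenvalues. This is a minimal sketch only. The function names, the example eigenvalues, and the 0.9 threshold are assumptions, since the abstract does not state the values the authors used.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def components_for_contribution(eigenvalues, threshold=0.9):
    """Smallest k such that the k largest eigenvalues reach the threshold
    share of the total (the cumulative contribution ratio)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, eigenvalue in enumerate(ordered, start=1):
        running += eigenvalue
        if running / total >= threshold:
            return k
    return len(ordered)

# Hypothetical inputs for illustration only.
print(box_cox([1.0, 4.0, 9.0], 0.5))
print(components_for_contribution([5.0, 3.0, 1.0, 1.0], threshold=0.8))
```

In practice the lambda parameter is usually estimated from the data (for example by maximum likelihood) rather than fixed by hand as it is here.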

Hierarchical Multi-Classifier for the Mixed Character Code Set (홍용 문자 코드 집합을 위한 계층적 다중문자 인식기)

  • Kim, Do-Hyeon;Park, Jae-Hyeon;Kim, Cheol-Ki;Cha, Eui-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.10
    • /
    • pp.1977-1985
    • /
    • 2007
  • Character recognition is one of the techniques of artificial intelligence and has been widely applied in automated systems, robots, HCI (Human Computer Interaction), etc. This paper introduces the character set and the representative characters that can be used in recognizing an image ROI. The character codes in this ROI include digits, symbols, English, and Korean characters. We propose an efficient multi-classifier structure that combines small-size classifiers hierarchically. Moreover, we trained each small-size classifier with the delta-bar-delta learning algorithm. We tested the performance on various kinds of images and achieved an accuracy of 99%. The proposed multi-classifier showed efficiency and reliability for the mixed character code set.
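
The delta-bar-delta rule mentioned in this abstract gives every weight its own learning rate, raising it additively while the gradient keeps its sign and cutting it multiplicatively when the sign flips. The sketch below shows one such update for a single weight and applies it to a toy quadratic. The hyperparameters `kappa`, `phi`, and `theta` and the toy objective are assumptions, not values taken from the paper.

```python
def delta_bar_delta_step(weight, rate, bar_delta, grad,
                         kappa=0.01, phi=0.1, theta=0.7):
    """One delta-bar-delta update for a single weight.

    bar_delta is the exponential average of past gradients.
    Returns (new_weight, new_rate, new_bar_delta).
    """
    if bar_delta * grad > 0:       # same sign as recent history: speed up
        rate += kappa
    elif bar_delta * grad < 0:     # sign flipped: slow down
        rate *= (1.0 - phi)
    weight -= rate * grad
    bar_delta = (1.0 - theta) * grad + theta * bar_delta
    return weight, rate, bar_delta

def minimize_quadratic(steps=200):
    """Minimize the toy objective f(w) = (w - 3)^2 starting from w = 0."""
    weight, rate, bar_delta = 0.0, 0.05, 0.0
    for _ in range(steps):
        grad = 2.0 * (weight - 3.0)
        weight, rate, bar_delta = delta_bar_delta_step(weight, rate, bar_delta, grad)
    return weight

print(minimize_quadratic())
```

In a real network the same update would be applied independently to every weight, with the gradient coming from backpropagation.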

Preliminary Research about Semantic Relations and Linguistic Features in Middle School Students' Writings about Phase Transitions of Water in Air (대기 중 물의 상태변화에 관한 중학생의 글에서 나타나는 의미관계 및 과학 언어적 특성에 관한 예비연구)

  • Jung, Eun-Sook;Kim, Chan-Jong
    • Journal of the Korean earth science society
    • /
    • v.31 no.3
    • /
    • pp.288-299
    • /
    • 2010
  • Recently, scientific literacy has come to mean not only the acquisition of scientific knowledge but also the linguistic ability to participate in a scientific discourse community. Keeping this in mind, this study investigated middle school students' writings about phase transitions of water in air. Sixty-seven 9th-grade students (age 15) participated in this study, and each wrote two short texts. The results of the text analysis can be summarized as follows: (1) Students had problems with familiar scientific terms such as 'water vapor' and 'steam' as well as unfamiliar ones like 'dew point'. (2) Students described both correct and incorrect semantic relations more often for ideas formed from everyday experience than for those from school instruction. (3) While students' writing was centered on actions and processes in texts about everyday phenomena, they showed a stronger preference for technical words and nouns in texts about school science. This study suggests that students can develop the linguistic ability of science both through a spontaneous process based on experience and through formal, theoretical learning: the former in forming various semantic relations, the latter in the technical and abstract aspects of scientific writing.

Assessment Process Design for Python Programming Learning (파이선(Python) 학습을 위한 평가 프로세스 설계)

  • Ko, Eunji;Lee, Jeongmin
    • Journal of The Korean Association of Information Education
    • /
    • v.24 no.1
    • /
    • pp.117-129
    • /
    • 2020
  • The purpose of this paper is to explore ways to assess computational thinking from a formative perspective and to design a process for assessing programming learning using Python. Therefore, this study explored the computational thinking domain and analyzed research related to assessment design. Also, this study identified the areas of Python programming learning that beginners learn and the areas of computational thinking ability that can be obtained through Python learning. Through this, we designed an assessment method that provides feedback by analyzing syntax corresponding to computational thinking ability. Besides, self-assessment is possible through reflective thinking by using the flow-chart and pseudo-code to express ideas, and peer feedback is designed through code sharing and communication using community.
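
The idea of giving feedback by analyzing syntax that corresponds to computational thinking ability can be illustrated with Python's standard `ast` module. The sketch below counts a few syntax constructs in a learner's submission. The mapping from syntax nodes to labels (`CONSTRUCTS`) and the sample program are hypothetical and are not taken from the paper.

```python
import ast
from collections import Counter

# Hypothetical mapping from Python syntax nodes to computational-thinking labels.
CONSTRUCTS = {
    ast.For: "iteration",
    ast.While: "iteration",
    ast.If: "conditional",
    ast.FunctionDef: "abstraction",
}

def count_constructs(source):
    """Count how often each labelled construct appears in the source code."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        label = CONSTRUCTS.get(type(node))
        if label:
            counts[label] += 1
    return counts

# Hypothetical learner submission.
SAMPLE = '''
def total_even(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n
    return total
'''

print(count_constructs(SAMPLE))
```

A formative assessment tool could compare such counts against the constructs a task is meant to exercise and phrase its feedback accordingly.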

Aesthetic Implications of the Algorithm Applied to New Media Art Works : A Focus on Live Coding (뉴미디어 예술 작품에 적용된 알고리즘의 미학적 함의 : 라이브 코딩을 중심으로)

  • Oh, Junho
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.3
    • /
    • pp.119-130
    • /
    • 2013
  • This paper examines the algorithm, whose materiality and expressiveness can be obtained through live coding. Live coding is an improvised genre of music that generates sounds while writing code in real time and projecting it onto a screen. Previous studies of live coding have focused on the development environment needed to support live coding performance effectively. This study, however, examines the aesthetic attitude immanent in the realization of the algorithm by analyzing the most widely used languages, such as ChucK and Impromptu, the visualization of live code, and the performances of "aa-cell" and "slub". The aesthetic attitudes of live coding performance can be divided into algebraic and geometric attitudes. Algebraic attitudes underline the temporal development of concepts; geometric attitudes highlight the materialization of the spatial structure of concepts through image schemas. Such a difference echoes the tension between conception and materiality, which appears in both conceptual and concrete poetry. The linguistic question of whether conception or materiality is emphasized more strongly defines the expressiveness of the algorithm.

Implementation of a System for RFID Education to be based on an EPC global Network Standard (EPC global Network 표준을 따르는 RFID 교육용 시스템의 구현)

  • Kim, Dae-Hee;Chung, Joong-Soo;Kim, Hyu-Chan;Jung, Kwang-Wook;Kim, Seog-Gyu
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.11
    • /
    • pp.90-99
    • /
    • 2009
  • This paper presents the implementation of an RFID EPC global network educational system based on a 900MHz air interface between the reader and the active tag. The software of the reader and the active tag is developed in an embedded environment, and the software of the PC that controls the reader and operates as the server runs on the Windows OS. The ATmega128 VLSI chip is used as the processor of the reader and the active tag. As the development environment, an AVR compiler is used for the reader and the active tag, which are programmed in C. Visual C++ in Visual Studio is used as the development language on the PC acting as the server. The main functions of this system are to control a tag containing EPC global data from the PC through the reader, to obtain tag information through the Internet, and to read/write data on tag memory. Finally, the data written to the active tag's memory is sent back to the PC via the reader by a "read" operation, and the received data is compared with the data already sent to the tag. The software implementation of the 900MHz EPC global RFID educational system is based on these functions.

Educational System Design of RFID/USN (RFID/USN 교육용 시스템의 설계)

  • Kim, Dae-Hee;Oh, Do-Bong;Jung, Joong-soo;Jung, Kwang-wook
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2009.05a
    • /
    • pp.687-692
    • /
    • 2009
  • This paper presents the development of an RFID educational system based on a 900MHz air interface between the reader and the active tag. The software of the reader and the active tag is developed in an embedded environment, and the software of the PC that controls the reader and operates as the server runs on the Windows OS. The ATmega128 VLSI chip is used as the processor of the reader and the active tag. As the development environment, an AVR compiler is used for the reader and the active tag, which are programmed in C. Visual C++ in Visual Studio is used as the development language on the PC acting as the server. The main functions of this system are to control a tag containing EPC global data from the PC through the reader, to obtain tag information through the Internet, and to read/write data on tag memory. The software design of the 900MHz RFID/USN educational system is based on these functions.

Enhancement of a language model using two separate corpora of distinct characteristics

  • Cho, Sehyeong;Chung, Tae-Sun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.3
    • /
    • pp.357-362
    • /
    • 2004
  • Language models are essential in predicting the next word in a spoken sentence, thereby enhancing speech recognition accuracy, among other things. However, spoken language domains are too numerous, and developers therefore suffer from a lack of corpora of sufficient size. This paper proposes a method of combining two n-gram language models, one constructed from a very small corpus of the right domain of interest, the other constructed from a large but less adequate corpus, resulting in a significantly enhanced language model. This method is based on the observation that a small corpus from the right domain has high-quality n-grams but suffers from a serious sparseness problem, while a large corpus from a different domain has more n-gram statistics but is incorrectly biased. With our approach, the two sets of n-gram statistics are combined by extending the idea of Katz's backoff, and the method is therefore called dual-source backoff. We ran experiments with 3-gram language models constructed from newspaper corpora of several million to tens of millions of words, together with models from smaller broadcast news corpora. The target domain was broadcast news. We obtained a significant improvement (30%) by incorporating a small corpus about one thirtieth the size of the newspaper corpus.
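
The general idea of preferring in-domain statistics and falling back to a larger out-of-domain corpus can be illustrated with a deliberately simplified bigram lookup. This is not Katz's discounted backoff and not the authors' dual-source backoff: the fixed penalty `alpha`, the function names, and the toy corpora are assumptions, and the returned scores are not normalized probabilities.

```python
from collections import Counter

def bigram_counts(tokens):
    """Return (bigram counts, unigram counts) for a token list."""
    return Counter(zip(tokens, tokens[1:])), Counter(tokens)

def two_source_score(prev, word, small, large, alpha=0.4):
    """Score `word` following `prev` using two count sources.

    Use the small in-domain counts when the bigram was seen there,
    otherwise fall back to the large corpus with a fixed penalty,
    and finally to the large-corpus unigram frequency.
    """
    small_bi, small_uni = small
    large_bi, large_uni = large
    if small_bi[(prev, word)] > 0:
        return small_bi[(prev, word)] / small_uni[prev]
    if large_bi[(prev, word)] > 0:
        return alpha * large_bi[(prev, word)] / large_uni[prev]
    return alpha * alpha * large_uni[word] / sum(large_uni.values())

# Hypothetical toy corpora standing in for broadcast news and newspaper text.
SMALL = bigram_counts("the anchor reports the news".split())
LARGE = bigram_counts("the market closed higher and the market opened lower".split())

print(two_source_score("the", "news", SMALL, LARGE))
print(two_source_score("the", "market", SMALL, LARGE))
```

A proper backoff model would discount the seen n-gram estimates and compute context-dependent backoff weights so that the probabilities for each history sum to one.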