• Title/Summary/Keyword: 언어모형 (language model)

Search results: 391

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Given a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but because they are probabilistic models based on the frequency of each unit in the training set, they cannot efficiently model correlations between input units. Recently, with the development of deep learning, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely adopted as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can capture dependencies between objects entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts must be decomposed into words or morphemes. However, because a training set of sentences generally contains a huge number of distinct words or morphemes, the dictionary becomes very large and model complexity increases. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained with stochastic gradient descent and with more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras on top of Theano. After preprocessing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters and an output consisting of the following (21st) character. In total, 1,023,411 input-output pairs were included in the dataset, divided into training, validation, and test sets in the proportion 70:15:15. All simulations were run on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All optimization algorithms except stochastic gradient descent showed similar validation loss and perplexity, clearly superior to those of stochastic gradient descent. Stochastic gradient descent also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer LSTM model required 69% more training time than the 3-layer model, yet its validation loss and perplexity did not improve significantly and even worsened under some conditions. On the other hand, when comparing the automatically generated sentences, the 4-layer LSTM model tended to generate sentences closer to natural language than the 3-layer model. Although the completeness of the generated sentences differed slightly between models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely used for Korean language processing in the fields of natural language processing and speech recognition, which are the basis of artificial intelligence systems.
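
The architecture described in this abstract (a stacked phoneme/character-level LSTM that reads 20 one-hot-encoded symbols and predicts the 21st) can be sketched roughly as follows in Keras. The 74-symbol vocabulary, the 20-character window, and the list of optimizers come from the abstract; layer widths, batch size, and the dummy data are illustrative assumptions.

```python
# Minimal sketch of a phoneme-level LSTM language model of the kind described
# above. The 74-symbol vocabulary, 20-character window, and 3 stacked LSTM
# layers follow the abstract; everything else is an illustrative assumption.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

VOCAB_SIZE = 74      # unique phonemes and punctuation marks after preprocessing
WINDOW = 20          # 20 consecutive characters predict the 21st

model = Sequential([
    LSTM(256, return_sequences=True, input_shape=(WINDOW, VOCAB_SIZE)),
    LSTM(256, return_sequences=True),
    LSTM(256),                      # a 4-layer variant would stack one more
    Dense(VOCAB_SIZE, activation="softmax"),
])
# The study compares SGD with Adagrad, RMSprop, Adadelta, Adam, Adamax, Nadam.
model.compile(optimizer="adam", loss="categorical_crossentropy")

# One-hot encoded input-output pairs: X has shape (N, 20, 74), y has (N, 74).
X = np.zeros((1000, WINDOW, VOCAB_SIZE), dtype=np.float32)
y = np.zeros((1000, VOCAB_SIZE), dtype=np.float32)
model.fit(X, y, epochs=1, batch_size=128)

# Perplexity on a held-out set is exp(mean cross-entropy loss).
val_loss = model.evaluate(X, y, verbose=0)
perplexity = float(np.exp(val_loss))
```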

A Development of Transport Choice Models using Fuzzy Approximate Reasoning Methods (퍼지근사추론을 이용한 교통수단 선택모형 구축)

  • 원제무;손기복
    • Journal of Korean Society of Transportation
    • /
    • v.16 no.1
    • /
    • pp.99-110
    • /
    • 1998
  • In this study, a fuzzy approximate reasoning model (FARM), which has a structure similar to human judgment, was constructed and applied to transport mode choice behavior. To this end, the theoretical background of approximate reasoning models was first reviewed, and a mode choice model between bus and subway was built. The differences in total travel time and total travel cost between bus and subway were selected as input variables, and the probability of choosing the bus was used as the output variable. The fuzzy set for each variable consisted of five linguistic expressions, and a total of 25 rules were defined. To examine the practical validity of the constructed model, its results were compared with actual survey data. The analysis showed that the fuzzy approximate reasoning model built in this study explains travelers' mode choice behavior realistically.
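
A minimal sketch of Mamdani-style fuzzy approximate reasoning for this mode-choice setting is given below. The two inputs (bus-subway differences in total travel time and cost) and the output (probability of choosing the bus) follow the abstract, but the membership functions and the handful of rules shown are illustrative stand-ins for the paper's 5 x 5 = 25 rule base.

```python
# Rough sketch of fuzzy approximate reasoning for bus-vs-subway choice.
# Membership functions and rules here are illustrative, not the paper's.
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

# Universe of discourse for the output: probability of choosing the bus.
p_bus = np.linspace(0.0, 1.0, 101)

def infer(dt, dc):
    """dt, dc: (bus - subway) differences in minutes and in fare units."""
    # Input memberships (only a few linguistic terms shown).
    time_faster  = trimf(dt, -30, -15, 0)     # bus noticeably faster
    time_slower  = trimf(dt, 0, 15, 30)       # bus noticeably slower
    cost_cheaper = trimf(dc, -1000, -500, 0)  # bus cheaper

    # Example rules: min for AND, clip the output set, max to aggregate.
    high = trimf(p_bus, 0.5, 0.8, 1.0)
    low  = trimf(p_bus, 0.0, 0.2, 0.5)
    agg = np.maximum(np.minimum(min(time_faster, cost_cheaper), high),
                     np.minimum(time_slower, low))

    # Centroid defuzzification gives the crisp bus-choice probability.
    return float(np.sum(agg * p_bus) / (np.sum(agg) + 1e-9))

print(infer(dt=-10.0, dc=-300.0))  # bus somewhat faster and cheaper
```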


A BPN model for Web-based Business Process Modeling (웹기반 비즈니스 프로세스 명세를 위한 BPN 모형)

  • 최상수;이강수
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2002.05d
    • /
    • pp.971-976
    • /
    • 2002
  • Most information systems are now migrating to web-based information systems, and a "web crisis" is emerging in their development and maintenance. Among the web engineering techniques needed to address this, a technique for specifying web-based business processes is required. This paper therefore presents the BPN (Business Process Net) model for specifying web-based business processes. BPN is a stochastic Petri net whose timing follows beta distributions and can be viewed as an executable Activity Diagram. Use Case analysis is employed when building a BPN model, and the uncertainty in the execution time and cost of a business process is modeled with beta distributions. The BPN model can be used as a common specification model for XML-based business process specification languages.
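
The beta-distribution idea at the core of BPN (activity time and cost uncertainty expressed through beta distributions, as in PERT-style three-point estimates) can be illustrated with a small Monte Carlo sketch; the activities and estimates below are hypothetical.

```python
# Illustrative sketch of modeling activity-duration uncertainty with beta
# distributions, PERT-style. Activity names and estimates are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def pert_beta_sample(optimistic, most_likely, pessimistic, size=10000):
    """Sample durations from a PERT-style beta distribution."""
    alpha = 1 + 4 * (most_likely - optimistic) / (pessimistic - optimistic)
    beta = 1 + 4 * (pessimistic - most_likely) / (pessimistic - optimistic)
    return optimistic + (pessimistic - optimistic) * rng.beta(alpha, beta, size)

# Two sequential activities of a hypothetical web order process (minutes).
receive_order = pert_beta_sample(1.0, 2.0, 6.0)
check_stock   = pert_beta_sample(0.5, 1.0, 3.0)

total = receive_order + check_stock
print(f"expected total time: {total.mean():.2f} min, "
      f"95th percentile: {np.percentile(total, 95):.2f} min")
```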


Comparison of Predictive Performance between Verbal and Visuospatial Memory for Differentiating Normal Elderly from Mild Cognitive Impairment (정상 노인과 경도인지장애의 감별을 위한 언어 기억과 시공간 기억 검사의 예측 성능 비교)

  • Byeon, Haewon
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.6
    • /
    • pp.203-208
    • /
    • 2020
  • This study examined whether mild cognitive impairment (MCI) is associated with a decline in a specific type of memory, verbal or visuospatial, and identified the indices most predictive for discriminating MCI from normal elderly. The subjects were 189 elderly adults (103 healthy elderly, 86 with MCI). Verbal memory was measured with the Seoul Verbal Learning Test, and visuospatial memory was measured with the Rey Complex Figure Test. In multiple logistic regression, both verbal memory and visuospatial memory showed significant predictive performance in discriminating MCI from normal elderly. However, when all confounding variables were adjusted for, including the results of each memory test, only the immediate recall of verbal memory remained a significant predictor for distinguishing MCI from normal aging, while the immediate recall of visuospatial memory was not significant. These results suggest that delayed recall of visuospatial memory and immediate recall of verbal memory are the best combination for discriminating the memory ability of MCI.
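
A multiple logistic regression of this kind, with memory scores predicting MCI status while adjusting for covariates, might look like the following sketch; the data are synthetic and the variable names are assumptions, not the study's actual dataset.

```python
# Illustrative multiple logistic regression discriminating MCI from normal
# aging using memory test scores. All data here are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 189  # 103 normal elderly + 86 MCI, as in the abstract
df = pd.DataFrame({
    "verbal_immediate": rng.normal(20, 5, n),   # e.g., SVLT immediate recall
    "visuo_delayed": rng.normal(15, 6, n),      # e.g., RCFT delayed recall
    "age": rng.normal(72, 5, n),
    "education": rng.normal(9, 4, n),
})
# Synthetic outcome: 1 = MCI, 0 = normal elderly.
logit = -0.15 * df["verbal_immediate"] - 0.10 * df["visuo_delayed"] + 4.0
df["mci"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(df[["verbal_immediate", "visuo_delayed", "age", "education"]])
model = sm.Logit(df["mci"], X).fit(disp=0)
print(model.summary())
print(np.exp(model.params))  # odds ratios for each predictor
```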

A Longitudinal Analysis of Factors Affecting Language Development in Infants (영아의 언어발달 영향요인에 관한 종단 분석)

  • Kim, Minseok;Hu, Yunyun;Wang, Wenhui
    • The Journal of the Korea Contents Association
    • /
    • v.19 no.3
    • /
    • pp.457-465
    • /
    • 2019
  • The purpose of this study is to identify factors that affect the language development of infants. For the analysis, three years of data, from the first wave (2008) to the third wave (2010) of the Panel Study on Korean Children (PSKC), were used and a panel analysis was conducted. The subjects were 2,150 infants who participated in the survey, and the infants' language development was measured using the communication scores of the K-ASQ test provided by the PSKC. In addition, factors influencing infant language development, derived from previous studies, were introduced into the model. The Hausman test indicated that a fixed-effects model, which treats individual error as fixed, was appropriate for the panel. Higher cognitive development of the infant and more positive parenting behavior were each associated with better language development. Conclusions and suggestions regarding the characteristics of infants and parents that affect language development are presented.
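
A fixed-effects panel specification with a Hausman-style comparison against random effects, as described above, could be sketched as follows using the linearmodels package; the data, variable names, and the simplified Hausman computation are illustrative, not the study's actual specification.

```python
# Sketch: fixed-effects panel model of infant language development over three
# waves, with a manual Hausman comparison against random effects.
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS, RandomEffects

rng = np.random.default_rng(2)
n_child, n_wave = 200, 3
idx = pd.MultiIndex.from_product([range(n_child), range(n_wave)],
                                 names=["child", "wave"])
df = pd.DataFrame({
    "cognitive_dev": rng.normal(0, 1, n_child * n_wave),
    "positive_parenting": rng.normal(0, 1, n_child * n_wave),
}, index=idx)
child_effect = np.repeat(rng.normal(0, 1, n_child), n_wave)
df["communication"] = (50 + 3 * df["cognitive_dev"]
                       + 2 * df["positive_parenting"]
                       + child_effect + rng.normal(0, 1, len(df)))

X = df[["cognitive_dev", "positive_parenting"]]
fe = PanelOLS(df["communication"], X, entity_effects=True).fit()
re = RandomEffects(df["communication"], X).fit()

# Simplified Hausman statistic: a systematic difference between the FE and RE
# estimates favours the fixed-effects specification.
b_diff = fe.params - re.params
v_diff = fe.cov - re.cov
hausman = float(b_diff.T @ np.linalg.inv(v_diff) @ b_diff)
print(fe.summary)
print("Hausman chi2:", hausman)
```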

Enhancement of SATEEC GIS system using ArcP (ArcPy를 이용한 SATEEC모델의 개선)

  • Lee, Gwanjae;Shin, Yongchul;Jung, Younghun;Lim, Kyoung Jae
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2015.05a
    • /
    • pp.515-515
    • /
    • 2015
  • The Universal Soil Loss Equation (USLE) is the most widely used model in the world for estimating soil loss. The USLE is a field-scale model that simulates sheet and rill erosion on agricultural land, so it cannot by itself evaluate watershed-scale soil loss, that is, how much of the soil eroded from farmland actually reaches downstream rivers and contributes to turbid water and the resulting deterioration of water quality. To overcome this limitation, the Sediment Assessment Tool for Effective Erosion Control (SATEEC) ArcView system was developed and has been in use. Using only the USLE input data and a DEM, the SATEEC ArcView system estimates a sediment delivery ratio as a function of watershed area, so that it can simulate how much of the eroded soil is delivered downstream; it can also estimate the delivery ratio from watershed slope, allowing topographic characteristics to be analyzed in more detail. However, ArcView is an old product with few remaining users, it contains many program errors, and it cannot be installed on 64-bit operating systems capable of handling large datasets. In addition, Avenue, the programming language of ArcView, does not provide syntax for defining classes or for inheritance, so it can hardly be regarded as an object-oriented language, and it is difficult to link with other programming languages, even though many recent ArcGIS-based models are used in combination with one another. By contrast, Python, the programming language of recent ArcGIS versions, is concise, extensible, and easy to connect to other languages, and the arcpy module provided from ArcGIS 10.x onward has greatly improved accessibility for users. Therefore, the SATEEC ArcView version was redeveloped in Python on ArcGIS 10.1, improving the previously inconvenient accessibility and removing the inability to process large datasets.
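
The computation that SATEEC automates on raster layers is essentially cell-by-cell USLE soil loss combined with an area-based sediment delivery ratio. The sketch below uses plain numpy arrays in place of arcpy rasters (arcpy requires an ArcGIS license), and the delivery-ratio coefficients are illustrative, not the ones used in SATEEC.

```python
# Conceptual sketch: cell-by-cell USLE soil loss A = R * K * LS * C * P,
# scaled by an area-based sediment delivery ratio (SDR) to estimate sediment
# yield at the watershed outlet. Raster values and SDR coefficients are
# illustrative placeholders.
import numpy as np

rng = np.random.default_rng(3)
shape = (100, 100)                       # hypothetical grid of 30 m cells
R  = np.full(shape, 4500.0)              # rainfall erosivity
K  = rng.uniform(0.1, 0.4, shape)        # soil erodibility
LS = rng.uniform(0.5, 5.0, shape)        # slope length-steepness factor
C  = rng.uniform(0.01, 0.3, shape)       # cover management factor
P  = np.ones(shape)                      # support practice factor

soil_loss = R * K * LS * C * P           # USLE soil loss per cell (t/ha/yr)

cell_area_ha = (30 * 30) / 10_000
watershed_area_km2 = shape[0] * shape[1] * cell_area_ha / 100

# Area-based sediment delivery ratio, SDR = a * Area^(-b) (illustrative a, b).
sdr = min(1.0, 0.42 * watershed_area_km2 ** -0.125)
sediment_yield = soil_loss.sum() * cell_area_ha * sdr
print(f"mean soil loss: {soil_loss.mean():.1f} t/ha/yr, "
      f"sediment yield at outlet: {sediment_yield:.0f} t/yr")
```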


Characteristics of Social Interaction in Scientific Modeling Instruction on Combustion in Middle School (연소의 과학적 모형형성 수업에서 나타난 중학생의 사회적 상호작용 특징)

  • Park, HyunJu;Kim, HyeYeong;Jang, Shinho;Shim, Youngsook;Kim, Chan-Jong;Kim, Heui-Baik;Yoo, Junehee;Choe, Seung-Urn;Park, Kyung-Mee
    • Journal of the Korean Chemical Society
    • /
    • v.58 no.4
    • /
    • pp.393-405
    • /
    • 2014
  • The purpose of this study was to investigate the characteristics of social interaction, in terms of cultural aspects, verbal interaction, and discussion maps, in scientific modeling instruction on combustion in middle school. A revised CLEQ (Cultural Learning Environment Questionnaire), a verbal interaction framework, and discussion map analysis were used. The results are as follows. First, with respect to cultural aspects, the middle school students showed cooperation rather than competition, in line with collectivism; their attitudes toward learning science tended to depend on others' ideas, and they were passive and reluctant to present their own comments during modeling work. Second, regarding verbal interaction, students mostly presented simple knowledge related to building a model, and responses to comments and feedback were relatively rare. Third, the discussion maps showed many response interactions in which students alternately took up the concepts of other commenters, often relying on a specific student, whereas elaboration interactions, in which new ideas, corrections, and reasons are offered while exchanging ideas, were relatively few. The types of interaction identified in this study, which affect students' model construction, can serve as a basis for designing instruction that encourages richer social interaction in our country's classrooms.

Text Understanding System for Summarization (텍스트 이해 모델에 기반한 정보 검색 시스템)

  • Song, In-Seok;Park, Hyuk-Ro
    • Annual Conference on Human and Language Technology
    • /
    • 1997.10a
    • /
    • pp.1-6
    • /
    • 1997
  • This paper presents a cognitive text-understanding model and implements an automatic summarization system based on it. A document is not a mere collection of information but a structured form of linguistic expression; it conveys information through the semantic content of words together with the mode of expression, the structure of sentences, and the organization of the document. To establish the understanding and analysis process for summarization, manual summaries of 1,000 newspaper articles in the economics domain were analyzed and an understanding model was formulated, and based on the test results for those 1,000 articles, the information used to extract summary content from the relations between sentences and from document structure was analyzed. Compared with statistical models that rely on word frequency, this text-understanding model identifies relations between words and generates information by effectively using topic-sentence extraction based on document-structure information and relations between sentences. In addition, by systematically connecting the summarization knowledge used in the text-understanding process with structural analysis information, it mitigates the problem of content adequacy that arises in automatic information extraction.
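
For contrast, the word-frequency baseline that this abstract argues against can be written in a few lines; the sketch below is that generic frequency-and-position scorer, not the paper's structure-based cognitive model.

```python
# Generic extractive-summarization baseline: score sentences by the frequency
# of their words plus a small bonus for appearing early in the document.
from collections import Counter
import re

def extract_summary(text, n_sentences=2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = []
    for i, sent in enumerate(sentences):
        score = sum(freq[w] for w in re.findall(r"\w+", sent.lower()))
        score += (len(sentences) - i)        # position bonus for early sentences
        scored.append((score, i, sent))
    top = sorted(scored, reverse=True)[:n_sentences]
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))

article = ("The central bank raised interest rates. Markets reacted sharply. "
           "Analysts expect further rate increases. The weather was mild.")
print(extract_summary(article))
```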


The Effect of Amount and Interaction Styles of Maternal Inputs on Early Vocabulary Acquisition : A Longitudinal Multilevel Modeling Perspective (어휘 습득에서 어머니의 언어적 입력의 양과 상호작용 유형의 영향 : 다층 모형의 적용)

  • Chang-Song, You-Kyung;Hong, Sehee;Lee, Keunyoung
    • Korean Journal of Child Studies
    • /
    • v.28 no.5
    • /
    • pp.109-126
    • /
    • 2007
  • A sample of 322 18-month-old infants and their mothers was assessed longitudinally again at 24 and 30 months. Maternal utterances and styles of linguistic interaction were measured during a 10-minute free-play session, and mothers completed a vocabulary checklist for their infants. The longitudinal data were analyzed with multilevel modeling. Results indicated that vocabulary increased with the infants' age and that the growth rate was highly predictable from vocabulary size at 18 months. The growth rate was strongly influenced by maternal questioning and feedback, and the effect of maternal linguistic input was constant with age. Gender differences in vocabulary size did not vary systematically with age.
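
A longitudinal multilevel (growth-curve) model of this kind, with vocabulary measured at 18, 24, and 30 months and maternal input predicting the growth rate, might be specified as follows with statsmodels; the data and variable names are synthetic.

```python
# Sketch of a growth-curve multilevel model: random intercept and slope per
# child, with maternal questioning predicting the growth rate. Synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_child = 322
ages = np.array([18, 24, 30])
rows = []
for child in range(n_child):
    intercept = rng.normal(100, 30)
    slope = rng.normal(25, 8)
    maternal_q = rng.normal(0, 1)           # standardized maternal questioning
    for age in ages:
        t = (age - 18) / 6                  # time in 6-month units
        vocab = intercept + (slope + 5 * maternal_q) * t + rng.normal(0, 10)
        rows.append({"child": child, "age_c": t,
                     "maternal_q": maternal_q, "vocab": vocab})
df = pd.DataFrame(rows)

# Maternal questioning enters as a predictor of the growth rate through the
# interaction with time; each child gets a random intercept and random slope.
model = smf.mixedlm("vocab ~ age_c * maternal_q", df,
                    groups=df["child"], re_formula="~age_c")
result = model.fit()
print(result.summary())
```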


The Statistical Relationship between Linguistic Items and Corpus Size (코퍼스 빈도 정보 활용을 위한 적정 통계 모형 연구: 코퍼스 규모에 따른 타입/토큰의 함수관계 중심으로)

  • 양경숙;박병선
    • Language and Information
    • /
    • v.7 no.2
    • /
    • pp.103-115
    • /
    • 2003
  • In recent years, many organizations have been constructing their own large corpora to achieve corpus representativeness. However, there is no reliable guideline as to how large a corpus should be, especially for Korean corpora. In this study, we developed a new statistical model, ARIMA (Autoregressive Integrated Moving Average), for predicting the relationship between linguistic items (the number of types) and corpus size (the number of tokens), overcoming the major flaws of several previous studies on this issue. Finally, we illustrate that the ARIMA model presented is valid, accurate, and reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, corpus representativeness, and linguistic comprehensiveness.
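
One way to read the approach is to treat the cumulative number of types as a series indexed by corpus size (tokens) and fit an ARIMA model to it. The sketch below does this with simulated, Heaps'-law-like vocabulary growth; the ARIMA order is an assumption, not the one reported in the paper.

```python
# Sketch: fit ARIMA to a type count indexed by corpus size and extrapolate
# how many new types a larger corpus would contribute. Data are simulated.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
tokens = np.arange(1, 201) * 10_000                 # corpus size checkpoints
types = 40 * tokens ** 0.55 + rng.normal(0, 200, tokens.size)  # Heaps-like growth

model = ARIMA(types, order=(1, 1, 1))               # illustrative order
fit = model.fit()
forecast = fit.forecast(steps=20)                    # 20 more 10k-token slices
print("predicted types at", tokens[-1] + 20 * 10_000, "tokens:",
      int(forecast[-1]))
```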
