• Title/Abstract/Keywords: Proper Vocabulary

45 search results, processing time 0.021 seconds

Korean broadcast news transcription system with out-of-vocabulary (OOV) update module

  • 정의정;윤승
    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 2002 Summer Conference, Vol. 21, No. 1 / pp.33-36 / 2002
  • We implemented a Korean broadcast news transcription system that is robust to out-of-vocabulary (OOV) words and tested its performance. The occurrence of OOV words in the input speech is inevitable in large vocabulary continuous speech recognition (LVCSR). The known vocabulary will never be complete because of, for instance, neologisms, proper names, and, in some languages, compounds. The fixed vocabulary and language model of an LVCSR system are directly confronted with these OOV words. Therefore, our broadcast news recognition system includes an offline OOV update module for the language model and vocabulary to address the OOV problem, and it uses a morpheme-based recognition unit (the so-called pseudo-morpheme) for OOV robustness.


Typicality of Vocabulary for Evaluation on Acoustic Performance at Korean classical music performing place

  • 최둘;주덕훈;김재수
    • Korean Housing Association: Conference Proceedings / Proceedings of the Korean Housing Association 2008 Spring Conference / pp.276-280 / 2008
  • 'Korean Classical Music', an abbreviation of 'Korean Music', is the term used for our traditional music, distinguishing it from Western music, foreign music, and foreign-styled popular music. Because Korean classical music has acoustic characteristics different from those of Western music, it needs dedicated performance spaces of its own. Demand for performance spaces dedicated to Korean classical music is increasing with growing national interest in traditional culture and art. However, such spaces are being planned without any concrete standard or method for achieving optimal listening conditions, so it is very difficult to secure satisfactory acoustic conditions once construction is complete. From this viewpoint, in order to evaluate the acoustic characteristics of performance spaces dedicated to Korean classical music, this study first attempted to extract the proper evaluation vocabulary for Korean classical music, based on subjective responses that reflect human psychological attributes. The vocabulary extracted in this way should be useful in subjective response evaluations of the acoustic characteristics of performance spaces dedicated to Korean classical music.


Typicality of Vocabulary for evaluation on Instrument-Noise generated at Loud Noise Workplace

  • 주덕훈;국정훈;김재수
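    • Korean Society for Noise and Vibration Engineering: Conference Proceedings / Proceedings of the Korean Society for Noise and Vibration Engineering 2007 Autumn Conference / pp.242-247 / 2007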
  • Since the industrialization of the 1960s, accelerated mechanization has contributed greatly to industrial development, but countermeasures against the noise damage generated in loud-noise workplaces have scarcely been taken. In particular, the instrument noise produced in factories and workplaces is so jarring and repetitive that most on-site workers are exposed to hazards such as severe discomfort and hearing impairment. From this point of view, this research attempted to extract the proper rating vocabulary for evaluating instrument noise generated in loud-noise workplaces. The extracted vocabulary is expected to serve as useful material for evaluating instrument noise, for psychoacoustic experiments, and for establishing regulatory standards for instrument noise.


Semantic Similarity-Based Contributable Task Identification for New Participating Developers

  • Kim, Jungil;Choi, Geunho;Lee, Eunjoo
    • Journal of information and communication convergence engineering / Vol. 16, No. 4 / pp.228-234 / 2018
  • In software development, the quality of a product often depends on whether its developers can rapidly find and contribute to the proper tasks. Currently, the word data of projects to which newcomers have previously contributed are mainly used to find appropriate source files in an ongoing project. However, because of the vocabulary gap between software projects, the accuracy of source file identification based on information retrieval is not guaranteed. In this paper, we propose a novel source file identification method to reduce the vocabulary gap between software projects. The proposed method employs DBpedia Spotlight to identify proper source files based on the semantic similarity between source files of software projects. In an experiment based on the Spring Framework project, we evaluate the accuracy of the proposed method in identifying contributable source files. The experimental results show that the proposed approach can achieve better accuracy than the existing method based on comparing word vocabularies.

Proper Noun Embedding Model for the Korean Dependency Parsing

  • Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Multimedia Information System / Vol. 9, No. 2 / pp.93-102 / 2022
  • Dependency parsing is the problem of deciding the syntactic relations between words in a sentence. Recently, deep learning models have been used for dependency parsing based on word representations in a continuous vector space. However, this causes mislabeling of proper nouns that rarely appear in the training corpus, because it is difficult to express out-of-vocabulary (OOV) words in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods that differ in the embedding unit. Before representing words in a continuous vector space, we replace the proper nouns with a special token and train them on contextual features using a multi-layer bidirectional LSTM. Two models, one syllable-based and one morpheme-based, are proposed for proper noun embedding, and the ensemble model improves dependency parsing performance more than either the syllable or the morpheme embedding model alone. The experimental results showed that our ensemble model improved UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which is based on the same arc-eager approach.
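
The replacement step mentioned in the abstract above (swapping proper nouns for a special token before embedding) can be illustrated with a minimal sketch. It is not the authors' implementation: the tag name `NNP`, the token string `<PN>`, and the function `mask_proper_nouns` are assumptions chosen for illustration, and the input is assumed to be already morpheme-analyzed and POS-tagged.

```python
from typing import List, Tuple

# Assumption: proper nouns carry the POS tag "NNP" in the tagged input.
PROPER_NOUN_TAG = "NNP"
# Hypothetical special token; the abstract does not name the one actually used.
PROPER_NOUN_TOKEN = "<PN>"


def mask_proper_nouns(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the surface forms with every proper noun replaced by one shared token."""
    return [
        PROPER_NOUN_TOKEN if tag == PROPER_NOUN_TAG else form
        for form, tag in tagged
    ]


if __name__ == "__main__":
    # Toy (form, tag) pairs; the tags are illustrative only.
    sentence = [("서울", "NNP"), ("에", "JKB"), ("가", "VV"), ("다", "EF")]
    print(mask_proper_nouns(sentence))
```

Because every rare proper noun maps to the same token, a downstream embedding table needs only one entry for them, which is one plausible way to sidestep the OOV lookup problem the abstract describes.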

An Experimental study on the Proper Vocabulary for Evaluating Traffic Noise by Psycho-acoustic Experiment

  • 이주엽;김항;전지현;기노갑;송민정;장길수;김선우
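    • Korean Society for Noise and Vibration Engineering: Conference Proceedings / Proceedings of the Korean Society for Noise and Vibration Engineering 2004 Autumn Conference / pp.786-789 / 2004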
  • For the accurate evaluation of traffic noise with various spectra and fluctuation characteristics, evaluation systems should reflect not only physical quantities but also the psychological aspects of individual persons. In this study, adequate words for evaluating traffic noise were extracted by reviewing the existing vocabularies and augmenting them with the results of a questionnaire prepared especially for apartment dwellers. As a result of this study, the following are suggested. 1) Words such as 'disagreeable', 'annoying', 'strident', 'disturbed', 'irritate', 'unpleasant', and 'dislike' are classified into the first factor by factor analysis. 2) A survey of the vocabulary shared across sound sources shows that 'noisy', 'annoying', 'strident', 'unpleasant', and 'loudness' are the main words expressing unpleasantness toward traffic noise occurring in our domestic apartment houses.


A Study on the Indexing System Using a Controlled Vocabulary and Natural Language in the Secondary Legal Information Full-Text Databases: an Evaluation and Comparison of Retrieval Effectiveness

  • 노정란
    • Journal of the Korean Society for Library and Information Science / Vol. 32, No. 4 / pp.69-86 / 1998
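  • This study develops an algorithm based on the characteristics of legal information identified in a preliminary study for building a secondary legal information full-text database (권기원, 노정란, 1998, Journal of the Korean Society for Library and Information Science, 32(3)), builds a model controlled-vocabulary database with that algorithm, and compares and evaluates the retrieval effectiveness of a controlled-vocabulary indexing system and a natural-language indexing system. The results show that, in the secondary legal information full-text database, the controlled-vocabulary indexing system was more effective than the natural-language indexing system in recall, in precision, and in the ability to retrieve unique relevant documents that the natural-language system failed to retrieve. In addition, although assigning weights or adding access points is generally believed to improve a database's precision or recall, it was found that in the secondary legal information full-text database, owing to the characteristics of legal information as a specific knowledge domain, neither assigning weights nor adding access points improved recall or precision. Therefore, rather than approaching a system through purely linguistic or statistical methods, information system designers need to embed in the system the knowledge specific to each subject field as recognized by information specialists and subject specialists.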


A Study on Vocabulary and Sentence Level through Readability Analysis of 2015 Revised Elementary Science Textbook

  • 윤공민;홍영식
    • Journal of Science Education / Vol. 45, No. 3 / pp.317-325 / 2021
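  • This study aims to analyze the readability of the 2015 revised elementary science textbooks in order to determine the level of their vocabulary and sentences, and to provide a basis for using vocabulary and sentences with grade-appropriate readability when textbooks are written in the future. To this end, we measured the readability of the 2015 revised elementary science textbooks, analyzed at the vocabulary and sentence levels the readability of sentences that define scientific terms and of sentences that aid understanding, and then analyzed the levels by grade and compared them with the readability of the previous textbooks. The material to be analyzed was selected through consultation among three teachers, including the researcher, each with at least 10 years of teaching experience. The results are as follows. First, the mean vocabulary grade was 1.5 to 2.1, indicating that vocabulary suitable for elementary students was used, but in grade 4 a relatively high proportion of grade 4-5 vocabulary appeared. The readability of the term-definition parts of the grade 3 and grade 6 science textbooks in the 2015 revised curriculum was lower than the vocabulary readability of the science textbooks of the previous curriculum, but remained at a level similar to or lower than that of other subjects. Second, unlike grades 4 and 6, sentences in grades 3 and 5 were relatively long and had a low proportion of simple sentences, so sentence-level readability was low. In particular, the average number of words and the proportion of simple sentences in the term-definition parts of the grade 3 textbook in the 2015 revised curriculum indicate very low sentence-level readability, and improvement is needed. Third, compared with the readability of textbooks from other curricula, vocabulary-level readability was appropriate, but in the grade 3 science textbook the number of words per sentence and the proportion of complex sentences were high, so readability was low. Efforts to raise readability by using easy vocabulary and shortening sentences should continue.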

HMnet Evaluation for Phonetic Environment Variations of Training Data in Speech Recognition

  • Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 4E / pp.28-36 / 1996
  • In this paper, we propose a new evaluation methodology that shows more clearly the performance of the allophone modeling algorithm generally used in large vocabulary speech recognition. The proposed evaluation method shows the running characteristics and limitations of the modeling algorithm by testing how variation in the phonetic environments of the training data affects the recognition performance and the desirable number of free parameters to be estimated. From experimental results obtained with this method, we conclude that, in vocabulary-independent recognition tasks, the phonetic diversity of the training data greatly affects the robustness of the model, and that it is necessary to develop a proper measure that can determine the number of states balancing the robustness and the precision of the HMnet better than the conventional modeling efficiency does.


A Study on the Triphone Replacement in a Speech Recognition System with DMS Phoneme Models

  • Lee, Gang-Seong
    • The Journal of the Acoustical Society of Korea / Vol. 18, No. 3E / pp.21-25 / 1999
  • This paper proposes methods that replace a missing triphone with a new one selected or created from existing triphones, and compares the results. The recognition system uses the DMS (Dynamic Multisection) model for acoustic modeling. DMS is a statistical recognition technique suited to small- or mid-size vocabulary systems, while HMM (Hidden Markov Model) is a probabilistic technique suited to mid-size or large systems. Accordingly, it is reasonable to use an effective algorithm suited to DMS rather than a complicated method such as the polyphone clustering technique employed in HMM-based systems. In this paper, four methods of filling in missing triphones are presented. The results show that the proposed replacement algorithm works almost as well as if all the necessary triphones existed. The experiments were performed on a 500+ word DMS speech recognizer.
