• Title/Summary/Keyword: Proper Vocabulary

Korean broadcast news transcription system with out-of-vocabulary(OOV) update module (한국어 방송 뉴스 인식 시스템을 위한 OOV update module)

  • Jung Eui-Jung;Yun Seung
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.33-36 / 2002
  • We implemented a Korean broadcast news transcription system that is robust to out-of-vocabulary (OOV) words and tested its performance. The occurrence of OOV words in the input speech is inevitable in large vocabulary continuous speech recognition (LVCSR). The known vocabulary will never be complete because of, for instance, neologisms, proper names, and, in some languages, compounds. The fixed vocabulary and language model of an LVCSR system are directly confronted with these OOV words. Therefore, our broadcast news recognition system has an offline OOV update module for the language model and vocabulary to address the OOV problem, and it uses a morpheme-based recognition unit (the so-called pseudo-morpheme) for OOV robustness.

Typicality of Vocabulary for Evaluation on Acoustic Performance at Korean classical music performing place (국악공연장(國樂公演場)의 음향성능(音響性能) 평가(評價)를 위한 어휘(語彙)의 유형화(類型化))

  • Choi, Dool;Ju, Duck-Hoon;Kim, Jae-Soo
    • Proceeding of Spring/Autumn Annual Conference of KHA / 2008.04a / pp.276-280 / 2008
  • 'Korean Classical Music', an abbreviated term for 'Korean Music', denotes Korean traditional music as distinguished from Western music, foreign music, and foreign-styled popular music. Because Korean Classical Music has acoustic characteristics different from those of Western music, it needs dedicated performance spaces of its own. Demand for such dedicated performance spaces is increasing as national interest in traditional culture and arts grows. However, because these spaces are planned without any concrete standard or method that ensures optimal listening conditions, it is in practice very difficult to secure satisfactory acoustic conditions once construction is complete. From this viewpoint, in order to evaluate the acoustic characteristics of dedicated performance spaces for Korean Classical Music, this study attempted to extract a proper evaluation vocabulary for Korean Classical Music, starting from subjective responses that reflect human psychological attributes. The vocabulary extracted in this way should be useful for subjective response evaluation of the acoustic characteristics of dedicated performance spaces for Korean Classical Music.

Typicality of Vocabulary for evaluation on Instrument-Noise generated at Loud Noise Workplace (고소음 작업장에서 발생하는 기기소음 평가를 위한 어휘의 유형화)

  • Ju, Duck-Hoon;Kook, Jung-Hun;Kim, Jae-Soo
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.11a / pp.242-247 / 2007
  • After the industrialization of the 1960s, accelerated mechanization contributed greatly to industrial development, but in reality countermeasures against the noise damage generated in high-noise workplaces have scarcely been taken. In particular, the instrument noise produced in factories and workplaces is so jarring and repetitive that most on-site workers are forcibly exposed to dangers such as severe discomfort and hearing impairment. From this point of view, this research attempted to extract a proper rating vocabulary for evaluating instrument noise produced in high-noise workplaces. The extracted vocabulary is expected to serve as useful material for appraising instrument noise and for establishing regulatory standards related to psychoacoustic experimentation and instrument noise.

Semantic Similarity-Based Contributable Task Identification for New Participating Developers

  • Kim, Jungil;Choi, Geunho;Lee, Eunjoo
    • Journal of information and communication convergence engineering / v.16 no.4 / pp.228-234 / 2018
  • In software development, the quality of a product often depends on whether its developers can rapidly find and contribute to the proper tasks. Currently, the word data of projects to which newcomers have previously contributed are mainly used to find appropriate source files in an ongoing project. However, because of the vocabulary gap between software projects, the accuracy of source file identification based on information retrieval is not guaranteed. In this paper, we propose a novel source file identification method to reduce the vocabulary gap between software projects. The proposed method employs DBpedia Spotlight to identify proper source files based on the semantic similarity between source files of software projects. In an experiment based on the Spring Framework project, we evaluate the accuracy of the proposed method in identifying contributable source files. The experimental results show that the proposed approach can achieve better accuracy than the existing method based on comparison of word vocabularies.

Proper Noun Embedding Model for the Korean Dependency Parsing

  • Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Multimedia Information System / v.9 no.2 / pp.93-102 / 2022
  • Dependency parsing is the problem of deciding the syntactic relations between words in a sentence. Recently, deep learning models have been used for dependency parsing based on word representations in a continuous vector space. However, this causes mislabeling of proper nouns that rarely appear in the training corpus, because it is difficult to express out-of-vocabulary (OOV) words in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods according to the embedding unit. Before representing words in a continuous vector space, we replace the proper nouns with a special token and train them on contextual features using a multi-layer bidirectional LSTM. Two models, based on syllable and morpheme units, are proposed for proper noun embedding, and dependency parsing performance improves more with the ensemble model than with either the syllable or the morpheme embedding model alone. The experimental results showed that our ensemble model improved UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which is based on the same arc-eager approach.

An Experimental study on the Proper Vocabulary for Evaluating Traffic Noise by Psycho-acoustic Experiment (청감실험에 의한 교통소음 적정 평가어휘 조사에 관한 실험적 연구)

  • Lee, Ju-Yeob;Kim, Hang;Jun, Ji-Hyun;Gi, No-Gab;Song, Min-Jeong;Jang, Gil-Soo;Kim, Sun-Woo
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2004.11a / pp.786-789 / 2004
  • For the accurate evaluation of traffic noise with various spectra and fluctuation characteristics, evaluation systems should reflect not only physical quantities but also the psychological aspects of individual persons. In this study, adequate words for evaluating traffic noise were extracted by reviewing existing vocabularies and augmenting them with the results of a questionnaire prepared especially for apartment dwellers. As a result of this study, the following are suggested. 1) Words such as 'disagreeable', 'annoying', 'strident', 'disturbed', 'irritate', 'unpleasant', and 'dislike' are classified into the first factor by factor analysis. 2) A survey of the words that overlap across sound sources shows that 'noisy', 'annoying', 'strident', 'unpleasant', and 'loudness' are the main words expressing unpleasantness toward traffic noise in domestic apartment houses.

A Study on the Indexing System Using a Controlled Vocabulary and Natural Language in the Secondary Legal Information Full-Text Databases : an Evaluation and Comparison of Retrieval Effectiveness (2차 법률정보 전문데이터베이스에 있어서 통제어 색인시스템과 자연어 색인시스템의 검색효율 평가에 관한 연구)

  • Roh Jeong-Ran
    • Journal of the Korean Society for Library and Information Science / v.32 no.4 / pp.69-86 / 1998
  • The purpose of this study is to develop an indexing algorithm for secondary legal information by studying the characteristics of legal information, to compare the indexing system using a controlled vocabulary with the indexing system using natural language in secondary legal information full-text databases, and to demonstrate the propriety and superiority of the indexing system using a controlled vocabulary. The results are as follows: 1) The indexing system using a controlled vocabulary in secondary legal information full-text databases is more effective than the indexing system using natural language in the recall rate, the precision rate, the distribution of propriety, and the ability to retrieve the unique proper records that the indexing system using natural language fails to find. 2) The indexing system that adds more words to the controlled vocabulary does not yield better effectiveness in the recall rate or the precision rate than the indexing system using the controlled vocabulary. 3) The indexing system using a word-added controlled vocabulary with an extra weight does not yield better effectiveness in the recall rate or the precision rate than the indexing system using a word-added controlled vocabulary without an extra weight. This study indicates that it is necessary to build into the system the characteristic information that information experts recognize, that is to say, the experimental and inherent knowledge that only human beings can have, rather than to approach the information system in a linguistic, statistical, or structuralist way, and that doing so can yield a more essential and intelligent information system.

A Study on Vocabulary and Sentence Level through Readability Analysis of 2015 Revised Elementary Science Textbook (2015 개정 초등과학 교과서의 이독성 분석을 통한 어휘 및 문장 수준에 관한 연구)

  • Yoon, Gong Min;Hong, Young-Sik
    • Journal of Science Education / v.45 no.3 / pp.317-325 / 2021
  • The purpose of this study is to analyze the readability of the 2015 revised elementary science textbooks at the vocabulary and sentence levels, and to provide a basis for using vocabulary and sentences with an appropriate level of readability when writing textbooks in the future. To do this, the readability of the 2015 revised elementary science textbooks was analyzed at the vocabulary and sentence levels, and the readability of sentences defining scientific terminology was also analyzed. The results were then compared to the readability of previous curriculum textbooks. The results are as follows. First, the average grade level of the vocabulary remained at 1.5-2.1, so vocabulary appropriate to the elementary school level was used on average. However, vocabulary of grades 4 to 5 appears at a relatively high rate. Second, the sentence-level analysis shows that the sentences for the third and fifth grades were relatively long and the percentage of simple sentences was low. Third, compared to other curriculum textbooks, it was confirmed that a proper level of readability was maintained at the vocabulary level, but that the sentence lengths and the percentage of simple sentences could adversely affect the readability of third-grade science textbooks.

HMnet Evaluation for Phonetic Environment Variations of Training Data in Speech Recognition

  • Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea / v.15 no.4E / pp.28-36 / 1996
  • In this paper, we propose a new evaluation methodology that can show more clearly the performance of the allophone modeling algorithm generally used in large vocabulary speech recognition. The proposed evaluation method shows the running characteristics and limitations of the modeling algorithm by testing how the variation of the phonetic environments of the training data affects the recognition performance and the desirable number of free parameters to be estimated. From the experimental results obtained using the method, we conclude that, in a vocabulary-independent recognition task, the phonetic diversity of the training data greatly affects the robustness of the model, and that it is necessary to develop a proper measure that can determine the number of states, compromising between the robustness and the precision of the HMnet, better than the conventional modeling efficiency.

A Study on the Triphone Replacement in a Speech Recognition System with DMS Phoneme Models

  • Lee, Gang-Seong
    • The Journal of the Acoustical Society of Korea / v.18 no.3E / pp.21-25 / 1999
  • This paper proposes methods that replace a missing triphone with a new one selected or created from existing triphones, and compares the results. The recognition system uses the DMS (Dynamic Multisection) model for acoustic modeling. DMS is a statistical recognition technique suited to small- or mid-size vocabulary systems, while HMM (Hidden Markov Model) is a probabilistic technique suited to mid-size or large systems. Accordingly, it is reasonable to use an effective algorithm suited to DMS rather than a complicated method such as the polyphone clustering technique employed in HMM-based systems. In this paper, four methods of filling in missing triphones are presented. The results show that the proposed replacement algorithm works almost as well as if all the necessary triphones existed. The experiments are performed on a 500+ word DMS speech recognizer.
