• Title/Summary/Keyword: speech recognition post-processing (음성인식 후처리)

The Flattening Algorithm of Speech Spectrum by Quadrature Mirror Filter (QMF에 의한 음성스펙트럼의 평탄화 알고리즘)

  • Min, So-Yeon
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.5 / pp.907-912 / 2006
  • Pre-emphasizing speech compensates for the falloff at high frequencies. The most common form of pre-emphasis is $y(n)=s(n)-A\cdot s(n-1)$, where A typically lies between 0.9 and 1.0 for voiced signals. This value reflects the degree of pre-emphasis, and in the conventional method it is set to R(1)/R(0). This paper proposes a new flattening method to compensate for the weakened high-frequency components caused by the characteristics of the vocal cords. We used a Quadrature Mirror Filter (QMF) to minimize output signal distortion. After the QMF compensates the high-frequency components, a flattening step using R(1)/R(0) is applied to each frame. Experimental results show that the proposed method flattened the weakened high-frequency components more effectively than the autocorrelation method. The flattening algorithm can therefore be applied to speech signal processing tasks such as speech recognition, speech analysis, and synthesis.
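
The conventional per-frame pre-emphasis that the abstract uses as its baseline (A = R(1)/R(0), applied as $y(n)=s(n)-A\cdot s(n-1)$) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's QMF-based method: frames are non-overlapping, R(k) is the unnormalized short-time autocorrelation, and s(-1) is treated as 0 at the start of each frame. The function names are hypothetical.

```python
from typing import List


def autocorr(frame: List[float], lag: int) -> float:
    """Unnormalized short-time autocorrelation R(lag) of a single frame."""
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))


def pre_emphasize_frame(frame: List[float]) -> List[float]:
    """Apply y(n) = s(n) - A*s(n-1) with A = R(1)/R(0) estimated from this frame."""
    if not frame:
        return []
    r0 = autocorr(frame, 0)
    a = autocorr(frame, 1) / r0 if r0 > 0 else 0.0
    # Assumption: s(-1) is taken as 0 at the start of every frame.
    return [frame[0]] + [frame[n] - a * frame[n - 1] for n in range(1, len(frame))]


def pre_emphasize(signal: List[float], frame_len: int) -> List[float]:
    """Split the signal into non-overlapping frames and pre-emphasize each adaptively."""
    out: List[float] = []
    for start in range(0, len(signal), frame_len):
        out.extend(pre_emphasize_frame(signal[start:start + frame_len]))
    return out
```

For a constant frame [1, 1, 1, 1], R(0) = 4 and R(1) = 3, so A = 0.75 and the output is [1.0, 0.25, 0.25, 0.25]: the DC (lowest-frequency) content is strongly attenuated, which is the spectral tilt pre-emphasis is meant to introduce.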

Augmented Reality Logo System Based on Android platform (안드로이드 기반 로고를 이용한 증강현실 시스템)

  • Jung, Eun-Young;Jeong, Un-Kuk;Lim, Sun-Jin;Moon, Chang-Bae;Kim, Byeong-Man
    • The KIPS Transactions:PartB / v.18B no.4 / pp.181-192 / 2011
  • Owing to smartphones and the mobile internet, the mobile phone is no longer just a voice communication tool. It has become an all-round entertainment device on which we can play games and use services through a variety of applications and the Web. As smartphones become more popular, their usage also increases, which has raised the advertising industry's interest in mobile advertising; such advertising, however, is inevitably limited by the screen size. In this paper, we propose an augmented reality logo system based on the Android platform to maximize the effect of logo advertisements. After developing the software and installing it on an actual smartphone, we analyzed its performance in various ways. The results show that the system could be applied in the real world, but current hardware performance is too low to provide real-time service.

Augmented Reality Logo System Based on Android platform (안드로이드 기반 로고를 이용한 증강현실 시스템)

  • Lim, Sun-Jin;Jung, Eun-Young;Jeong, Un-Kuk;Jung, Kyoung-Min;Moon, Chang-Bae;Kim, Byeong-Man;Yi, Jong-Yeol
    • Annual Conference of KIPS / 2011.04a / pp.353-356 / 2011
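  • With the advent of smartphones and mobile internet access, the mobile phone has evolved from a means of voice communication into a tool for receiving services over the Web and an entertainment device offering games and various applications, and its usage has grown accordingly. This surge in usage has increased the industry's interest in mobile advertising, but such advertising is inevitably limited by the small output screen. To address this, this paper proposes an Android-based augmented reality system that recognizes logos in order to maximize the effect of corporate logo advertising; we implemented the system, installed it on an actual phone, and analyzed its performance in various ways. The experimental results confirmed its feasibility, but showed that current hardware performance is insufficient to support it in real time.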

A general-purpose model capable of image captioning in Korean and English and a method to generate text suitable for the purpose (한국어 및 영어 이미지 캡션이 가능한 범용적 모델 및 목적에 맞는 텍스트를 생성해주는 기법)

  • Cho, Su Hyun;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.8 / pp.1111-1120 / 2022
  • Image captioning is the task of viewing an image and describing it in natural language. It is an important problem that requires understanding and combining two fields: image processing and natural language processing. Moreover, by automatically recognizing images and describing them in text, images can be converted into text and then into speech to help visually impaired people understand their surroundings, and the technique can be applied to important areas such as image search, art therapy, sports commentary, and real-time traffic information commentary. So far, image captioning research has focused solely on recognizing images and converting them into text. For practical use, however, a model must account for the various environments found in reality and be able to produce image descriptions suited to the intended purpose. In this work, we propose a general-purpose image captioning model that supports both Korean and English, together with a technique for generating captions suited to a given purpose.

Classification of Underwater Transient Signals Using MFCC Feature Vector (MFCC 특징 벡터를 이용한 수중 천이 신호 식별)

  • Lim, Tae-Gyun;Hwang, Chan-Sik;Lee, Hyeong-Uk;Bae, Keun-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.8C / pp.675-680 / 2007
  • This paper presents a new method for classifying underwater transient signals that uses frame-based decisions with Mel-Frequency Cepstral Coefficients (MFCC). MFCC feature vectors are extracted frame by frame from an input signal that has been detected as a transient, and the Euclidean distances between each of them and all MFCC feature vectors in the reference database are calculated. Each frame of the detected input signal is then mapped to the class with the minimum Euclidean distance in the reference database. Finally, the input signal is classified as the class with the highest mapping rate. Experimental results demonstrate that the proposed method is very promising for classifying underwater transient signals.
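
The frame-wise decision rule described above (nearest reference vector per frame, then the class with the highest mapping rate) might look like the following Python sketch. It assumes the MFCC vectors have already been extracted by some front end, which is not shown, and the function name `classify_transient` and its return format are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def classify_transient(
    frames: List[Vector], reference: List[Tuple[Vector, str]]
) -> Tuple[str, Dict[str, float]]:
    """Frame-level nearest-neighbour mapping followed by a majority decision.

    frames: MFCC vectors of the detected input signal (assumed precomputed).
    reference: (MFCC vector, class label) pairs from the reference database.
    Returns the winning label and the mapping rate of every class that received frames.
    """
    if not frames or not reference:
        raise ValueError("need at least one input frame and one reference vector")
    votes: Counter = Counter()
    for frame in frames:
        # Map this frame to the class of the reference vector at minimum Euclidean distance.
        _, label = min(reference, key=lambda item: math.dist(frame, item[0]))
        votes[label] += 1
    rates = {label: count / len(frames) for label, count in votes.items()}
    # The input is assigned to the class with the highest mapping rate
    # (ties go to the class that was mapped first).
    best_label, _ = votes.most_common(1)[0]
    return best_label, rates
```

An exhaustive search over the reference database is used for clarity; it costs one distance computation per pair of input frame and reference vector.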

Development for Estimation Model of Runway Visual Range using Deep Neural Network (심층신경망을 활용한 활주로 가시거리 예측 모델 개발)

  • Ku, SungKwan;Hong, SeokMin
    • Journal of Advanced Navigation Technology / v.21 no.5 / pp.435-442 / 2017
  • Runway visual range (RVR), which is affected by fog and other conditions, is one of the key indicators for determining whether aircraft can take off from and land at an airport. At airports served by transport aircraft, major weather forecasts, including the local RVR, are issued and provided to aviation personnel. This paper proposes an RVR estimation model based on deep neural networks, which have recently been applied to fields such as image processing, speech recognition, and natural language processing. The model is developed and implemented to estimate the RVR at a local airport and is trained on past weather observation data from the target airfield. Compared with existing observation data, the model produced relatively accurate estimates. The proposed model can be used to generate weather information for airfields that have no other forecasting capability.

Development of a Web-based Presentation Attitude Correction Program Centered on Analyzing Facial Features of Videos through Coordinate Calculation (좌표계산을 통해 동영상의 안면 특징점 분석을 중심으로 한 웹 기반 발표 태도 교정 프로그램 개발)

  • Kwon, Kihyeon;An, Suho;Park, Chan Jung
    • The Journal of the Korea Contents Association / v.22 no.2 / pp.10-21 / 2022
  • Apart from observation by colleagues or professors, there are few automated methods for improving one's attitude in formal presentations, such as presentations in job interviews or presentations of project results at a company. Previous studies reported that a speaker's steady speech and handling of gaze affect how effectively a presentation is delivered, and other studies show that appropriate feedback on one's presentation improves the presenter's presentation skills. In this paper, considering these positive effects of correction, we developed a program that intelligently corrects college students' poor presentation habits and attitudes through facial analysis of videos, and we analyzed its performance. The web-based program checks for the use of redundant words, performs facial recognition, and converts the presentation content into text. To this end, we developed an artificial intelligence model for classification; after extracting objects from the video, facial feature points were recognized based on their coordinates. Using 4,000 facial data samples, we then compared the performance of our algorithm with that of facial recognition using Teachable Machine. The program can help presenters by correcting their presentation attitude.
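
The abstract does not say which coordinate calculations the program performs. As a purely hypothetical illustration of analyzing facial feature points through coordinates, the sketch below estimates how far the head is turned from three landmarks per frame and reports the share of frames in which the presenter appears to look away. The landmark names, the yaw heuristic, and the 0.25 threshold are all assumptions, and landmark detection itself is not shown.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Landmarks = Dict[str, Point]  # hypothetical keys: "left_eye", "right_eye", "nose_tip"


def yaw_ratio(lm: Landmarks) -> float:
    """Horizontal offset of the nose tip from the eye midpoint, normalized by eye distance.

    Roughly 0 for a frontal face; grows in magnitude as the head turns sideways.
    """
    left, right, nose = lm["left_eye"], lm["right_eye"], lm["nose_tip"]
    eye_dist = math.dist(left, right)
    if eye_dist == 0:
        return 0.0
    eye_mid_x = (left[0] + right[0]) / 2
    return (nose[0] - eye_mid_x) / eye_dist


def looking_away_rate(frames: List[Landmarks], threshold: float = 0.25) -> float:
    """Fraction of video frames whose |yaw_ratio| exceeds a hypothetical threshold."""
    if not frames:
        return 0.0
    away = sum(1 for lm in frames if abs(yaw_ratio(lm)) > threshold)
    return away / len(frames)
```

Normalizing by the inter-eye distance keeps the measure independent of how large the face appears in the video frame.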

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Given a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot efficiently model correlations between input units because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can capture dependencies between objects that are fed into the model sequentially (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts must be decomposed into words or morphemes. However, because a training set of sentences generally contains a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level and morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for morphologically rich languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morphological analyzers are more likely to introduce errors during decomposition (Lankinen et al., 2016). This paper therefore proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit of Korean text. We construct the language model using three or four LSTM layers. Each model was trained with the stochastic gradient algorithm and with more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras with the Theano backend.
After pre-processing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed each input vector from 20 consecutive characters, with the following 21st character as the output. In total, the dataset contained 1,023,411 input-output pairs, which we divided into training, validation, and test sets in the proportion 70:15:15. All simulations were run on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly better than those of the stochastic gradient algorithm, which also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer LSTM model took 69% longer to train than the 3-layer model, yet its validation loss and perplexity did not improve significantly and even became worse under some conditions. On the other hand, when the automatically generated sentences were compared, the 4-layer model tended to generate more natural sentences than the 3-layer model. Although the completeness of the generated sentences differed slightly between models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only valid Korean letters, and their use of postpositions and conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely used for processing Korean in language processing and speech recognition, which form the basis of artificial intelligence systems.
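
The data preparation described above (decomposing Korean text into phonemes, then pairing 20 consecutive symbols with the 21st) can be sketched in Python as follows. The decomposition uses the standard Unicode arithmetic for precomposed Hangul syllables (U+AC00 to U+D7A3: 19 initials, 21 medials, 28 finals including "none"); whether the authors decomposed text this way, and whether they distinguished initial from final consonants, is not stated, so the resulting symbol set may differ from the paper's 74 characters. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Compatibility jamo tables in the Unicode order used for precomposed Hangul syllables.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals

SYLLABLE_FIRST, SYLLABLE_LAST = 0xAC00, 0xD7A3


def to_phonemes(text: str) -> List[str]:
    """Split each precomposed Hangul syllable into initial, medial and (optional) final jamo.

    Characters outside the Hangul syllable block (spaces, punctuation, ...) pass through.
    """
    out: List[str] = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_FIRST <= code <= SYLLABLE_LAST:
            initial, rest = divmod(code - SYLLABLE_FIRST, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial])
            out.append(MEDIALS[medial])
            if final:  # index 0 means the syllable has no final consonant
                out.append(FINALS[final])
        else:
            out.append(ch)
    return out


def make_windows(seq: Sequence[str], width: int = 20) -> List[Tuple[List[str], str]]:
    """Pair every run of `width` consecutive symbols with the symbol that follows it."""
    return [(list(seq[i:i + width]), seq[i + width]) for i in range(len(seq) - width)]
```

For example, `to_phonemes("한국.")` yields `['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ', '.']`; feeding a longer phoneme list to `make_windows` with the default width of 20 produces one (input, target) pair for every position that is preceded by 20 symbols.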

A Statistical Prediction Model of Speakers' Intentions in a Goal-Oriented Dialogue (목적지향 대화에서 화자 의도의 통계적 예측 모델)

  • Kim, Dong-Hyun;Kim, Hark-Soo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.35 no.9 / pp.554-561 / 2008
  • A technique for predicting the user's intention can be used as a post-processing method to reduce the search space of an automatic speech recognizer, and a technique for predicting the system's intention can be used as a pre-processing method for generating flexible sentences. To meet these practical needs, we propose a statistical model that predicts speakers' intentions, generalized as pairs of a speech act and a concept sequence. Unlike the previous model, which uses simple n-gram statistics of speech acts, the proposed model represents the dialogue history of the current utterance as a feature set at various linguistic levels (i.e., n-grams of speech-act and concept-sequence pairs, clue words, and state information of a domain frame). The proposed model then predicts the intention of the next utterance by using this feature set as the input to CRFs (Conditional Random Fields). In experiments in a schedule-management domain, the proposed model achieved a precision of 76.25% in predicting the user's speech act and 64.21% in predicting the user's concept sequence. It also achieved a precision of 88.11% in predicting the system's speech act and 87.19% in predicting the system's concept sequence. In addition, the proposed model showed 29.32% higher average precision than the previous model.
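
The feature set described above could be assembled per utterance roughly as follows before being passed to a CRF toolkit. This is a sketch under assumptions: the tuple layout of a dialogue turn, whitespace tokenization for clue-word lookup, the feature-name scheme, and the binary slot-fill encoding are all hypothetical, and the CRF training step itself is not shown.

```python
from typing import Dict, List, Set, Tuple

# (speech act, concept sequence, utterance text); the layout is hypothetical.
Turn = Tuple[str, str, str]


def history_features(
    history: List[Turn],
    clue_words: Set[str],
    frame_state: Dict[str, bool],
    max_n: int = 2,
) -> Dict[str, float]:
    """Binary feature map describing the dialogue history up to the current utterance."""
    feats: Dict[str, float] = {}
    pairs = [f"{act}/{concept}" for act, concept, _ in history]
    # n-grams (1..max_n) of speech-act/concept-sequence pairs ending at the current turn.
    for n in range(1, max_n + 1):
        if len(pairs) >= n:
            feats[f"pair{n}=" + "|".join(pairs[-n:])] = 1.0
    # Clue words that occur in the current utterance.
    if history:
        for word in history[-1][2].split():
            if word in clue_words:
                feats[f"clue={word}"] = 1.0
    # Fill state of each slot in the domain frame.
    for slot, filled in frame_state.items():
        feats[f"slot:{slot}=" + ("filled" if filled else "empty")] = 1.0
    return feats
```

Encoding every cue as a sparse name-to-value entry lets features from different linguistic levels coexist in one input without a fixed-length vector layout.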

Seasonal and Spatial Distribution of Soft-bottom Polychaetes in Jinju Bay of the Southern Coast of Korea (진주만에서 저서 다모류의 시 · 공간 분포)

  • Kang Chang Keun;Baik Myung Sun;Kim Jeong Bae;Lee Pil Yong
    • Korean Journal of Fisheries and Aquatic Sciences / v.35 no.1 / pp.35-45 / 2002
  • Seasonal quantitative van Veen grab sampling was conducted to characterize the composition and structure of the benthic polychaete community inhabiting the shellfish farming ground of a coastal bay system, Jinju Bay (Korea). A total of 132 polychaete species were identified, and polychaetes accounted for about $80\%$ of the overall abundance of benthic animals. There was little significant seasonal difference in polychaete densities (abundances). Maximum biomass was observed in summer (August), and minimum values were recorded in winter (February) and spring (May). Conversely, diversity and richness were lowest in summer, indicating seasonal variability in the polychaete community structure. Cluster analysis indicated that this seasonal variability resulted mainly from the appearance of a few small, r-selected opportunists in spring and of tubiculous species of the family Maldanidae in summer. On the other hand, several indicator species of organically enriched environments, such as Capitella capitata, Notomastus latericeus, and Lumbrineris sp., showed high densities throughout the study period. Density and biomass, two univariate measures of community structure, were significantly lower in the ark shell farming ground in the southern area than at the non-farming sites of the bay. A similar general tendency was also found in the spatial distributions of species diversity and richness. Principal component analysis revealed distinct groups of benthic assemblages in the ark shell farming ground and at the non-farming sites. The lack of colonization by r-selected opportunists and/or tubiculous species in the former seemed to contribute to the spatial differences in the composition and structure of the polychaete communities.
Although the finer granulometric composition and high sulfide concentration of sediments in the ark shell farming ground and the low salinity in the northern area were likely to account for part of the differences, the other environmental variables observed were unlikely to do so. The spatial distribution of polychaetes in Jinju Bay may instead be closely related to sediment disturbance caused by the selection of shells for harvesting in spring.
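
The abstract reports species diversity and richness without naming the indices used. Two common choices in benthic ecology, assumed here rather than taken from the paper, are the Shannon-Wiener index $H' = -\sum_i p_i \ln p_i$ (natural logarithm assumed) and Margalef's richness $d = (S-1)/\ln N$. A minimal Python sketch computing both from the per-species counts of one grab sample:

```python
import math
from typing import Sequence


def shannon_diversity(counts: Sequence[int]) -> float:
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def margalef_richness(counts: Sequence[int]) -> float:
    """Margalef richness d = (S - 1) / ln N, with S species present and N individuals."""
    n = sum(counts)
    s = sum(1 for c in counts if c > 0)
    if n <= 1:
        return 0.0  # ln N is zero or undefined, so the index is not meaningful
    return (s - 1) / math.log(n)
```

For two species with 10 individuals each, H' = ln 2 (about 0.693) and d = 1 / ln 20 (about 0.334); H' rises with evenness, while d depends only on the number of species relative to sample size.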