Search | Korea Science

HMM-based Speech Recognition using DMS Model and Double Spectral Feature (DMS 모델과 이중 스펙트럼 특징을 이용한 HMM에 의한 음성 인식)

Ann Tae-Ock
- Journal of the Korea Academia-Industrial cooperation Society
- v.7 no.4
- pp.649-655
- 2006
This paper proposes an HMM-based method for speaker-independent speech recognition that uses a DMSVQ (Dynamic Multi-Section Vector Quantization) codebook built from the DMS model together with a double spectral feature. The LPC cepstrum is used as the instantaneous spectral feature, and the regression coefficients of the LPC cepstrum are used as the dynamic spectral feature. Each of the two spectral features is quantized with its own VQ codebook. The HMM using the DMS model is modeled with both the instantaneous and the dynamic spectral features as input. For comparison, recognition experiments with various conventional methods were carried out on the same data and under the same conditions. The experimental results show that the proposed method outperforms the conventional recognition methods.

Building a Korean conversational speech database in the emergency medical domain (응급의료 영역 한국어 음성대화 데이터베이스 구축)

Kim, Sunhee;Lee, Jooyoung;Choi, Seo Gyeong;Ji, Seunghun;Kang, Jeemin;Kim, Jongin;Kim, Dohee;Kim, Boryong;Cho, Eungi;Kim, Hojeong;Jang, Jeongmin;Kim, Jun Hyung;Ku, Bon Hyeok;Park, Hyung-Min;Chung, Minhwa
- Phonetics and Speech Sciences
- v.12 no.4
- pp.81-90
- 2020
This paper describes a method of building Korean conversational speech data in the emergency medical domain and proposes an annotation method for the collected data in order to improve speech recognition performance. To suggest future research directions, baseline speech recognition experiments were conducted by using partial data that were collected and annotated. All voices were recorded at 16-bit resolution at 16 kHz sampling rate. A total of 166 conversations were collected, amounting to 8 hours and 35 minutes. Various information was manually transcribed such as orthography, pronunciation, dialect, noise, and medical information using Praat. Baseline speech recognition experiments were used to depict problems related to speech recognition in the emergency medical domain. The Korean conversational speech data presented in this paper are first-stage data in the emergency medical domain and are expected to be used as training data for developing conversational systems for emergency medical applications.
https://doi.org/10.13064/KSSS.2020.12.4.081

Use of Voice Script For Speech Characterization (화법에 의한 성격표현에 활용할 소리대본 작성법)

Lee, Ki-Ho
- The Journal of the Korea Contents Association
- v.11 no.12
- pp.976-985
- 2011
The purpose of this research is to investigate the use of voice scripting for speech characterization. The ultimate goal of acting is for an actor to create a character in both the physical and the vocal sense and to present it on stage. Toward this goal, actors train themselves with various methods and techniques in addition to character analysis. Most of their effort goes into better physical and vocal expression. Vocal characterization on stage depends heavily on proper speech grounded in effective respiration and voice. This paper shows how to use sound scoring for effective vocal characterization on stage.
https://doi.org/10.5392/JKCA.2011.11.12.976

A Study on Security Technology in Next Generation Network (차세대 통신망에서의 보안 기술에 관한 연구)

Lee, Keun-Ho;Yi, Song-Hee;Kim, Jeong-Beom;Kim, Tai-Yun
- Proceedings of the Korea Information Processing Society Conference
- 2002.11b
- pp.1135-1138
- 2002
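Internet-related technologies have been developing rapidly in recent years. Services have moved from the simple data services of the past to diverse multimedia services such as voice and video, and networks are evolving toward the NGN (Next Generation Network), in which all media converge on the Internet. In open networks, the convergence of diverse wired and wireless networks increases interference between networks, and the inter-network connection structure centered on network access points keeps expanding, which has made it difficult to apply the simple, system-oriented security technologies used so far. Network-centric security technology that efficiently protects the links between network nodes is therefore needed. This paper examines the next generation network, which responds to the growth of wired and wireless data services in a telecommunications industry centered on the telephone network by building and integrating a new communication infrastructure that is designed primarily for data services and accommodates telephone service as one of many data application services, and it studies security technology for the NGN.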

Application and Technology of Voice Synthesis Engine for Music Production (음악제작을 위한 음성합성엔진의 활용과 기술)

Park, Byung-Kyu
- Journal of Digital Contents Society
- v.11 no.2
- pp.235-242
- 2010
Unlike the instruments of the past, which synthesized sounds and tones, voice synthesis engines for music production have reached the level of creating music that sounds as if actual artists were singing. They use samples of human voices that are naturally connected at different phoneme levels within the frequency range. The voice synthesis engine is not limited to music production; it is changing the cultural paradigm through derivative works in new music formats, including character music concerts, media productions, albums, and mobile services. Current voice synthesis engine technology lets users enter pitch, lyrics, and musical expression parameters through a score editor, then mixes and connects voice samples retrieved from a database to produce singing. The new types of music derived from this development in computer music have had a large cultural impact. Accordingly, this paper examines specific case studies and synthesis technologies so that users can understand the voice synthesis engine more easily, with the aim of contributing to the variety of their music production.

Real-time Implementation of the G.729 Annex A Using ARM9 Thumb® Processor Core (ARM9 Thumb® 프로세서 코어를 이용한 G.729A의 실시간 구현)

성호상;이동원
- The Journal of the Acoustical Society of Korea
- v.20 no.7
- pp.63-68
- 2001
This paper describes the implementation of the ITU-T SG15 G.729A speech coder on the ARM9 Thumb® processor core and the techniques used in the optimization process. The ITU-T G.729 speech coder is the standard for toll-quality speech coding at 8 kbit/s. The input to the speech encoder is assumed to be a 16-bit PCM signal at a sampling rate of 8000 samples per second. G.729A is a reduced-complexity version of the G.729 coder and is bit-stream interoperable with the full version. The implemented coder requires 34.8 MIPS for the encoder and 8.1 MIPS for the decoder, with 36.5 kBytes of program ROM and 6.3 kBytes of data RAM. The implemented coder was tested against the set of 9 test vectors provided by ITU-T for bit-exact implementation.

Implementation of Korean Vowel 'ㅏ' Recognition based on Common Feature Extraction of Waveform Sequence (파형 시퀀스의 공통 특징 추출 기반 모음 'ㅏ' 인식 구현)

Roh, Wonbin;Lee, Jongwoo
- KIISE Transactions on Computing Practices
- v.20 no.11
- pp.567-572
- 2014
In recent years, computing and networking technologies have advanced, communication devices have become smaller, and mobility has increased. In addition, the demand for easy-to-use speech recognition has grown. A phoneme is the smallest unit of sound, and it plays a significant role in speech recognition. However, precise recognition of phonemes faces many obstacles because their pronunciation varies widely. This paper proposes a simple and efficient method for recognizing the Korean vowel 'ㅏ'. The proposed method is based on common features extracted from 'ㅏ' waveform sequences and is simpler than previous, more complex methods. The experimental results indicate that the method recognizes 'ㅏ' with more than 90 percent accuracy.
https://doi.org/10.5626/KTCP.2014.20.11.567

Performance Assessment of Speech Recognizer using Lombard Speech (롬바드 음성을 이용한 음성인식기의 성능 평가)

Jung, Sung-Yun;Chung, Hyun-Yeol;Kim, Kyung-Tae
- The Journal of the Acoustical Society of Korea
- v.13 no.5
- pp.59-68
- 1994
This paper describes a performance assessment test, and the analysis of its results, for a Korean speech recognizer that recognizes speech affected by the Lombard effect in noisy environments, as basic research on performance assessment. In the assessment test, standard speech data were first processed to resemble speech uttered in a noisy environment, and performance assessment tests were then carried out over the assessment items (noise type and SNR) in two ways: one with Lombard-effect speech (LES) and the other without it (NLES). With a recognition rate of 90% set as the recognition limit, that limit was reached at 10 dB SNR with LES but at 30 dB with NLES. This 20 dB SNR difference indicates that the Lombard effect should be considered in real-world assessment tests. The type of noise did not affect recognizer performance in our tests. ANOVA, applied in evaluating several kinds of recognizers, showed that every assessment item affecting recognition performance could be quantified.

De-identification of Medical Information and Issues (의료정보 비식별화와 해결과제)

Woo, SungHee
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- 2017.10a
- pp.552-555
- 2017
De-identification emerged as a way to balance the use of big data against the protection of personal information. In the medical field in particular, which handles a variety of quasi-identifier and sensitive information, de-identification must be performed before medical consultation data such as EMR, voice, KakaoTalk, and SNS records can be used. However, there is no separate law for medical information protection and no legislation for de-identification. Therefore, this study presents the current status of de-identification of personal information and the status and cases of de-identification of medical information, and finally provides issues and solutions for medical information protection and de-identification.

Design and Implementation of Lecture Authoring Tool based on Multimedia Component (멀티미디어 컴포넌트 기반 원격 강의 도구 설계 및 구현)

김재일;정상준;최용준;천성권;김종근
- Journal of Korea Multimedia Society
- v.3 no.5
- pp.516-525
- 2000
Much effort has gone into developing new technology for efficient distance education based on powerful Internet services such as the World Wide Web. With the distance education systems developed so far, we can study what we want at any time and in any place. Despite these achievements, some shortcomings remain. Distance education methods that depend only on static homepages, or on voice and simple drawings, may not be as effective as a real class. A new method is therefore needed for teaching students more practically and efficiently. In this paper, we design and implement a lecture authoring system that offers effective distance education with voice, animations, camera images, drawings, and other media. We expect the designed system to deliver educational effects closer to those of a face-to-face lecture in a real class. Its built-in high-ratio compression function allows lectures to be transmitted more quickly. It also has a caption processing function that provides students with additional information through caption texts, messages, and hyperlinks.

Search Result 301, Processing Time 0.031 seconds
