Search | Korea Science

Speech Recognition Using Noise Robust Features and Spectral Subtraction (잡음에 강한 특징 벡터 및 스펙트럼 차감법을 이용한 음성 인식)

Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee;Seo, Young-Joo
- The Journal of the Acoustical Society of Korea / Vol. 15, No. 5 / pp. 38-43 / 1996
This paper compares the recognition performance of feature vectors known to be robust to environmental noise. In addition, spectral subtraction is combined with the noise-robust features to improve performance further. Experiments are carried out using SMC (Short-time Modified Coherence) analysis, root cepstral analysis, LDA (Linear Discriminant Analysis), PLP (Perceptual Linear Prediction), and RASTA (RelAtive SpecTrAl) processing. An isolated-word recognition system is built using a semi-continuous HMM. Noisy-environment experiments use two types of noise, exhibition hall and computer room, at SNRs of 0, 10, and 20 dB. The experimental results show that SMC and the root-based mel cepstrum (root_mel cepstrum) improve recognition by 9.86% and 12.68%, respectively, at 10 dB compared with LPCC (Linear Prediction Cepstral Coefficients). When combined with spectral subtraction, the mel cepstrum and root_mel cepstrum improve by 16.7% and 8.4%, reaching recognition rates of 94.91% and 94.28% at 10 dB.
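
As a rough illustration of the spectral subtraction step named in this abstract, the sketch below applies magnitude-domain subtraction to a single frame. It is a minimal textbook form, not the authors' implementation: the naive DFT, the `floor` parameter, the frame length, and the toy sinusoid signals are all assumptions chosen so that the example runs with the Python standard library alone.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform (standard library only)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each time sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_magnitude, floor=0.01):
    """Subtract an estimated noise magnitude per bin and keep the noisy phase.

    `floor` is an assumed spectral-floor factor that prevents negative
    magnitudes after subtraction.
    """
    cleaned = []
    for bin_value, noise_mag in zip(dft(noisy_frame), noise_magnitude):
        mag = abs(bin_value)
        new_mag = max(mag - noise_mag, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(bin_value)))
    return idft(cleaned)


def energy(frame):
    """Sum of squared samples."""
    return sum(x * x for x in frame)


# Toy data (assumed): a sinusoid standing in for speech, plus a second
# sinusoid standing in for stationary noise.
N = 32
tone = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.2 * math.sin(2 * math.pi * 11 * t / N + 0.5) for t in range(N)]
noisy = [s + v for s, v in zip(tone, noise)]

# Noise magnitude estimated from a noise-only frame (for example, a pause).
noise_mag = [abs(b) for b in dft(noise)]
enhanced = spectral_subtract(noisy, noise_mag)
```

In a recognizer front end, the enhanced frame (or its magnitude spectrum) would be passed on to feature extraction such as mel cepstrum analysis; real systems typically average the noise estimate over many non-speech frames rather than using a single one.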

On the speaker's position estimation using TDOA algorithm in vehicle environments (자동차 환경에서 TDOA를 이용한 화자위치추정 방법)

Lee, Sang-Hun;Choi, Hong-Sub
- Journal of Digital Contents Society / Vol. 17, No. 2 / pp. 71-79 / 2016
This study compares the performance of sound source localization methods, which support stable vehicle control by improving the speech recognition rate in automobile environments, and suggests how to improve them. Sound source localization generally relies on the TDOA (time difference of arrival) algorithm, which can be computed in two ways: with a cross-correlation function in the time domain, or with GCC-PHAT in the frequency domain. Of the two, GCC-PHAT is known to be more robust to echo and noise than the cross-correlation function. This study compares the two methods in an automobile environment full of echo and vibration noise and additionally proposes applying a median filter. The median filter improves the performance of both estimation methods and reduces their variance. According to the experimental results, the two methods perform almost identically on speech; on a song signal, however, GCC-PHAT achieves a recognition rate 10% higher than the cross-correlation function. When the median filter is added, the recognition rate of the cross-correlation function improves by up to 11%. In terms of variance, both methods perform stably.
https://doi.org/10.9728/dcs.2016.17.2.71
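
The time-domain variant described in this abstract can be sketched as follows: estimate the delay between two microphone signals by picking the lag that maximizes their cross-correlation, then smooth the per-frame delays with a median filter. This is a minimal sketch under stated assumptions, not the authors' code; the function names, the window size, the sample rate, and the toy pulse signals are hypothetical, and the GCC-PHAT variant is not shown.

```python
from statistics import median

SAMPLE_RATE = 16000  # assumed sampling rate in Hz, for converting lags to seconds


def cross_correlation_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive lag means sig arrives later than ref.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def median_filter(values, window=3):
    """Sliding median; the window shrinks at the edges. Use an odd window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out


# Toy example (assumed data): the second microphone hears the same pulse
# three samples later than the first.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
lag = cross_correlation_delay(mic_a, mic_b, max_lag=5)

# Per-frame lag estimates with one outlier frame, then median smoothing.
frame_lags = [3, 3, 9, 3, 3]
smoothed = median_filter(frame_lags)
tdoa_seconds = [l / SAMPLE_RATE for l in smoothed]
```

The median filter is applied to the sequence of per-frame delay estimates rather than to the audio itself, which is one plausible reading of how it reduces the variance of the estimates.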

HMM-based Speech Recognition using FSVQ, Fuzzy Concept and Doubly Spectral Feature (FSVQ, 퍼지 개념 및 이중 스펙트럼 특징을 이용한 HMM에 기초를 둔 음성 인식)

정의봉
- Journal of the Korea Computer Industry Society / Vol. 5, No. 4 / pp. 491-502 / 2004
In this paper, we propose an HMM model using FSVQ (First Section VQ), fuzzy theory, and doubly spectral features, as a study of speaker-independent isolated-word recognition. LPC cepstrum coefficients and regression coefficients of the LPC cepstrum are used as the doubly spectral features. The training data are divided into several sections, and a VQ codebook is generated from the first section; multiple observation sequences are then obtained from that codebook in descending order of probability, based on a fuzzy rule. These first-section observation sequences are then used for training, and at recognition time the word with the highest probability is selected by the same procedure. In addition to experiments with the proposed method, we test other methods on the same data and under the same conditions. Across all experiments, the proposed method achieves a higher recognition rate than the others.

Comparison of Characteristic Vector of Speech for Gender Recognition of Male and Female (남녀 성별인식을 위한 음성 특징벡터의 비교)

Jeong, Byeong-Goo;Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering / Vol. 16, No. 7 / pp. 1370-1376 / 2012
This paper proposes a gender recognition algorithm that classifies a speaker as male or female. Characteristic vectors of male and female speakers are analyzed, and recognition experiments with a neural network are performed using these vectors. The input characteristic vectors considered are 10 LPC (Linear Predictive Coding) cepstrum coefficients; 12 LPC cepstrum coefficients; 12 FFT (Fast Fourier Transform) cepstrum coefficients with 1 RMS (Root Mean Square) value; and 12 LPC cepstrum coefficients with 8 FFT spectrum values. A 20-20-2 network trained on the 12 LPC cepstrum coefficients and 8 FFT spectrum values is the main configuration used in the experiments. The average recognition rate obtained by the gender recognition algorithm is 99.8% for male speakers and 96.5% for female speakers.
https://doi.org/10.6109/jkiice.2012.16.7.1370

Automatic Evaluation of Speech and Machine Translation Systems by Linguistic Test Points (자동통번역 시스템의 언어 현상별 자동 평가)

Choi, Sung-Kwon;Choi, Gyu-Hyun;Kim, Young-Gil
- Annual Conference of KIPS / Korea Information Processing Society 2019 Fall Conference / pp. 1041-1044 / 2019
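BLEU is the best-known automatic metric for evaluating the performance of automatic speech and machine translation. However, BLEU cannot reveal which parts of the output are strengths and which are weaknesses. This paper introduces a method for automatically evaluating speech and machine translation systems by linguistic phenomenon. The method provides the per-phenomenon evaluation that BLEU cannot, and lets developers see intuitively the strengths and weaknesses of a given system for each linguistic phenomenon. Per-phenomenon accuracy was measured for Google and Naver Papago. Treating an accuracy of 40% or below as a weakness, the weakness of Google's English-to-Korean machine translator was style translation (32.50%), and the weaknesses of Google's English-to-Korean speech translator were speech recognition (30.00%) and discourse processing (30.00%). The weaknesses of Google's Korean-to-English machine translator were syntactic analysis (34.00%), ambiguity resolution (27.50%), and style translation (20.00%), and the weakness of Google's Korean-to-English speech translator was discourse processing (30.00%). Papago's English-to-Korean machine translator scored 55% or higher on most phenomena, and the weakness of Papago's English-to-Korean speech translator was discourse processing (30.00%). The weaknesses of Papago's Korean-to-English machine translator were syntactic analysis (38.00%), ambiguity resolution (32.50%), and style translation (20.00%), and the weakness of Google's Korean-to-English speech translator was discourse processing (20.00%). The ultimate goal of per-phenomenon automatic evaluation is to find the various weaknesses of a speech or machine translation system, semi-automatically collect and build a targeted corpus related to those weaknesses, and retrain the system to improve its performance incrementally.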
https://doi.org/10.3745/PKIPS.y2019m10a.1041

A Research on Object Detection Technology for the Visually Impaired (시각장애인을 위한 사물 감지 기술 연구)

Jeong, Yeon-Kyu;Kim, Byung-Gyu;Lee, Jeong-Bae
- The KIPS Transactions:PartB / Vol. 19B, No. 4 / pp. 225-230 / 2012
This paper implements an object-sensing technology that a blind person can use as an aid alongside a white cane. The implementation uses ultrasonic sensors and a webcam, with the data processed on a server computer. The ultrasonic sensors detect objects within 4 meters, the system distinguishes whether a detected object is a person or a thing, and the result is reported by sound. By introducing ultrasonic sensing together with object and person recognition, the technology developed in this study is expected to be highly usable for detecting objects in the daily lives of the visually impaired.
https://doi.org/10.3745/KIPSTB.2012.19B.4.225

Comparison of Connection Admission Control Schemes in ATM Switches (ATM 교환기에서의 연결 승인 제어 기법의 비교)

박항엽;전치혁;서재준
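- Proceedings of the Korean Operations and Management Science Society Conference / Proceedings of the 1994 Spring Joint Conference of the Korean Institute of Industrial Engineers and the Korean Operations Research and Management Science Society; Changwon National University; 08월 09일 Apr. 1994 / pp. 3-11 / 1994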
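ATM is recognized as a suitable technology for realizing B-ISDN, which carries traffic services with diverse characteristics at high speed over a single common network. However, an ATM network must satisfy different quality-of-service requirements for diverse traffic sources, such as data communication, which requires a low cell loss rate, and voice service, for which delay is the concern. Using the network efficiently while meeting the required performance targets therefore calls for traffic control techniques of several kinds. One of these, connection admission control, decides whether to accept a request when a traffic source asks to connect to the network, and various methods are possible depending on whether the focus is on the cell level or the call level. This study proposes adaptive connection admission control, a cell-level method that applies comparatively well to heterogeneous traffic environments. Its performance is analyzed through simulation and compared with existing connection admission control, showing that it performs somewhat better.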

An Algorithm for extracting English-Korean Transliteration pairs using Automatic I-K Transliteration (자동 음차표기를 이용한 영-한 음차표기 대역쌍의 자동 추출)

오종훈;배선미;최기선
- Proceedings of the Korean Information Science Society Conference / Korean Information Science Society 2004 Spring Conference Proceedings, Vol. 31, No. 1 (B) / pp. 928-930 / 2004
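Research on automatically building the translation knowledge used in natural language applications such as machine translation and cross-language information retrieval has been active. Such research aims to acquire automatically, from documents, translation information for unregistered words that are not listed in bilingual dictionaries. Recently, much of this work has focused on translation knowledge for transliterations among these unregistered words. Transliteration mainly refers to writing an English word in a non-English language based on its pronunciation. Because many transliterated words are neologisms that express new concepts, they are often missing from dictionaries. Acquiring transliteration translation knowledge automatically is therefore very important for building translation knowledge effectively. This paper proposes an algorithm that automatically extracts English-Korean transliteration pairs from documents. The method extracts pairs through recognition of Korean transliterations, automatic English-to-Korean transliteration, and comparison of phonetic similarity between the Korean transliterations and the automatically transliterated English words. The method achieved a precision of about 93% and a recall of 68%.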

Design of Enhanced Cursor Interface for Low Vision Persons (저 시력인을 위해 개선된 커서 인터페이스의 설계)

Lee, Jong Won;Shon, Jin Gon
- Annual Conference of KIPS / Korea Information Processing Society 2011 Fall Conference / pp. 1470-1473 / 2011
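When using web pages and applications made up of densely packed small targets, people with low vision have difficulty selecting the target they want. This blocks access to the information they need and limits their ability to participate as members of society. The cursor interface for people with low vision enlarges targets and changes their colors to raise the recognition rate. It also secures sufficient distance between targets and provides target information by voice to prevent unintended selections. The proposed cursor interface was compared experimentally with a standard environment and with a pointer-magnifier interface environment. In the experiments, the proposed cursor interface required the least time to select a target. With the proposed cursor interface, people with low vision can use the web and applications easily, improving their access to information.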
https://doi.org/10.3745/PKIPS.y2011m11a.1470

Design of CNN-based Braille Conversion and Voice Output Device for the Blind (시각장애인을 위한 CNN 기반의 점자 변환 및 음성 출력 장치 설계)

Seung-Bin Park;Bong-Hyun Kim
- Journal of Internet of Things and Convergence / Vol. 9, No. 3 / pp. 87-92 / 2023
As times change, information and the ways of obtaining it become more diverse. About 80% of the information gained in life is acquired through vision, but visually impaired people have limited ability to interpret visual materials. Braille, a writing system for the blind, emerged for this reason. However, the Braille literacy rate among blind people is only 5%, and as demand from blind people for a wider range of platforms and materials grows, development and production of products for them is under way. One example is Braille books, which seem to have more disadvantages than advantages, and access to information remains far more difficult for blind people than for non-disabled people. In this paper, we design a CNN-based Braille conversion and voice output device that makes it easier for visually impaired people to obtain information than conventional methods do. The device aims to improve quality of life by converting books, text images, and handwritten images that are not in Braille into Braille through camera recognition, with a function for converting them into voice according to the user's needs.
https://doi.org/10.20465/KIOTS.2023.9.3.087
