• Title/Summary/Keyword: Acoustic Coupling (음향적 결합)


A Study on Improving Speech Recognition Rate (H/W, S/W) of Speech Impairment by Neurological Injury (신경학적 손상에 의한 언어장애인 음성 인식률 개선(H/W, S/W)에 관한 연구)

  • Lee, Hyung-keun;Kim, Soon-hub;Yang, Ki-Woong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.11
    • /
    • pp.1397-1406
    • /
    • 2019
  • In everyday mobile phone calls between people with speech impairments caused by neurological injury and non-disabled listeners, communication accuracy is often hindered by the combination of imprecise pronunciation due to the neurological impairment and the speaker's individual pronunciation features. To improve this, the hardware approach is an improved MEMS (micro-electro-mechanical systems) microphone device that includes an induction line to artificially correct vocalizations that are difficult to produce, matched to the oral characteristics of the speaker, thereby reducing out-of-vocabulary words. The software improvements are a decision tree with an invert function and an improved matrix-vector RNN method that takes the characteristics of continuous words into account. Reflecting both the H/W and S/W characteristics, a similar-word dictionary was created, contributing to improved speech intelligibility for smooth communication.
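The "similar-word dictionary" idea above — mapping a distorted transcription back to the intended in-vocabulary word — can be sketched with standard-library fuzzy matching. This is a minimal illustration of the concept, not the paper's implementation; the function name and cutoff are assumptions.

```python
import difflib

def lookup_similar_word(vocabulary, observed, cutoff=0.6):
    """Map an observed (possibly distorted) transcription to the closest
    in-vocabulary word. A stand-in for a 'similar-word dictionary'; the
    paper's system pairs a dictionary like this with H/W (MEMS microphone)
    and S/W (decision tree, matrix-vector RNN) improvements."""
    matches = difflib.get_close_matches(observed, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

vocab = ["hello", "help", "yellow", "hollow"]
print(lookup_similar_word(vocab, "helo"))  # closest vocabulary entry: "hello"
```

In practice the candidate list would come from the recognizer's lexicon, and the similarity measure could be weighted by the speaker's known pronunciation error patterns.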

Multi-Emotion Regression Model for Recognizing Inherent Emotions in Speech Data (음성 데이터의 내재된 감정인식을 위한 다중 감정 회귀 모델)

  • Moung Ho Yi;Myung Jin Lim;Ju Hyun Shin
    • Smart Media Journal
    • /
    • v.12 no.9
    • /
    • pp.81-88
    • /
    • 2023
  • Recently, online communication has been increasing with the spread of non-face-to-face services during COVID-19. In non-face-to-face situations, the other person's opinions and emotions are recognized through modalities such as text, speech, and images, and research on multimodal emotion recognition that combines these modalities is actively underway. Among them, emotion recognition from speech data is attracting attention as a means of understanding emotions through sound and language information, but most approaches recognize emotion from a single speech feature value. Because multiple emotions coexist in complex ways in a conversation, however, a method for recognizing multiple emotions is needed. Therefore, in this paper we propose a multi-emotion regression model that extracts feature vectors after preprocessing the speech data, in order to recognize complex, inherent emotions while taking the passage of time into account.
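The distinction the abstract draws — regressing several emotion intensities jointly rather than predicting one categorical label — can be sketched with a multi-output linear regression over speech feature vectors. The data here is synthetic and the feature dimensionality is an assumption; the paper's actual model and features differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for preprocessed speech feature vectors (e.g. MFCC
# statistics per clip) and per-clip intensity scores for four
# co-occurring emotion dimensions.
X = rng.normal(size=(200, 13))                 # 200 clips x 13-dim features
W_true = rng.normal(size=(13, 4))              # 4 emotion dimensions
Y = X @ W_true + 0.01 * rng.normal(size=(200, 4))

# A single model with a vector-valued target regresses all emotion
# intensities jointly instead of emitting one class label.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ W
rmse = float(np.sqrt(np.mean((pred - Y) ** 2)))
print(rmse)  # small: the joint regressor recovers all four targets
```

A temporal model (the paper mentions accounting for the passage of time) would replace the linear map with a sequence model over frame-level features.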

Optimum Conditions for Improvement of Mechanical and Interfacial Properties of Thermal Treated Pine/CFRP Composites (열처리된 Pine/탄소섬유 복합재료의 기계적 및 계면물성 향상을 위한 최적 조건)

  • Shin, Pyeong-Su;Kim, Jong-Hyun;Park, Ha-Seung;Baek, Yeong-Min;Kwon, Dong-Jun;Park, Joung-Man
    • Composites Research
    • /
    • v.30 no.4
    • /
    • pp.241-246
    • /
    • 2017
  • The brittle nature of most FRP composites is accompanied by other energy absorption mechanisms, such as fibre-matrix interface debonding and ply delamination, which can play an important role in the energy absorption capability of composite structures. To mitigate this brittleness, the adhesion between pine and composites was studied. Thermally treated pine was bonded to carbon fiber reinforced polymer (CFRP) with epoxy adhesives. To find the optimum thermal treatment condition for pine, treatments at 160 and 200 °C were compared with the untreated case. To evaluate the mechanical and interfacial properties of the pine and pine/CFRP composites, tensile, lap shear, and Izod tests were carried out. The bonding force of the pine grain was measured by a tensile test in the transverse direction, and the elastic waves emitted on fracture of the pine were analyzed. The mechanical and interfacial properties and bonding force were highest for the pine treated at 160 °C, owing to its reinforcing effect. However, excessive thermal treatment degraded the hemicellulose and led to deterioration of the mechanical and interfacial properties.

Optical signal amplification property in photorefractive Cu-KNSBN crystal (광굴절 Cu-KNSBN 결정에서의 광신호 증폭 특성)

  • Kim, Sung-Gu;An, Jun-Won;Kim, Nam;Lee, Kwon-Yeon;Seo, Ho-Hyung
    • Proceedings of the Optical Society of Korea Conference
    • /
    • 2000.02a
    • /
    • pp.288-289
    • /
    • 2000
  • Photorefractive crystals in the tungsten-bronze family, such as SBN, BSKNN, and KNSBN, offer good photosensitivity and fast response at short wavelengths. Among them, the KNSBN crystal is a promising medium for applications requiring a fast response, because large crystals are easy to grow and dope, and it has good thermal stability, one of the important properties of a photorefractive crystal. In this paper, we study optical-signal amplification using a 5 mm × 5 mm × 5 mm KNSBN crystal doped with 0.04 wt.% Cu, which has potential applications in fields such as optical information storage, optical information processing, optical computing, and optical communication. First, to analyze the two-wave mixing characteristics of the Cu-KNSBN crystal, we measured the dependence of the exponential gain coefficient on the external incidence angle for each recording wavelength, and the two-wave mixing gain as a function of the input beam intensity ratio at the external incidence angle yielding the maximum exponential gain coefficient. We also measured the writing and erasure time constants and the dependence of the diffraction efficiency on the input beam intensity ratio at 632.8 nm. Then, using a signal beam amplitude-modulated by an acousto-optic modulator (AOM), we analyzed the optical-signal amplification characteristics and present the results. The half angle of incidence of the two beams was fixed at 12°, where the exponential gain coefficient is maximal, and the differential gain of the signal beam was measured while its intensity was adjusted with an attenuator. The transmitted signal beam showed a differential gain at the same frequency, attributed to the time-integrating property of the photorefractive crystal in the interaction between the moving grating and the time-modulated signal (or pump) beam. (omitted)

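The exponential gain coefficient measured above can be related to the observed signal-beam gain by the standard textbook two-wave mixing expression. The sketch below uses that general relation with illustrative numbers; it is not the paper's measurement procedure, and the chosen interaction length and beam ratio are assumptions.

```python
import math

def signal_gain(Gamma, L, m):
    """Signal-beam gain in two-wave mixing (standard textbook form):
    gamma0 = (1 + m) e^{Gamma L} / (m + e^{Gamma L}),
    where m is the input pump-to-signal intensity ratio. For m >> 1
    this reduces to the familiar e^{Gamma L}."""
    e = math.exp(Gamma * L)
    return (1.0 + m) * e / (m + e)

def exponential_gain_coefficient(gamma0, L, m):
    """Invert the relation above to recover Gamma from a measured gain."""
    return math.log(gamma0 * m / (1.0 + m - gamma0)) / L

L = 0.5        # interaction length in cm (illustrative, ~5 mm crystal)
m = 1000.0     # pump much stronger than signal (illustrative)
g = signal_gain(10.0, L, m)                   # forward: Gamma = 10 cm^-1
print(exponential_gain_coefficient(g, L, m))  # round-trip recovers ~10
```

Measuring the gain versus the input beam ratio m, as the abstract describes, lets one fit Gamma consistently across operating points.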

An Improvement of Stochastic Feature Extraction for Robust Speech Recognition (강인한 음성인식을 위한 통계적 특징벡터 추출방법의 개선)

  • 김회린;고진석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.2
    • /
    • pp.180-186
    • /
    • 2004
  • The presence of noise in speech signals degrades the performance of recognition systems in which there are mismatches between the training and test environments. To make a speech recognizer robust, these mismatches must be compensated. In this paper, we study an improvement of stochastic feature extraction based on band SNR for robust speech recognition. First, we propose a modified version of the multi-band spectral subtraction (MSS) method that adjusts the subtraction level of the noise spectrum according to the band SNR. In the proposed method, referred to as M-MSS, a noise normalization factor is newly introduced to finely control the over-estimation factor depending on the band SNR. We also modify the architecture of the stochastic feature extraction (SFE) method: better performance is obtained when spectral subtraction is applied in the power spectrum domain rather than in the mel-scale domain. This method is denoted M-SFE. Lastly, we apply the M-MSS method to the modified stochastic feature extraction structure, denoted the MMSS-MSFE method. The proposed methods were evaluated on isolated word recognition under various noise environments. Relative to ordinary spectral subtraction (SS), the average error rates of the M-MSS, M-SFE, and MMSS-MSFE methods were reduced by 18.6 %, 15.1 %, and 33.9 %, respectively. From these results, we conclude that the proposed methods are good candidates for robust feature extraction in noisy speech recognition.
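The core idea of MSS — subtracting an over-estimated noise spectrum per band, less aggressively in bands with high SNR — can be sketched as follows. The mapping from band SNR to the over-subtraction factor and the spectral floor value are illustrative assumptions; the paper's M-MSS additionally introduces a separate noise normalization factor, which is folded into a single alpha here.

```python
import numpy as np

def multiband_spectral_subtraction(power, noise, band_edges,
                                   alpha_max=5.0, alpha_min=1.0, floor=0.01):
    """Per-band spectral subtraction with an SNR-dependent
    over-estimation factor (a sketch of the MSS idea)."""
    out = np.empty_like(power)
    for lo, hi in band_edges:
        snr_db = 10.0 * np.log10(power[lo:hi].sum()
                                 / max(noise[lo:hi].sum(), 1e-12))
        # Higher band SNR -> subtract less aggressively.
        alpha = np.clip(alpha_max - 0.15 * snr_db, alpha_min, alpha_max)
        diff = power[lo:hi] - alpha * noise[lo:hi]
        # Spectral floor prevents negative power in noise-dominated bins.
        out[lo:hi] = np.maximum(diff, floor * power[lo:hi])
    return out

power = np.array([100.0, 100.0, 2.0, 2.0])   # high-SNR band, low-SNR band
noise = np.array([1.0, 1.0, 1.0, 1.0])
clean = multiband_spectral_subtraction(power, noise, [(0, 2), (2, 4)])
print(clean)  # high-SNR bins mostly preserved, low-SNR bins floored
```

The paper's M-SFE variant then applies this subtraction in the power spectrum domain, before mel-scale filtering, rather than after it.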

Korean Phoneme Recognition Using Self-Organizing Feature Map (SOFM 신경회로망을 이용한 한국어 음소 인식)

  • Jeon, Yong-Koo;Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.2
    • /
    • pp.101-112
    • /
    • 1995
  • To construct a feature map-based phoneme classification system for speech recognition, two procedures are usually required: clustering and labeling. In this paper, we present a phoneme classification system that uses Kohonen's Self-Organizing Feature Map (SOFM) as both clusterer and labeler. The SOFM is known to perform a self-organizing process that yields an optimal local topographical mapping of the signal space and reasonably high accuracy in recognition tasks; consequently, it can be applied effectively to phoneme recognition. In addition, to improve the performance of the phoneme classification system, we propose a learning algorithm combined with the classical K-means clustering algorithm in the fine-tuning stage. To evaluate the proposed phoneme classification algorithm, we used a total of 43 phonemes organized into six intra-class feature maps for six phoneme classes. In speaker-dependent phoneme classification tests using these six feature maps, we obtained a recognition rate of 87.2 % and confirmed that the proposed algorithm is an efficient method for improving recognition performance and convergence speed.

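The Kohonen update rule at the heart of the SOFM can be sketched in a few lines. The toy 2-D data, the 1-D map, and the learning-rate and neighborhood schedules below are illustrative assumptions; the paper trains six separate 2-D intra-class maps on phoneme features and fine-tunes with K-means.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for phoneme feature vectors: two well-separated clusters.
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(1.0, 0.1, (50, 2))])

weights = rng.uniform(0.0, 1.0, (4, 2))  # a small 1-D map of code vectors

for t in range(200):
    x = data[rng.integers(len(data))]
    # Best-matching unit: the code vector closest to the sample.
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    lr = 0.5 * (1.0 - t / 200)                    # decaying learning rate
    for j in range(len(weights)):
        h = np.exp(-((j - bmu) ** 2) / 2.0)       # neighborhood kernel
        weights[j] += lr * h * (x - weights[j])   # Kohonen update

# After training, the code vectors approximate the cluster structure and
# can be labeled with phoneme classes (then fine-tuned with K-means).
print(weights)
```

The neighborhood kernel is what makes the map topographic: units adjacent to the winner move toward the same sample, so nearby units end up coding similar sounds.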

Reducing latency of neural automatic piano transcription models (인공신경망 기반 저지연 피아노 채보 모델)

  • Dasol Lee;Dasaem Jeong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.2
    • /
    • pp.102-111
    • /
    • 2023
  • Automatic Music Transcription (AMT) is a task that detects and recognizes musical note events from a given audio recording. In this paper, we focus on reducing the latency of real-time AMT systems on piano music. Although neural AMT models have been adapted for real-time piano transcription, they suffer from high latency, which hinders their usefulness in interactive scenarios. To tackle this issue, we explore several techniques for reducing the intrinsic latency of a neural network for piano transcription, including reducing window and hop sizes of Fast Fourier Transformation (FFT), modifying convolutional layer's kernel size, and shifting the label in the time-axis to train the model to predict onset earlier. Our experiments demonstrate that combining these approaches can lower latency while maintaining high transcription accuracy. Specifically, our modified model achieved note F1 scores of 92.67 % and 90.51 % with latencies of 96 ms and 64 ms, respectively, compared to the baseline model's note F1 score of 93.43 % with a latency of 160 ms. This methodology has potential for training AMT models for various interactive scenarios, including providing real-time feedback for piano education.
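The relationship between analysis window size and intrinsic latency that the abstract exploits can be stated as a simple formula: a model whose features are centered on each frame must wait for half the window, plus any future frames its network consumes. This relation is a general illustration, not the paper's exact latency accounting; the parameter values below are assumptions.

```python
def frame_latency_ms(win_size, lookahead_frames, hop_size, sr=16000):
    """Intrinsic latency of a frame-based transcription model: half the
    (centered) analysis window plus future context, in milliseconds.
    The paper additionally shifts labels along the time axis so the
    model learns to fire on onsets earlier."""
    return 1000.0 * (win_size / 2 + lookahead_frames * hop_size) / sr

# Halving the FFT window halves the window's share of the latency.
print(frame_latency_ms(4096, 0, 512))  # 128.0 ms
print(frame_latency_ms(2048, 0, 512))  # 64.0 ms
```

This is why the abstract's latency reductions come from jointly shrinking the FFT window and hop size, adjusting kernel sizes, and shifting labels: each term in the sum above must be reduced separately.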

Salient Region Detection Algorithm for Music Video Browsing (뮤직비디오 브라우징을 위한 중요 구간 검출 알고리즘)

  • Kim, Hyoung-Gook;Shin, Dong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.2
    • /
    • pp.112-118
    • /
    • 2009
  • This paper proposes a rapid salient-region detection algorithm for a music video browsing system that can be applied to mobile devices and digital video recorders (DVRs). The input music video is decomposed into music and video tracks. For the music track, the music highlight, including the chorus, is detected by structure analysis using energy-based peak position detection. Using emotional models generated by an SVM-AdaBoost learning algorithm, the music signal of each music video is automatically classified into one of the predefined emotional classes. For the video track, face scenes including the singer or actor/actress are detected using a boosted cascade of simple features. Finally, the salient region is generated by aligning the boundaries of the music highlight and the detected face scenes. Users select their favorite music videos on a mobile device or DVR using the emotion information, and can then quickly browse the 30-second salient region produced by the proposed algorithm. A mean opinion score (MOS) test on a database of 200 music videos was conducted to compare the detected salient region with a predefined manual excerpt. The MOS results show that the salient region detected by the proposed method performed much better than the predefined manual excerpt obtained without audiovisual processing.
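The energy-based peak detection used for the music highlight can be sketched as picking the fixed-length excerpt with the highest total frame energy. This is a minimal stand-in under assumed frame and region sizes; the paper's full system also classifies emotion and aligns the highlight with detected face scenes.

```python
import numpy as np

def salient_region(signal, sr, region_sec=30.0, frame=1024):
    """Return the start sample of the highest-energy region of fixed
    length: compute per-frame energies, then slide a window of
    region_sec over them and take the maximum."""
    energy = np.array([np.sum(signal[i:i + frame] ** 2)
                       for i in range(0, len(signal) - frame, frame)])
    win = max(1, int(region_sec * sr / frame))         # frames per region
    window_energy = np.convolve(energy, np.ones(win), mode="valid")
    return int(np.argmax(window_energy)) * frame       # start sample

# Synthetic check: 40 s of silence, a 5 s loud burst, 40 s of silence.
sr = 8000
loud = np.sin(np.linspace(0, 2000, sr * 5))
signal = np.concatenate([np.zeros(sr * 40), loud, np.zeros(sr * 40)])
start = salient_region(signal, sr, region_sec=5.0)
print(start / sr)  # near 40 s, where the burst begins
```

Real chorus detection refines this with structure analysis (repetition), but the sliding-window energy maximum is the peak-position backbone.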

Cepstral Distance and Log-Energy Based Silence Feature Normalization for Robust Speech Recognition (강인한 음성인식을 위한 켑스트럼 거리와 로그 에너지 기반 묵음 특징 정규화)

  • Shen, Guang-Hu;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.4
    • /
    • pp.278-285
    • /
    • 2010
  • The difference between training and test environments is one of the major causes of performance degradation in noisy speech recognition, and many silence feature normalization methods have been proposed to resolve this mismatch. Conventional silence feature normalization shows high classification performance at high SNR, but suffers performance degradation at low SNR because of the low accuracy of speech/silence classification. Cepstral distance, on the other hand, represents the distribution of speech/silence (or noise) well at low SNR. In this paper, we propose a Cepstral distance and Log-energy based Silence Feature Normalization (CLSFN) method that uses both log energy and cepstral Euclidean distance to classify speech/silence more accurately. Because the proposed method combines the merit of log energy, which is less affected by noise at high SNR, with the merit of cepstral distance, which discriminates speech from silence accurately at low SNR, the classification accuracy is expected to improve. Experimental results showed that the proposed CLSFN yields improved recognition performance compared with the conventional SFN-I/II and CSFN methods in all noisy environments tested.
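The two-cue classification and the normalization step can be sketched as follows. The decision rule (a simple OR of two thresholded cues) and the normalization (replacing silence frames with their mean) are illustrative assumptions; the paper's actual combination rule and thresholds differ.

```python
import numpy as np

def classify_speech(log_energy, cep_dist, e_thresh, d_thresh):
    """Flag a frame as speech when either cue fires: log energy
    separates speech from silence well at high SNR, cepstral distance
    at low SNR (the combination idea behind CLSFN)."""
    return (log_energy > e_thresh) | (cep_dist > d_thresh)

def normalize_silence(features, is_speech):
    """Replace non-speech frames with the mean silence feature vector,
    so silence regions match between training and test conditions."""
    out = features.copy()
    silence = ~is_speech
    if silence.any():
        out[silence] = features[silence].mean(axis=0)
    return out

log_e = np.array([-8.0, -7.5, 2.0, 2.5, -7.8])   # per-frame log energy
dist = np.array([0.1, 0.2, 1.5, 1.4, 0.15])      # per-frame cepstral distance
speech = classify_speech(log_e, dist, e_thresh=0.0, d_thresh=1.0)
print(speech)  # [False False  True  True False]
```

At high SNR the energy cue dominates the decision; at low SNR, where frame energies blur together, the cepstral-distance cue still separates the two classes.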

Noise-Biased Compensation of Minimum Statistics Method using a Nonlinear Function and A Priori Speech Absence Probability for Speech Enhancement (음질향상을 위해 비선형 함수와 사전 음성부재확률을 이용한 최소통계법의 잡음전력편의 보상방법)

  • Lee, Soo-Jeong;Lee, Gang-Seong;Kim, Sun-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.1
    • /
    • pp.77-83
    • /
    • 2009
  • This paper proposes a new method for compensating the noise-power bias of the minimum statistics (MS) approach, using a nonlinear function and the a priori speech absence probability (SAP), for speech enhancement in non-stationary noisy environments. The MS method is a well-known technique for noise power estimation in non-stationary noise, but it tends to bias the noise estimate below the true noise level. The proposed method combines an adaptive parameter based on a sigmoid function with the a priori SAP for bias compensation. Specifically, we adapt the parameter according to the a posteriori SNR. In addition, when the a priori SAP equals unity, the adaptive bias compensation factor increases separately toward ${\delta}_{max}$ in each frequency bin, and vice versa. We evaluate the noise power estimation capability in highly non-stationary and various noise environments, the improvement in segmental signal-to-noise ratio (SNR), and the Itakura-Saito Distortion Measure (ISDM) when integrated into spectral subtraction (SS). The results show that the proposed method is superior to the conventional MS approach.
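The shape of the proposed compensation — a sigmoid of the a posteriori SNR, gated by the speech absence probability, pushing the factor toward a maximum ${\delta}_{max}$ — can be sketched per frequency bin as follows. All constants and the exact functional form are illustrative assumptions, not the paper's values.

```python
import math

def bias_compensation(a_post_snr_db, sap, delta_max=2.0, slope=0.5, center=5.0):
    """Multiplicative compensation for the downward bias of a
    minimum-statistics noise estimate. A sigmoid of the a posteriori
    SNR gives a smooth adaptive parameter; a high a priori speech
    absence probability (SAP) pushes the factor toward delta_max,
    while at high SNR (likely speech) the factor stays near 1."""
    sig = 1.0 / (1.0 + math.exp(slope * (a_post_snr_db - center)))
    return 1.0 + (delta_max - 1.0) * sig * sap

# Low SNR + certain speech absence -> strong compensation;
# high SNR -> factor near 1, so speech is not over-suppressed.
print(bias_compensation(-10.0, 1.0))  # close to delta_max = 2.0
print(bias_compensation(30.0, 1.0))   # close to 1.0
```

Scaling the MS noise estimate by this factor, bin by bin, counteracts the estimate's tendency to sit below the true noise level without inflating it during clear speech activity.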