Search | Korea Science

Cepstral Normalization Combined with CSFN for Noisy Speech Recognition (켑스트럼 정규화와 켑스트럼 거리기반 묵음특징정규화 방법을 이용한 잡음음성 인식)

Choi, Sook-Nam;Shen, Guang-Hu;Chung, Hyun-Yeol
- Journal of Korea Multimedia Society, v.14 no.10, pp.1221-1228, 2011
Speech recognition systems work well in typical indoor environments. However, recognition performance degrades dramatically in real environments because of various kinds of noise. In this paper, we propose CSFN-CMVN to improve the recognition performance of the existing CSFN (Cepstral distance based SFN). CSFN-CMVN combines cepstral normalization with CSFN, which normalizes silence features and uses the cepstral Euclidean distance to classify frames as speech or silence. Test results on the Aurora 2.0 DB show that CSFN-CMVN improves average word accuracy across all test sets by about 7% compared with the typical silence feature normalization method SFN-I. It also improves accuracy by 6% and 5% compared with the conventional SFN-II and CSFN, respectively, showing the effectiveness of the proposed method.
https://doi.org/10.9717/kmms.2011.14.10.1221
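The abstract names two building blocks, cepstral normalization and speech/silence classification by cepstral Euclidean distance, without giving their details. The Python sketch below shows generic versions of both. It assumes CMVN in its conventional sense (per-utterance cepstral mean and variance normalization) and assumes that the first few frames of an utterance contain only background noise and can serve as the silence reference. The function names, `num_ref_frames`, and `threshold` are hypothetical, and the silence-feature normalization step of SFN/CSFN itself is not reproduced.

```python
import math
from typing import List

Frame = List[float]  # one cepstral vector, e.g. 13 MFCCs


def cmvn(frames: List[Frame], eps: float = 1e-8) -> List[Frame]:
    """Per-utterance cepstral mean and variance normalization.

    Each coefficient is shifted to zero mean and scaled to unit variance
    over all frames of the utterance; eps guards against division by zero.
    """
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
            for f in frames]


def classify_silence(frames: List[Frame], num_ref_frames: int = 10,
                     threshold: float = 1.0) -> List[bool]:
    """Label each frame as silence (True) or speech (False).

    Assumption: the first num_ref_frames frames are noise only, and their
    mean vector is the silence reference. A frame whose cepstral Euclidean
    distance to that reference is below threshold is labeled silence.
    """
    dim = len(frames[0])
    ref_frames = frames[:num_ref_frames]
    reference = [sum(f[d] for f in ref_frames) / len(ref_frames)
                 for d in range(dim)]
    return [math.dist(f, reference) < threshold for f in frames]
```

Because the distance threshold depends on the noise level and on feature scaling, it would need tuning on development data; the paper's actual decision rule may differ.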

Auto-Analysis of Traffic Flow through Semantic Modeling of Moving Objects (움직임 객체의 의미적 모델링을 통한 차량 흐름 자동 분석)

Choi, Chang;Cho, Mi-Young;Choi, Jun-Ho;Choi, Dong-Jin;Kim, Pan-Koo
- The Journal of The Korea Institute of Intelligent Transport Systems, v.8 no.6, pp.36-45, 2009
Recently, there has been growing interest in automatic traffic flow analysis and accident detection using various low-level information from road video. In this paper, we study algorithms for automatic traffic flow analysis and an application of traffic accident detection for traffic management systems. To achieve this, we build spatio-temporal relation models using topological and directional relations, and then match the proposed models with the directional motion verbs given by Levin's verbs of inherently directed motion. Finally, synonyms and antonyms are added using WordNet. To measure the similarity between the proposed models and the trajectories of moving objects in the video, the objects are extracted and their trajectories are compared against the proposed models. Because each proposed model has different features, the generated rules are applied to the similarity measurement using TSR (Tangent Space Representation). This research can be extended to automatic vehicle accident detection using CCTV.
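TSR is mentioned only by name. A common tangent space (turning function) representation maps a polyline to a step function of cumulative turning angle against normalized arc length, and compares two shapes by the distance between their step functions. The sketch below assumes that formulation for vehicle trajectories given as lists of (x, y) points; `tangent_space`, `tsr_distance`, and the sampling count are illustrative choices, not the authors' implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tangent_space(points: List[Point]) -> Tuple[List[float], List[float]]:
    """Turning-function representation of an open polyline.

    Assumes at least two distinct points. Returns (breaks, angles): segment i
    covers normalized arc length up to breaks[i] and has cumulative
    direction angles[i] in radians.
    """
    lengths, headings = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lengths.append(math.hypot(x1 - x0, y1 - y0))
        headings.append(math.atan2(y1 - y0, x1 - x0))
    total = sum(lengths)
    angles = [headings[0]]
    for prev, cur in zip(headings, headings[1:]):
        # Wrap each turn into [-pi, pi) so angles accumulate smoothly.
        turn = (cur - prev + math.pi) % (2 * math.pi) - math.pi
        angles.append(angles[-1] + turn)
    breaks, acc = [], 0.0
    for length in lengths:
        acc += length / total
        breaks.append(acc)
    return breaks, angles


def _value_at(breaks: List[float], angles: List[float], s: float) -> float:
    """Evaluate the step function at normalized arc length s."""
    for b, a in zip(breaks, angles):
        if s <= b:
            return a
    return angles[-1]


def tsr_distance(p: List[Point], q: List[Point], samples: int = 200) -> float:
    """Root-mean-square difference between two turning functions."""
    bp, ap = tangent_space(p)
    bq, aq = tangent_space(q)
    total = 0.0
    for i in range(samples):
        s = (i + 0.5) / samples  # midpoint sampling of [0, 1]
        total += (_value_at(bp, ap, s) - _value_at(bq, aq, s)) ** 2
    return math.sqrt(total / samples)
```

This version is invariant to translation and scale but deliberately keeps the absolute starting direction, since a vehicle's heading is informative for traffic flow; a rotation-invariant variant would subtract the initial angle from every entry.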

Quantization Based Speaker Normalization for DHMM Speech Recognition System (DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화)

신옥근
- The Journal of the Acoustical Society of Korea, v.22 no.4, pp.299-307, 2003
There have been many studies on speaker normalization, which aims to minimize the effect of the speaker's vocal tract length on the recognition performance of speaker-independent speech recognition systems. In this paper, we propose a simple linear warping speaker normalization method based on a vector quantizer, motivated by the observation that vector quantizers can be used successfully for speaker verification. For this purpose, we first generate an optimal codebook to serve as the basis of speaker normalization; the warping factor of an unknown speaker is then extracted by comparing the speaker's feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the Mel-scale filter bank used in MFCC computation. To evaluate the proposed method, a series of recognition experiments was conducted with a discrete HMM on thirteen monosyllabic Korean number utterances. The results show that the word error rate can be reduced by about 29%, and that the proposed warping factor extraction method is useful because it is simpler than other line-search warping methods.
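The abstract does not say exactly how the codebook comparison picks the warping factor. One plausible reading, sketched below as an assumption, is a grid search that keeps the candidate factor whose warped features have the lowest average VQ distortion against the speaker-independent codebook. `extract_features(alpha)` is a hypothetical callable standing in for MFCC extraction with the Mel filter bank linearly warped by `alpha`, and the candidate grid is likewise illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def vq_distortion(features: List[Vector], codebook: List[Vector]) -> float:
    """Average Euclidean distance from each frame to its nearest codeword."""
    return sum(min(math.dist(x, c) for c in codebook)
               for x in features) / len(features)


def select_warping_factor(extract_features: Callable[[float], List[Vector]],
                          codebook: List[Vector],
                          alphas: Sequence[float]) -> float:
    """Grid search: return the alpha whose warped features fit the codebook best.

    extract_features(alpha) is a hypothetical hook that should return MFCC
    frames computed with the Mel filter bank linearly warped by alpha.
    """
    best_alpha, best_score = alphas[0], math.inf
    for alpha in alphas:
        score = vq_distortion(extract_features(alpha), codebook)
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```

Scoring each candidate costs one nearest-codeword search per frame, which is cheap compared with decoding the utterance with the recognizer for every candidate; this is consistent with, though not proof of, the simplicity argument in the abstract.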

Item Trend Analysis Considering Social Network Data in Online Shopping Malls (온라인 쇼핑몰에서 소셜 네트워크 데이터를 고려한 상품 트렌드 분석)

Park, Soobin;Choi, Dojin;Yoo, Jaesoo;Bok, Kyoungsoo
- The Journal of the Korea Contents Association, v.20 no.2, pp.96-104, 2020
As online shopping malls become more widely used and consumer purchasing grows more active, companies are conducting item trend analyses to boost sales. Existing item trend analysis methods consider only the activities of users within online shopping mall services, which makes it difficult to identify trends for new items that have no purchase history. In this paper, we propose a trend analysis method that combines in-service data from online shopping malls with social network data to analyze item trends among both shopping mall users and potential customers. The proposed method uses users' activity logs as in-service data and extracts hot topics from social network data through word set extraction to reflect the interests of potential users. Finally, changes in item trends over time are detected using the item index and the number of mentions in the social network. Performance evaluations using social network data show the superiority of the proposed method.
https://doi.org/10.5392/JKCA.2020.20.02.096

Constructing Negative Links from Multi-facet of Social Media

Li, Lin;Yan, YunYi;Jia, LiBin;Ma, Jun
- KSII Transactions on Internet and Information Systems (TIIS), v.11 no.5, pp.2484-2498, 2017
Various types of social media let people share their personal experiences in different ways. On some social networking sites, some users post reviews, some support these reviews with comments, and some simply rate the reviews as supportive or not. Unfortunately, explicit negative comments toward other reviews are rare. This means that if there is a link between two users, it must be a positive link; negative links are invisible in these social networks. In other words, negative links are redundant to positive links. In this work, we first discuss feature extraction from social media data and propose a new method to compute the distance between each pair of comments or reviews. We then investigate whether negative links can be predicted via regression analysis when only positive links are manifest in social media data. In particular, we provide a principled way to mathematically incorporate multi-facet data in a novel framework, Constructing Negative Links (CsNL), to predict negative links and discover hidden information. Additionally, we investigate solutions to general negative link prediction problems using CsNL and its extension. Experiments on real-world data show that negative links are predictable from multi-facet social media data with the proposed CsNL framework. Essentially, the high prediction accuracy suggests that negative links are redundant to positive links. Further experiments evaluate the coefficients on different kernels, and the results show that user-generated content dominates the prediction performance of CsNL.
https://doi.org/10.3837/tiis.2017.05.010

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
- Journal of Digital Convergence, v.14 no.8, pp.233-243, 2016
Because of speech recognition problems in noisy environments, Audio Visual Speech Recognition (AVSR) systems, which combine speech and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in AVSR systems. This study aims to improve the recognition rate of uttered words using only lip shape detection, toward an efficient AVSR system. After preprocessing for lip region detection, Convolutional Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. Utterance period detection achieves a 91% success rate, higher than general threshold methods. In lip reading recognition, the user-dependent experiment achieves an 88.5% recognition rate and the user-independent experiment 80.2%, both improvements over previous studies.
https://doi.org/10.14400/JDC.2016.14.8.233

Phoneme-Boundary-Detection and Phoneme Recognition Research using Neural Network (음소경계검출과 신경망을 이용한 음소인식 연구)

임유두;강민구;최영호
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 1999.11a, pp.224-229, 1999
In the field of speech recognition, research can be classified into two categories: one concerned with developing phoneme-level recognition systems, and the other with the efficiency of word-level recognition systems. A reasonable phoneme-level recognition system should detect phoneme boundaries appropriately and, in addition, have improved recognition ability. Traditional LPC methods detect phoneme boundaries using the Itakura-Saito method, which measures the distance between the LPC coefficients of standard phoneme data and those of the target speech frame. MFCC methods, which treat spectral transitions as phoneme boundaries, lack adaptability. In this paper, we present a new speech recognition system that uses the auto-correlation method for phoneme boundary detection and a multi-layer feed-forward neural network for recognition. The proposed system outperforms traditional methods in terms of adaptability; another advantage is that the feature extraction part is independent of the recognition process. The results show that a frame-unit phoneme recognition system can feasibly be implemented.
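The abstract says only that an auto-correlation method is used for boundary detection. A minimal sketch of one such detector is shown below, under the assumption that a boundary is flagged wherever the normalized autocorrelation vectors of adjacent frames differ by more than a threshold. The frame length, hop, `max_lag`, and `threshold` values are hypothetical, and the paper's actual criterion may be different.

```python
import math
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into (possibly overlapping) fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def normalized_autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation at lags 1..max_lag divided by the lag-0 energy."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return [0.0] * max_lag  # silent frame: no periodic structure
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k)) / r0
            for k in range(1, max_lag + 1)]


def detect_boundaries(signal: Sequence[float], frame_len: int = 100, hop: int = 100,
                      max_lag: int = 10, threshold: float = 0.5) -> List[int]:
    """Return indices of frames that start a new segment.

    Frame i is a boundary when the Euclidean distance between the
    autocorrelation vectors of frames i-1 and i exceeds threshold.
    """
    frames = frame_signal(signal, frame_len, hop)
    acs = [normalized_autocorrelation(f, max_lag) for f in frames]
    return [i for i in range(1, len(acs))
            if math.dist(acs[i - 1], acs[i]) > threshold]
```

For example, a signal whose dominant period changes at sample 400 would be flagged at frame 4 with these settings. A practical detector would likely add smoothing or peak picking so that a single transition does not produce several adjacent boundaries.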

Extracting Core Events Based on Timeline and Retweet Analysis in Twitter Corpus (트위터 문서에서 시간 및 리트윗 분석을 통한 핵심 사건 추출)

Tsolmon, Bayar;Lee, Kyung-Soon
- KIPS Transactions on Software and Data Engineering, v.1 no.1, pp.69-74, 2012
Many internet users focus on issues posted on social network services within a very short time. When a major social issue or event occurs, it affects the number of comments and retweets on Twitter that day. In this paper, we propose a method for extracting core events based on timeline analysis, sentiment features, and retweet information in Twitter data. To validate the method, we compared it with approaches using only word frequency, word frequency with sentiment analysis, only the chi-square method, and sentiment analysis with the chi-square method. To justify the proposed approach, we evaluated the accuracy of correct answers in the top 10 results. The proposed method achieved 94.9% on this measure, and the experimental results show that it is effective for extracting core events from a Twitter corpus.
https://doi.org/10.3745/KTSDE.2012.1.1.069
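The chi-square method appears in the comparison as a term-scoring step. In its standard 2x2 form for term selection, a term is scored for a given day by contrasting how often it occurs in that day's tweets with how often it occurs in tweets from other days. The sketch below assumes tweets are pre-tokenized into sets of words and grouped by day; `rank_terms_for_day` and this data layout are hypothetical, and the sentiment and retweet features of the proposed method are not included.

```python
from typing import Dict, List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """2x2 chi-square statistic N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)).

    a: tweets on the target day containing the term
    b: tweets on other days containing the term
    c: tweets on the target day without the term
    d: tweets on other days without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def rank_terms_for_day(tweets_by_day: Dict[str, List[Set[str]]],
                       day: str, top_k: int = 10) -> List[str]:
    """Rank the target day's terms by chi-square score (ties broken alphabetically)."""
    target = tweets_by_day[day]
    others = [tweet for other_day, tweets in tweets_by_day.items()
              if other_day != day for tweet in tweets]
    scores = {}
    for term in set().union(*target):
        a = sum(term in tweet for tweet in target)
        b = sum(term in tweet for tweet in others)
        scores[term] = chi_square(a, b, len(target) - a, len(others) - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

The statistic measures association in either direction, so a term that is unusually rare on the target day can also score highly; a practical ranker might keep only terms with `a * d > b * c`.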

How to Express Emotion: Role of Prosody and Voice Quality Parameters (감정 표현 방법: 운율과 음질의 역할)

Lee, Sang-Min;Lee, Ho-Joon
- Journal of the Korea Society of Computer and Information, v.19 no.11, pp.159-166, 2014
In this paper, we examine the role of emotional acoustic cues, including both prosody and voice quality parameters, in modifying the sense of a word. To extract prosody and voice quality parameters, we used 60 speech samples spoken by six speakers in five different emotional states. We analyzed eight emotional acoustic cues and used discriminant analysis to find the dominant sequence of acoustic cues. We found that anger is closely related to intensity level and second formant bandwidth range; joy is related to the values of the second and third formants and intensity level; sadness is strongly related only to prosodic cues such as intensity level and pitch level; and fear is related to pitch level and to the second formant value and its bandwidth range. These findings can serve as guidelines for fine-tuning an emotional spoken language generation system, because these distinct sequences of acoustic cues reveal the subtle characteristics of each emotional state.
https://doi.org/10.9708/jksci.2014.19.11.159

A Study on the Research Trend in the Dyslexia and Learning Disability Through a Keyword Network Analysis (키워드 네트워크 분석을 통한 난독증과 학습장애 관련 연구 동향 분석)

Lee, Woo-Jin;Kim, Tae-Gang
- Journal of Digital Convergence, v.17 no.1, pp.91-98, 2019
This study investigated general research trends in dyslexia and learning disability and explored the centrality of related variables through keyword network analysis. Data were collected from ten years of articles in the Research Information Sharing Service (RISS) provided by the Korea Education and Research Information Service (KERIS). The analysis consisted of keyword cleansing, extraction of major keywords using the KrKwic program, and visualization of the connection centrality between keywords using the NodeXL program. The results were as follows. First, a total of 72 keywords were extracted through the keyword cleansing process; the major keywords included learning disability, dyslexia, and RTI. Second, analysis of betweenness centrality shows that learning disability is a key word addressed in Korean studies of dyslexia and learning disabilities. These results suggest a method for analyzing research trends on dyslexia and learning disorders both quantitatively and qualitatively.
https://doi.org/10.14400/JDC.2019.17.1.091
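The study computed betweenness centrality with NodeXL. To illustrate the measure itself, the sketch below builds a keyword co-occurrence graph, assuming two keywords are linked when they appear in the same article, and computes unnormalized betweenness centrality with Brandes' algorithm for unweighted, undirected graphs. The input format and function names are assumptions; NodeXL's own settings, such as edge weights or normalization, may produce different values.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]


def build_cooccurrence_graph(keyword_lists: Iterable[List[str]]) -> Graph:
    """Link every pair of keywords that appear in the same article."""
    graph: Graph = {}
    for keywords in keyword_lists:
        unique = sorted(set(keywords))
        for k in unique:
            graph.setdefault(k, set())
        for u, v in combinations(unique, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def betweenness_centrality(graph: Graph) -> Dict[str, float]:
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}
```

In a path "a - b - c", only "b" lies between another pair, so it receives a score of 1.0 while the endpoints score 0.0.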
