Search | Korea Science

Pseudo-Cepstral Representation of Speech Signal and Its Application to Speech Recognition (음성 신호의 의사 켑스트럼 표현 및 음성 인식에의 응용)

Kim, Hong-Kook;Lee, Hwang-Soo
- The Journal of the Acoustical Society of Korea / v.13 no.1E / pp.71-81 / 1994
In this paper, we propose a pseudo-cepstral representation of line spectrum pair (LSP) frequencies and evaluate speech recognition performance with cepstral liftering using the pseudo-cepstrum. The pseudo-cepstrum corresponding to LSP frequencies is derived by approximating the relationship between the LPC-cepstrum and LSP frequencies. Three cepstral liftering procedures are applied to the pseudo-cepstrum to improve the performance of speech recognition: the root-power-sums lifter, the general exponential lifter, and the bandpass lifter. The liftered pseudo-cepstra are then warped onto a mel-frequency scale to obtain feature vectors for speech recognition. Among the three lifters, the general exponential lifter gives the best recognition performance. When the proposed pseudo-cepstral feature vectors are used to recognize noisy speech, a signal-to-noise ratio (SNR) improvement of about 5~10 dB over LSP is obtained.
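The three lifters named in the abstract can be sketched as weight vectors applied element-wise to cepstral coefficients c(1)..c(N). This is a minimal illustration using commonly cited textbook forms of each lifter; the abstract does not give the paper's exact parameterization, so the function names and the default values of `s`, `tau`, and `L` below are assumptions.

```python
import math

def rps_lifter(n_coeffs):
    """Root-power-sums weights: w(n) = n for n = 1..n_coeffs."""
    return [float(n) for n in range(1, n_coeffs + 1)]

def general_exponential_lifter(n_coeffs, s=1.5, tau=5.0):
    """General exponential weights: w(n) = n**s * exp(-n**2 / (2 * tau**2)).

    The defaults for s and tau are illustrative, not taken from the paper.
    """
    return [n ** s * math.exp(-(n * n) / (2.0 * tau * tau))
            for n in range(1, n_coeffs + 1)]

def bandpass_lifter(n_coeffs, L=None):
    """Raised-sine (bandpass) weights: w(n) = 1 + (L / 2) * sin(pi * n / L).

    L defaults to the number of coefficients (an assumption).
    """
    L = n_coeffs if L is None else L
    return [1.0 + (L / 2.0) * math.sin(math.pi * n / L)
            for n in range(1, n_coeffs + 1)]

def apply_lifter(cepstrum, weights):
    """Multiply each cepstral coefficient by its lifter weight."""
    return [w * c for w, c in zip(weights, cepstrum)]
```

All three weightings de-emphasize or reshape the contribution of individual quefrency indices before distance computation; they differ only in the weight curve.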

A Study on the Evaluation of Simplification Algorithms Based on Map Generalization (지도 일반화에 따른 단순화 알고리즘의 평가에 관한 연구)

Kim, Kam-Lae;Lee, Ho-Nam;Park, In-Hae
- Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.10 no.2 / pp.63-71 / 1992
A digital map database is often produced for multiple purposes, including mapping at multiple scales; it is increasingly rare that a base map is digitized for mapping at a single scale. The most important problem in line simplification for map generalization and multiple representation is that the tolerance value selected for simplifying base map information must be modified as feature geometry varies within the digital file, to ensure both the accuracy and the recognizability of graphic details on a generalized map. In this study, we explore various algorithms for line simplification at many scales from a single digital file, and present a rule for determining the scales at which line feature geometry might be expected to change in map representation. Five algorithms are evaluated by applying two measures of displacement between a digitized line and its simplification. The results indicate that, of the five, the Douglas-Peucker routine produces the least displacement between a line and its simplification. The research contributes to automating map simplification by incorporating into the digital environment numeric guidelines on what magnitude and variation in geometric detail should be preserved as the digital data are simplified for representation at reduced map scales.
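For readers unfamiliar with the Douglas-Peucker routine mentioned in the abstract, the following is a minimal recursive sketch with a single fixed tolerance. It is a generic textbook version, not the implementation evaluated in the study; the function names are assumptions, and it measures distance to the anchor segment (a common variant of the perpendicular-distance rule).

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Return a simplified copy of a polyline, keeping its endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    # Find the interior vertex farthest from the anchor segment.
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right
```

A larger tolerance removes more vertices, which is why the abstract stresses that the tolerance has to change with feature geometry and target scale.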

Feature-Oriented Adaptive Motion Analysis For Recognizing Facial Expression (특징점 기반의 적응적 얼굴 움직임 분석을 통한 표정 인식)

Noh, Sung-Kyu;Park, Han-Hoon;Shin, Hong-Chang;Jin, Yoon-Jong;Park, Jong-Il
- Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2007.02a / pp.667-674 / 2007
Facial expressions provide significant clues about one's emotional state; however, it has always been a great challenge for machines to recognize facial expressions effectively and reliably. In this paper, we report a method of feature-based adaptive motion energy analysis for recognizing facial expressions. Our method optimizes the information gain heuristics of the ID3 tree and introduces new approaches to (1) facial feature representation, (2) facial feature extraction, and (3) facial feature classification. We use a minimal set of reasonable facial features, suggested by the information gain heuristics of the ID3 tree, to represent the geometric face model. Feature extraction proceeds as follows. Features are first detected and then carefully "selected." Feature "selection" means differentiating features with high variability from those with low variability, in order to estimate each feature's motion pattern effectively. For each facial feature, motion analysis is performed adaptively; that is, each facial feature's motion pattern (from the neutral face to the expressed face) is estimated based on its variability. After feature extraction, the facial expression is classified using the ID3 tree (which is built from the 1728 possible facial expressions) and the test images from the JAFFE database. The proposed method overcomes the problems raised by previous methods. First, it is simple but effective: it estimates the expressive facial features effectively and reliably by differentiating features with high variability from those with low variability. Second, it is fast because it avoids complicated or time-consuming computations; instead, it exploits the motion energy values of a few selected expressive features (acquired from an intensity-based threshold). Lastly, our method gives reliable recognition rates, with an overall recognition rate of 77%. The effectiveness of the proposed method is demonstrated by the experimental results.
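The information gain heuristic that ID3 uses to choose a splitting feature can be sketched as follows. This is a generic illustration of the standard entropy-based criterion, not the authors' optimized variant; the feature names and labels in the usage note are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature.

    rows is a list of dicts mapping feature name to value.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

def best_feature(rows, labels, features):
    """Feature with the highest information gain (the ID3 split choice)."""
    return max(features, key=lambda f: information_gain(rows, labels, f))
```

For example, with rows such as `{"brow": "up", "mouth": "open"}` and expression labels, `best_feature(rows, labels, ["brow", "mouth"])` returns whichever feature separates the labels most cleanly.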

Siamese Network for Learning Robust Feature of Hippocampi

Ahmed, Samsuddin;Jung, Ho Yub
- Smart Media Journal / v.9 no.3 / pp.9-17 / 2020
The hippocampus is a complex brain structure embedded deep in the temporal lobe. Studies have shown that this structure is affected by neurological and psychiatric disorders and that it is a significant landmark for diagnosing neurodegenerative diseases. Hippocampus features play a very significant role in region-of-interest based analysis for disease diagnosis and prognosis. In this study, we attempt to learn embeddings of this important biomarker. Because conventional metric learning methods for feature embedding are known to be poor at capturing semantic similarity among the data under study, we trained a deep Siamese convolutional neural network to learn a metric for the hippocampus. We used the Gwangju Alzheimer's and Related Dementia cohort data set in our study. The input to the network was pairs of three-view patches (TVPs) of size 32 × 32 × 3. Positive samples were taken from the vicinity of a specified landmark for the hippocampus, and negative samples were taken from random locations of the brain excluding the hippocampal regions. We achieved 98.72% accuracy in verifying hippocampus TVPs.
https://doi.org/10.30693/SMJ.2020.9.3.9

Chaotic Features for Dynamic Textures Recognition with Group Sparsity Representation

Luo, Xinbin;Fu, Shan;Wang, Yong
- KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.11 / pp.4556-4572 / 2015
Dynamic texture (DT) recognition is a challenging problem in numerous applications. In this study, we propose a new algorithm for DT recognition based on a group sparsity structure in conjunction with a chaotic feature vector. A bag-of-words model is used to represent each video as a histogram of the chaotic feature vector, which is proposed to capture the self-similarity property of the pixel intensity series. The recognition problem is then cast as a group sparsity model, which can be efficiently optimized with the alternating direction method of multipliers (ADMM) algorithm. Experimental results show that the proposed method performs best among several well-known DT modeling techniques.
https://doi.org/10.3837/tiis.2015.11.017

SEMANTIC FEATURE DETECTION FOR REAL-TIME IMAGE TRANSMISSION OF SIGN LANGUAGE AND FINGER SPELLING

Hou, Jin;Aoki, Yoshinao
- Proceedings of the IEEK Conference / 2002.07c / pp.1662-1665 / 2002
This paper proposes a novel semantic feature detection (SFD) method for real-time image transmission of sign language and finger spelling. We extract semantic information as an interlingua from input text by natural language processing, and then transmit the SFD, which is in effect a parameterized action representation, to the 3-D articulated humanoid models prepared at each client in remote locations. Once the SFD is received, the virtual human is animated by the synthesized SFD. Experimental results based on Japanese sign language and Chinese sign language demonstrate that this algorithm is effective for real-time image delivery of sign language and finger spelling.

A Study on the 3D Reconstruction and Representation of CT Images (CT영상의 3차원 재구성 및 표현에 관한 연구)

한영환;이응혁
- Journal of Biomedical Engineering Research / v.15 no.2 / pp.201-208 / 1994
Many three-dimensional object modeling and display methods have been developed for computer graphics and computer vision. Recently, with the help of medical imaging devices such as computerized tomography and magnetic resonance imaging, some of these methods have been widely used to capture the shape, structure, and other properties of real objects in many medical applications. In this paper, we propose a method for reconstructing and displaying a three-dimensional object from a series of cross-sectional images. It is implemented using an automatic threshold selection method and a contour following algorithm. We select feature points using a combination of curvature and distance; these feature points are the candidates for the tiling method. The results show that the proposed method is very effective and useful for understanding the object's structure, and that it can be automated without the technician's intervention.

Condition-invariant Place Recognition Using Deep Convolutional Auto-encoder (Deep Convolutional Auto-encoder를 이용한 환경 변화에 강인한 장소 인식)

Oh, Junghyun;Lee, Beomhee
- The Journal of Korea Robotics Society / v.14 no.1 / pp.8-13 / 2019
Visual place recognition is a widely researched area in robotics, as it is one of the elemental requirements for autonomous navigation and for simultaneous localization and mapping in mobile robots. However, place recognition in a changing environment is a challenging problem, since the same place looks different depending on the time, weather, and season. This paper presents a feature extraction method that uses a deep convolutional auto-encoder to recognize places under severe appearance changes. Given database and query image sequences from different environments, the convolutional auto-encoder is trained to predict the images of the desired environment. Training is performed by minimizing the loss function between the predicted image and the desired image. After training, the encoding part of the structure transforms an input image into a low-dimensional latent representation, which can be used as a condition-invariant feature for recognizing places in a changing environment. Experiments were conducted to demonstrate the effectiveness of the proposed method, and the results showed that our method outperformed existing methods.
https://doi.org/10.7746/jkros.2019.14.1.008

Design and Implementation of Matching Engine for QbSH System Based on Polyphonic Music (다성음원 기반 QbSH 시스템을 위한 매칭엔진의 설계 및 구현)

Park, Sung-Joo;Chung, Kwang-Sue
- Journal of Korea Multimedia Society / v.15 no.1 / pp.18-31 / 2012
This paper proposes a matching engine for a query-by-singing/humming (QbSH) system, which retrieves the most similar music information by comparing the input data with feature information extracted from polyphonic music such as MP3 files. The feature sequences transcribed from polyphonic music may contain many errors. To reduce the influence of these errors and improve performance, the matching engine adopts a chroma-scale representation, compensation, and asymmetric DTW (Dynamic Time Warping). The performance of various distance metrics is also investigated. In our experiment, the proposed QbSH system achieves an MRR (Mean Reciprocal Rank) of 0.718 for 1000 singing/humming queries when searching a database of 450 polyphonic music pieces.
https://doi.org/10.9717/kmms.2012.15.1.018
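As a rough illustration of the matching and evaluation ideas in the abstract above, the sketch below computes a basic DTW cost between two pitch-class sequences and the MRR of a set of queries. It uses the standard symmetric DTW recurrence rather than the paper's asymmetric variant, and the circular 12-semitone distance is an assumed stand-in for the chroma-scale representation; all function names are hypothetical.

```python
def chroma_distance(a, b):
    """Circular distance between two pitch classes on a 12-semitone scale."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

def dtw_distance(query, reference, dist=chroma_distance):
    """Accumulated cost of the best monotonic alignment (basic DTW)."""
    n, m = len(query), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(query[i - 1], reference[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]

def mean_reciprocal_rank(ranks):
    """MRR over queries.

    ranks holds the 1-based rank of the correct item for each query,
    or None when the correct item was not retrieved (counts as 0).
    """
    return sum(1.0 / r for r in ranks if r) / len(ranks)
```

In a retrieval loop, each query would be scored against every database melody with `dtw_distance`, the database sorted by cost, and the rank of the true song passed to `mean_reciprocal_rank`.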

A Word Embedding used Word Sense and Feature Mirror Model (단어 의미와 자질 거울 모델을 이용한 단어 임베딩)

Lee, JuSang;Shin, JoonChoul;Ock, CheolYoung
- KIISE Transactions on Computing Practices / v.23 no.4 / pp.226-231 / 2017
Word representation, an important area of natural language processing (NLP) using machine learning, is a method that represents a word not as text but as a distinguishable symbol. Existing word embeddings use large corpora to ensure that words occurring near each other in text are positioned close together. However, corpus-based word embedding requires several corpora because of word occurrence frequency and the growing number of words. In this paper, word embedding is performed using dictionary definitions and semantic relationship information (hypernyms and antonyms). Words are trained using the feature mirror model (FMM), a modified Skip-Gram (Word2Vec). Words with similar senses have similar vectors. Furthermore, it was possible to distinguish the vectors of antonyms.
https://doi.org/10.5626/KTCP.2017.23.4.226

