Search | Korea Science

A study on the Method of the Keyword Spotting Recognition in the Continuous speech using Neural Network (신경 회로망을 이용한 연속 음성에서의 keyword spotting 인식 방식에 관한 연구)

Yang, Jin-Woo;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.43-49 / 1996
This research proposes a system for speaker-independent Korean continuous speech recognition over 247 DDD area names using a keyword spotting technique. The recognition algorithm is the Dynamic Programming Neural Network (DPNN), which integrates DP with a multi-layer perceptron to handle time-axis distortion and spectral pattern variation in speech. To improve performance, we classify word models into keyword models and non-keyword models. We also experiment with a postprocessing procedure to evaluate system performance. The experimental results are as follows. The isolated-word recognition rate is 93.45% in the speaker-dependent case and 84.05% in the speaker-independent case. In the keyword spotting experiment on simple dialogue sentences, the recognition rate is 77.34% in the speaker-dependent case and 70.63% in the speaker-independent case.
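The time-axis distortion mentioned in the abstract is the problem that dynamic programming alignment addresses. The sketch below is plain dynamic time warping over scalar sequences, not the DPNN itself (the abstract does not specify how DP and the perceptron are combined); the function name and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of numbers.

    Illustrative only: uses |x - y| as the local cost and the three
    standard moves (insertion, deletion, match).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]
```

A sequence and a time-stretched copy of it align at zero cost, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])`, which is the property that makes DP useful against variation in speaking rate.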

Application of Texture Feature Analysis Algorithm used the Statistical Characteristics in the Computed Tomography (CT): A base on the Hepatocellular Carcinoma (HCC) (전산화단층촬영 영상에서 통계적 특징을 이용한 질감특징분석 알고리즘의 적용: 간세포암 중심으로)

Yoo, Jueun;Jun, Taesung;Kwon, Jina;Jeong, Juyoung;Im, Inchul;Lee, Jaeseung;Park, Hyonghu;Kwak, Byungjoon;Yu, Yunsik
- Journal of the Korean Society of Radiology / v.7 no.1 / pp.9-15 / 2013
In this study, we propose a texture feature analysis (TFA) algorithm for the automatic recognition of liver disease in computed tomography (CT) images and apply it to the design of computer-aided diagnosis (CAD) for hepatocellular carcinoma (HCC). The performance of each algorithm was compared and evaluated. In each HCC image, a region of analysis (ROA; window size $40 \times 40$ pixels) was set, and the HCC recognition rate was calculated from the values of six TFA parameters (average gray level, average contrast, measure of smoothness, skewness, measure of uniformity, and entropy). As a result, TFA was found to be significant as a measure of HCC recognition rate. Measure of uniformity gave the highest recognition rate. Average contrast, measure of smoothness, and skewness gave relatively high rates, while average gray level and entropy gave relatively low rates. The algorithm showed high recognition rates (a maximum of 97.14% and a minimum of 82.86%) and can be used to identify HCC lesions in images and assist early clinical diagnosis. Applied in practice, it could improve the efficiency of early clinical diagnosis. Further research that adds effective quantitative analysis is needed to establish criteria for generalized disease recognition.
https://doi.org/10.7742/jksr.2013.7.1.009
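The six parameters named in the abstract correspond to commonly used histogram-based texture descriptors. The abstract gives no formulas, so the sketch below assumes the usual histogram-moment definitions; the function name, the 256-level default, and the normalization by `(levels - 1) ** 2` are assumptions, not details taken from the paper.

```python
import math


def texture_features(window, levels=256):
    """Six histogram-based texture descriptors for a 2D list of gray levels.

    Assumed definitions (not taken from the paper): moments of the
    normalized gray-level histogram of the analysis window.
    """
    pixels = [v for row in window for v in row]
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    p = [h / n for h in hist]  # normalized histogram p(z)

    mean = sum(z * pz for z, pz in enumerate(p))
    var = sum((z - mean) ** 2 * pz for z, pz in enumerate(p))
    third = sum((z - mean) ** 3 * pz for z, pz in enumerate(p))
    # Variance is scaled to [0, 1] before computing smoothness (assumption).
    var_norm = var / (levels - 1) ** 2

    return {
        "average_gray_level": mean,
        "average_contrast": math.sqrt(var),
        "smoothness": 1 - 1 / (1 + var_norm),
        "skewness": third / (levels - 1) ** 2,
        "uniformity": sum(pz ** 2 for pz in p),
        "entropy": -sum(pz * math.log2(pz) for pz in p if pz > 0),
    }
```

For a perfectly flat $40 \times 40$ window, uniformity is 1 and entropy is 0; a window split evenly between two gray levels has uniformity 0.5 and entropy 1 bit.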

Recognition method using stereo images-based 3D information for improvement of face recognition (얼굴인식의 향상을 위한 스테레오 영상기반의 3차원 정보를 이용한 인식)

Park Chang-Han;Paik Joon-Ki
- Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.3 s.309 / pp.30-38 / 2006
In this paper, we address the drop in recognition rate with distance by using distance and depth information from 3D data obtained from stereo face images. A monocular face image suffers from a drop in recognition rate caused by uncertain information such as object distance, size, movement, rotation, and depth. Also, if image information such as rotation, illumination, and pose change is not acquired for recognition, many errors occur. We aim to solve these problems. The proposed method consists of an eye detection algorithm, face pose analysis, and principal component analysis (PCA). We also convert from the RGB color space to YCbCr for fast face detection in a limited region. We create a multi-layered relative intensity map in the face candidate region and decide whether it is a face from facial geometry. The method can acquire depth information for distance, eyes, and mouth from stereo face images. The proposed method detects faces under changes in scale, movement, and rotation by using distance and depth. We train PCA on the detected left face and the estimated direction difference. Simulation yielded face recognition rates of 95.83% (at 100 cm) for frontal faces and 98.3% with pose change. Therefore, the proposed method can obtain a high recognition rate with appropriate scaling and pose change according to distance.

Performance Analysis of Face Recognition by Distance according to Image Normalization and Face Recognition Algorithm (영상 정규화 및 얼굴인식 알고리즘에 따른 거리별 얼굴인식 성능 분석)

Moon, Hae-Min;Pan, Sung Bum
- Journal of the Korea Institute of Information Security & Cryptology / v.23 no.4 / pp.737-742 / 2013
Surveillance systems are being developed into intelligent systems that can judge and respond on their own using human recognition techniques. Existing face recognition performs well at short distances, but the recognition rate drops at long distances. In this paper, we analyze face recognition performance by interpolation method and recognition algorithm when face images captured at multiple distances are used for training. We use nearest neighbor, bilinear, bicubic, and Lanczos3 interpolation to interpolate face images, and PCA and LDA for face recognition. The experimental results show that LDA-based face recognition with bilinear interpolation provides performance in face recognition.
https://doi.org/10.13089/JKIISC.2013.23.4.737

Design and Implementation of a Bimodal User Recognition System using Face and Audio (얼굴과 음성 정보를 이용한 바이모달 사용자 인식 시스템 설계 및 구현)

Kim Myung-Hun;Lee Chi-Geun;So In-Mi;Jung Sung-Tae
- Journal of the Korea Society of Computer and Information / v.10 no.5 s.37 / pp.353-362 / 2005
Recently, research on bimodal recognition has become very active. In this paper, we propose a bimodal user recognition system that uses face and audio information. Face recognition consists of a face detection step and a face recognition step. Face detection uses AdaBoost to find face candidate areas. After the face candidates are found, PCA feature extraction is applied to reduce the dimension of the feature vector. SVM classifiers are then used to detect and recognize faces. Audio recognition uses MFCC for feature extraction and HMM for recognition. Experimental results show that bimodal recognition improves the user recognition rate substantially over audio-only recognition, especially in the presence of noise.

Hybrid CTC-Attention Network-Based End-to-End Speech Recognition System for Korean Language

Hosung Park;Changmin Kim;Hyunsoo Son;Soonshin Seo;Ji-Hwan Kim
- Journal of Web Engineering / v.21 no.2 / pp.265-284 / 2021
In this study, we propose an automatic end-to-end speech recognition system for Korean based on a hybrid CTC-attention network. Deep neural network/hidden Markov model (DNN/HMM)-based speech recognition systems have driven dramatic improvement in this area. However, it is difficult for non-experts to develop speech recognition for new applications. End-to-end approaches simplify the speech recognition system into a single-network architecture, so a system can be developed without expert knowledge. The proposed hybrid CTC-attention model uses a CTC objective function during attention model training, which improves both recognition accuracy and training speed. In most languages, end-to-end speech recognition uses characters as output labels. For Korean, however, character-based end-to-end recognition is inefficient because the language has 11,172 possible characters, a relatively large number compared to other languages. For example, English has 26 characters, and Japanese has 50 characters. To address this problem, we use 49 Korean graphemes as output labels. Experimental results show a character error rate (CER) of 10.02% when 740 hours of Korean training data are used.
https://doi.org/10.13052/jwe1540-9589.2126
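The figure of 11,172 characters is the number of precomposed Hangul syllables in Unicode (19 initial consonants × 21 vowels × 28 final positions, counting "no final"). The sketch below shows the general arithmetic for splitting a syllable into its component letters; it is not the paper's label set, whose 49-grapheme inventory is not specified in the abstract. The function names are hypothetical.

```python
HANGUL_BASE = 0xAC00          # first precomposed syllable, "가"
NUM_INITIALS = 19
NUM_MEDIALS = 21
NUM_FINALS = 28               # includes the "no final consonant" case
NUM_SYLLABLES = NUM_INITIALS * NUM_MEDIALS * NUM_FINALS  # 11,172


def decompose(syllable):
    """Return (initial, medial, final) indices for one Hangul syllable."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < NUM_SYLLABLES:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, NUM_MEDIALS * NUM_FINALS)
    medial, final = divmod(rest, NUM_FINALS)
    return initial, medial, final


def to_jamo(syllable):
    """Map a syllable to its sequence of Unicode conjoining jamo."""
    initial, medial, final = decompose(syllable)
    jamo = chr(0x1100 + initial) + chr(0x1161 + medial)
    if final:  # final index 0 means the syllable has no final consonant
        jamo += chr(0x11A7 + final)
    return jamo
```

Decomposing the output this way shrinks the label set from thousands of syllable blocks to a few dozen letters, which is the motivation the abstract gives for grapheme-level output labels.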

Half-Against-Half Multi-class SVM Classify Physiological Response-based Emotion Recognition

Vanny, Makara;Ko, Kwang-Eun;Park, Seung-Min;Sim, Kwee-Bo
- Journal of the Korean Institute of Intelligent Systems / v.23 no.3 / pp.262-267 / 2013
Recognizing human emotional state is one of the most important components of efficient human-human and human-computer interaction. In this paper, the main problem is classifying four emotions: fear, disgust, joy, and neutral. The experiment used visual stimuli to elicit emotion and was based on the physiological signals of skin conductance (SC), skin temperature (SKT), and blood volume pulse (BVP). To solve this problem, a half-against-half (HAH) multi-class support vector machine (SVM) with a Gaussian radial basis function (RBF) kernel is proposed as an effective technique for improving the accuracy of emotion classification. The experimental results showed that the proposed method is effective for emotion recognition, with accuracy rates of 90% for neutral, 86.67% for joy, 85% for disgust, and 80% for fear.
https://doi.org/10.5391/JKIIS.2013.23.3.262

Improvement of Speech Recognition Performance in Running Car by Considering Wind Noise (바람잡음을 고려한 자동차에서의 음성인식 성능 향상)

Lee, Ki-Hoon;Lee, Chul-Hee;Kim, Chong-Kyo
- Proceedings of the KSPS conference / 2004.05a / pp.231-234 / 2004
This paper describes an efficient method for improving the noise robustness of speech recognition in a running car by considering wind noise. In a moving car, three main kinds of noise (engine noise, tire noise, and wind noise) severely affect recognition performance. Wind noise is an especially important factor when driving with a window open. We analyzed wind noise under various driving conditions: 60, 80, and 100 km/h with the window fully open or half open. We found that the recognition rate degrades significantly when the wind noise components in the frequency range above 200 Hz are large. We developed a preprocessing method to improve noise robustness in the presence of wind noise: the cutoff frequency of the front-end high-pass filter is adaptively changed from 100 to 200 Hz according to the level of the wind noise components. With this method, the recognition rate improved considerably under all driving conditions.
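The adaptive cutoff described above can be pictured as a mapping from a measured wind-noise level to a cutoff between 100 and 200 Hz. The abstract does not say how the level is measured or what shape the mapping has, so the sketch below assumes a noise level already normalized to the range 0 to 1 and a linear mapping; both are illustrative assumptions, as is the function name.

```python
def adaptive_cutoff_hz(noise_level, low_hz=100.0, high_hz=200.0):
    """Map a normalized wind-noise level in [0, 1] to a high-pass cutoff.

    Hypothetical mapping: linear between low_hz and high_hz, with
    out-of-range levels clamped. The paper's actual rule is not given.
    """
    level = min(max(noise_level, 0.0), 1.0)
    return low_hz + level * (high_hz - low_hz)
```

Under these assumptions, a quiet cabin keeps the cutoff at 100 Hz, and the cutoff rises toward 200 Hz as the wind-noise level increases.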

Handwritten Numerals Recognition Using an Ant-Miner Algorithm

Phokharatkul, Pisit;Phaiboon, Supachai
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.1031-1033 / 2005
This paper presents a handwritten numeral recognition system based on the Ant-miner algorithm (data mining based on ant colony optimization). First, three distinct features (also called attributes) of each numeral are extracted: Loop zones, End points, and Feature codes. Once extracted, the attributes take the form attribute = value (e.g., End point10 = true). Extraction starts by dividing the numeral into 12 zones, numbered 1-12. The possible values of the Loop zone attribute in each zone are "true" and "false"; "true" means that the zone contains the loop of the numeral. An End point attribute of "true" means that the zone contains an end point of the numeral. With 12 zones and two zone-based attributes, this gives 24 attributes. The Feature code attribute gives the number of lines of the numeral that a reference line crosses. Seven reference lines are used in this experiment, bringing the total to 31 attributes. All attributes are used by the Ant-miner algorithm to construct classification rules for the 10 numerals. The Ant-miner algorithm is slightly adapted in this experiment for a better recognition rate. The results showed that the system can recognize all of the training set (a thousand items of data from 50 people). On unseen data from 10 people, the recognition rate is 98%.
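The attribute count in the abstract can be checked by building the attribute vector explicitly. The sketch below only illustrates the attribute = value encoding; the attribute names, the `encode` helper, and any sample input are hypothetical and do not come from the paper.

```python
NUM_ZONES = 12
NUM_REFERENCE_LINES = 7


def attribute_names():
    """List the 12 + 12 + 7 = 31 attribute names (names are made up)."""
    loop = [f"LoopZone{z}" for z in range(1, NUM_ZONES + 1)]
    end = [f"EndPoint{z}" for z in range(1, NUM_ZONES + 1)]
    code = [f"FeatureCode{k}" for k in range(1, NUM_REFERENCE_LINES + 1)]
    return loop + end + code


def encode(loop_zones, end_point_zones, crossings):
    """Build an attribute -> value mapping for one numeral.

    loop_zones / end_point_zones: sets of zone numbers (1-12) marked true.
    crossings: seven counts, one per reference line.
    """
    if len(crossings) != NUM_REFERENCE_LINES:
        raise ValueError("expected one crossing count per reference line")
    values = {}
    for z in range(1, NUM_ZONES + 1):
        values[f"LoopZone{z}"] = z in loop_zones
        values[f"EndPoint{z}"] = z in end_point_zones
    for k, count in enumerate(crossings, start=1):
        values[f"FeatureCode{k}"] = count
    return values
```

Each numeral then becomes one row of 31 attribute values, which is the tabular form a rule-induction algorithm such as Ant-miner takes as input.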

A Spectral Compensation Method for Noise Robust Speech Recognition (잡음에 강인한 음성인식을 위한 스펙트럼 보상 방법)

Cho, Jung-Ho
- Journal of the Institute of Electronics Engineers of Korea IE (전자공학회논문지 IE) / v.49 no.2 / pp.9-17 / 2012
One of the problems in applying speech recognition systems in the real world is performance degradation caused by acoustical distortions. The most important source of acoustical distortion is additive noise. This paper describes a spectral compensation technique for noise-robust speech recognition, based on a spectral peak enhancement scheme followed by an efficient noise subtraction scheme. The proposed methods emphasize the formant structure and compensate for the spectral tilt of the speech spectrum while maintaining broad-bandwidth spectral components. Recognition experiments were conducted using noisy speech corrupted by white Gaussian noise, car noise, babble noise, or subway noise. Compared with no spectral compensation, the new technique reduced the average error rate slightly in a high-SNR (signal-to-noise ratio) environment and by half in a low-SNR (10 dB) environment.
