Search | Korea Science

Korean Word Recognition Using Diphone-Level Hidden Markov Model (Diphone 단위 의 hidden Markov model을 이용한 한국어 단어 인식)

Park, Hyun-Sang;Un, Chong-Kwan;Park, Yong-Kyu;Kwon, Oh-Wook
- The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.14-23 / 1994
This paper studies speech units appropriate for recognizing Korean. For better speech recognition, co-articulatory effects within an utterance should be considered when selecting a recognition unit. One way to model such effects is to use larger units of speech. The diphone was found to be a good recognition unit because it models transitional regions explicitly. When diphones are used, stationary phoneme models may be inserted between them. Computer simulation of isolated word recognition was done with a 7 word database spoken by seven male speakers. The best performance was obtained when transition regions between phonemes were modeled by two-state HMMs and stationary phoneme regions by one-state HMMs, excluding /b/, /d/, and /g/. Merging rarely occurring diphone units increased the recognition rate from 93.98% to 96.29%. In addition, a local interpolation technique was used to smooth a poorly modeled HMM with a well-trained HMM. With this technique, the recognition rate reached 97.22% after merging some diphone units.
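The unit inventory described in this abstract can be illustrated with a minimal sketch. It assumes a word is given as a list of phoneme labels and expands it into transition (diphone) units with stationary phoneme units in between, skipping the stationary unit for /b/, /d/, and /g/. The function names, the `a-b` label format, and the omission of word-boundary silence units are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: expand a phoneme sequence into recognition units.
# Stops that get no stationary model, as stated in the abstract.
NO_STATIONARY = {"b", "d", "g"}


def to_units(phonemes):
    """Return diphone (transition) units with stationary phoneme units interleaved."""
    units = []
    for i, ph in enumerate(phonemes):
        if i > 0:
            # Transition region between the previous phoneme and this one.
            units.append(f"{phonemes[i - 1]}-{ph}")
        if ph not in NO_STATIONARY:
            # Stationary region of the phoneme itself.
            units.append(ph)
    return units


def state_count(units):
    """Count HMM states: two per transition unit, one per stationary unit."""
    return sum(2 if "-" in u else 1 for u in units)
```

For example, `to_units(["k", "a", "b", "a"])` yields the units `k`, `k-a`, `a`, `a-b`, `b-a`, `a`, where no stationary unit is emitted for `b`.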

Speaker-adaptive Word Recognition Using Mapped Membership Function (사상멤버쉽함수에 의한 화자적응 단어인식)

Lee, Ki-Yeong;Choi, Kap-Seok
- The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.40-52 / 1992
This paper proposes a speaker-adaptive word recognition method using a mapped membership function, in order to absorb the fluctuation caused by differences between speakers, which is a problem in speaker-independent speech recognition. In the training procedure, the mapped membership function is built by introducing fuzzy theory into a mapped codebook between an unknown speaker's spectrum patterns and a standard speaker's. In the recognition procedure, an input pattern from an unknown speaker is reconstructed by the mapped membership function into a pattern adapted to the standard speaker. To show the validity of the method, word recognition experiments were carried out using 28 DDD area names. The recognition rate of the conventional speaker-adaptive method using a mapped codebook built by VQ is 64.9%, and that of a mapped codebook built by fuzzy VQ is 76.2%. With the mapped membership function, a recognition rate of 95.4% was achieved, which shows that the proposed method gives better recognition performance. Moreover, the method needs no iterative training procedure to build the mapped membership function, and its memory capacity and computation requirements are reduced to 1/30 and 1/500, respectively, of those of the conventional method using a mapped codebook.
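As background for the fuzzy VQ baseline mentioned in this abstract, the sketch below computes the standard fuzzy c-means membership of an input vector in each codeword. This is the textbook formula, not the paper's mapped membership function, whose construction the abstract does not specify. The function name and the fuzziness parameter `m` are assumptions for illustration.

```python
import math


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means membership of vector x in each codeword (memberships sum to 1).

    m > 1 is the fuzziness exponent; this is the generic formula, assumed here
    only to illustrate how a soft assignment differs from hard VQ.
    """
    dists = [math.dist(x, c) for c in codebook]
    # If x coincides with one or more codewords, assign crisp membership to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]
```

Hard VQ would map the input to the single nearest codeword, whereas the memberships above spread the input over all codewords in proportion to closeness.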

Hand Gesture Recognition using Multivariate Fuzzy Decision Tree and User Adaptation (다변량 퍼지 의사결정트리와 사용자 적응을 이용한 손동작 인식)

Jeon, Moon-Jin;Do, Jun-Hyeong;Lee, Sang-Wan;Park, Kwang-Hyun;Bien, Zeung-Nam
- The Journal of Korea Robotics Society / v.3 no.2 / pp.81-90 / 2008
With the increasing demand for services for the disabled and the elderly, assistive technologies have developed rapidly. Natural human signals such as voice and gesture have been applied to systems that assist the disabled and the elderly. As an example of this kind of human-robot interface, the Soft Remote Control System has been developed by HWRS-ERC at KAIST [1]. It is a vision-based hand gesture recognition system for controlling home appliances such as televisions, lamps, and curtains. One of the most important technologies in the system is the hand gesture recognition algorithm. The problems that most frequently lower the hand gesture recognition rate are inter-person variation and intra-person variation. Intra-person variation can be handled by introducing fuzzy concepts. In this paper, we propose a multivariate fuzzy decision tree (MFDT) learning and classification algorithm for hand motion recognition. To recognize the hand gestures of a new user, the most suitable recognition model among several well-trained models is selected by a model selection algorithm and incrementally adapted to the user's hand gestures. To show the general performance of MFDT as a classifier, we report classification rates on benchmark data from the UCI repository. To evaluate hand gesture recognition, we tested on hand gesture data collected from 10 people over 15 days. The experimental results show that the classification and user adaptation performance of the proposed algorithm is better than that of a general fuzzy decision tree.

On-Line Korean Character Recognition by the Stroke Information of Korean Phoneme in Multimedia Terminal (한글 자소의 획 정보에 의한 멀티미디어 단말기에서의 온라인 한글 문자 인식)

Oh Juntaek;Jung Momoon;Lee Woobeom;Kim Wookhyun
- Journal of the Institute of Convergence Signal Processing / v.1 no.1 / pp.64-73 / 2000
Korean character recognition technology for the user interface of a multimedia terminal requires fast processing and a high recognition rate. In this paper, we propose a phoneme and character recognition technique that uses characteristic information of Korean and features of the input strokes, i.e., feature points, feature vectors, virtual vectors, and the positional relations between strokes. A Korean database is used to recognize both phonemes and characters across users' various writing styles. The database is constructed from the characteristic information of Korean and from phoneme models that carry various stroke information. We also use successive processing based on the positional relations between strokes, and backtracking that modifies the number of strokes composing each phoneme. This method reduces the complexity of phoneme separation. In a recognition experiment on 600 characters from 1,200 words, written by 10 people, the proposed on-line Korean character recognition system achieved an average processing time of 13 ms per character and a correct recognition rate of more than 95%.

Traffic Sign Recognition by the Variant-Compensation and Circular Tracing (변형 보정과 원형 추적법에 의한 교통 표지판 인식)

Lee, Woo-Beom
- Journal of the Institute of Convergence Signal Processing / v.9 no.3 / pp.188-194 / 2008
We propose a new method for traffic sign recognition, which is one function of a driving assistance system (DAS) in an intelligent vehicle. Our approach uses a geometric method to estimate the degree of variation in traffic signs altered by noise, rotation, and size, and then extracts the recognition symbol from the compensated traffic sign by sequential color-based clustering. The proposed clustering method classifies the traffic sign into the attention, regulation, indication, or auxiliary class. A circular tracing method is then used for the final traffic sign recognition. To evaluate the effectiveness of the proposed method, a set of varied traffic signs was built. The proposed method showed a 95% recognition rate for a single variation and a 93% recognition rate for mixed variations.

A Study on the Realization of Wireless Home Network System Using High-performance Speech Recognition in Variable Position (가변위치 고음성인식 기술을 이용한 무선 홈 네트워크 시스템 구현에 관한 연구)

Yoon, Jun-Chul;Choi, Sang-Bang;Park, Chan-Sub;Kim, Se-Yong;Kim, Ki-Man;Kang, Suk-Youb
- Journal of the Korea Institute of Information and Communication Engineering / v.14 no.4 / pp.991-998 / 2010
When a wireless home network system using speech recognition is realized in an indoor environment, background noise and reverberation are the two main causes of degradation in the voice recognition system. This study realizes a home network system that is resistant to reverberation and background noise by using a voice section detection method based on spectral entropy. Spectral subtraction can reduce the effect of reverberation and remove noise that is independent of the voice signal by eliminating, in the spectrum, the signal components distorted by reverberation. Effective spectral subtraction requires correct separation of voice sections from silent sections, and to this end the entropy-based voice section detection method is applied to improve performance. Experimental and indoor environment tests were carried out to measure the command recognition rate in an indoor recognition environment. The test results show that the command recognition rate improved in both static and reverberant room conditions when the voice section detection method based on spectral entropy was used.
https://doi.org/10.6109/jkiice.2010.14.4.991
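The spectral entropy measure behind the voice section detection in this abstract can be sketched as follows. The sketch assumes a frame's power spectrum is already available as a list of non-negative values, normalizes it into a probability distribution, and takes the Shannon entropy. The function names, the threshold parameter, and the rule that lower entropy indicates voice are assumptions for illustration; the abstract does not give the paper's decision rule.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in nats) of a power spectrum normalized to sum to 1."""
    if not power_spectrum:
        raise ValueError("power_spectrum must not be empty")
    total = sum(power_spectrum)
    if total <= 0.0:
        # An all-zero frame carries no spectral structure; treat it as flat.
        return math.log(len(power_spectrum))
    probs = [p / total for p in power_spectrum]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_voice_frame(power_spectrum, threshold):
    """Assumed rule: structured (low-entropy) spectra are labeled as voice."""
    return spectral_entropy(power_spectrum) < threshold
```

A flat spectrum, typical of broadband noise, gives the maximum entropy `log(N)` for `N` bins, while a spectrum concentrated in a single bin gives zero.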

Performance Analysis of Face Recognition by Face Image resolutions using CNN without Backpropagation and LDA (역전파가 제거된 CNN과 LDA를 이용한 얼굴 영상 해상도별 얼굴 인식률 분석)

Moon, Hae-Min;Park, Jin-Won;Pan, Sung Bum
- Smart Media Journal / v.5 no.1 / pp.24-29 / 2016
To satisfy the needs of a high-level intelligent surveillance system, the system must be able to extract objects and classify them to obtain precise information about each object. The representative way to establish a person's identity is face recognition, whose recognition rate changes with environmental factors such as illumination, background, and camera angle. In this paper, we analyze the robustness of face recognition to changes in distance through a variety of experiments. The experiments used real face images taken at distances of 1 m to 5 m. Face recognition based on Linear Discriminant Analysis showed the best performance, 75.4% on average, when a large number of face images per person was used for training. However, face recognition based on a Convolutional Neural Network showed the best performance, 69.8% on average, when the number of face images per person was fewer than five. In addition, the recognition rate for low-resolution faces decreases rapidly when the face image is smaller than $15 \times 15$.

Recognition Time Reduction Technique for the Time-synchronous Viterbi Beam Search (시간 동기 비터비 빔 탐색을 위한 인식 시간 감축법)

이강성
- The Journal of the Acoustical Society of Korea / v.20 no.6 / pp.46-50 / 2001
This paper proposes a new recognition time reduction algorithm, the Score-Cache technique, which is applicable to HMM-based speech recognition systems. Score-Cache is unusual in that it reduces search time substantially without degrading recognition performance, whereas other search reduction techniques trade off against the recognition rate. The technique can be applied to continuous speech recognition systems as well as isolated word recognition systems. A large reduction in recognition time is obtained by replacing only the score calculation function, without changing the architecture of the system. The technique can also be combined with other recognition time reduction algorithms for further time reduction. A time reduction of up to 54% was obtained.
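The idea of replacing only the score calculation function can be illustrated with a generic memoizing wrapper. The abstract does not say what Score-Cache stores or how it is keyed, so the sketch below is an assumption: it caches one score per (frame index, state id) pair, which pays off when several search paths share the same HMM state. The class name and the `calls` counter are hypothetical.

```python
class CachedScorer:
    """Wrap a score function so each (frame_index, state_id) pair is computed once.

    Assumption: scores depend only on the frame and the HMM state, so paths
    that share a state at the same frame can reuse the stored value.
    """

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self._cache = {}
        self.calls = 0  # number of times the underlying function actually ran

    def __call__(self, frame_index, state_id):
        key = (frame_index, state_id)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._score_fn(frame_index, state_id)
        return self._cache[key]

    def clear(self):
        """Drop cached scores, for example before decoding a new utterance."""
        self._cache.clear()
```

Because the wrapper returns exactly the values the original function would return, the search result is unchanged; only repeated evaluations are avoided.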

A Study of Energy Parameter without Windowing Influence in Speech Signal (윈도우의 영향이 제거된 에너지 파라미터에 관한 연구)

조태수;신동성;배명진
- Proceedings of the IEEK Conference / 2001.06d / pp.277-280 / 2001
Preprocessing is a very important step in speech signal processing. It influences the compression rate in speech coding, the recognition rate in speech recognition, and so on. In this paper, we propose a method that minimizes the influence of the window by using the pitch period and start points. The proposed method can be used for voiced-speech detection and word labeling.

A Study on the Endpoint Detection Algorithm (끝점 검출 알고리즘에 관한 연구)

양진우
- Proceedings of the Acoustical Society of Korea Conference / 1984.12a / pp.66-69 / 1984
This paper studies endpoint detection for Korean speech recognition. In the speech signal processing, the analysis parameters were the zero crossing rate (ZCR), the log energy (LE), and the energy of the prediction error (Ep), and the basic Korean spoken digits /영/ through /구/ (zero through nine) were selected as data for speech recognition. The main goal of this paper is to develop techniques and a system for speech input to machines. To detect the endpoints, the log energy was chosen from among the analyzed parameters; it is a very effective parameter for classifying speech and non-speech segments. The analysis resulted in an error rate of 1.43%.
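Two of the parameters named in this abstract, log energy and zero crossing rate, can be sketched per frame as below, together with a simple threshold-based endpoint search on log energy. The frame length, the decibel threshold, the energy floor, and the rule of taking the first and last frames above threshold are assumptions for illustration; the abstract does not describe the paper's actual decision procedure.

```python
import math


def log_energy(frame, floor=1e-10):
    """Mean-square energy of a frame in dB; floor avoids log of zero on silence."""
    return 10.0 * math.log10(sum(s * s for s in frame) / len(frame) + floor)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def find_endpoints(samples, frame_len, threshold_db):
    """Return (start, end) sample indices of the span above threshold, or None."""
    energies = [
        log_energy(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    speech = [i for i, e in enumerate(energies) if e > threshold_db]
    if not speech:
        return None
    # First and last frames above threshold bound the detected utterance.
    return speech[0] * frame_len, (speech[-1] + 1) * frame_len
```

In practice the threshold would be set relative to the measured background level rather than fixed in advance.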

