• Title/Summary/Keyword: language recognition


On-line dynamic hand gesture recognition system for the Korean sign language (KSL) (한글 수화용 동적 손 제스처의 실시간 인식 시스템의 구현에 관한 연구)

  • Kim, Jong-Sung;Lee, Chan-Su;Jang, Won;Bien, Zeungnam
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.2 / pp.61-70 / 1997
  • Human hand gestures have long been used as a means of communication among people, interpreted as streams of tokens for a language. Sign language is a method of communication for hearing-impaired people. Articulated gestures and postures of the hands and fingers are commonly used in sign language. This paper presents a system that recognizes Korean Sign Language (KSL) and translates the recognition results into normal Korean text and sound. A pair of data gloves is used as the sensing device for detecting the motions of hands and fingers. We propose a dynamic gesture recognition method that employs fuzzy feature analysis for efficient classification of hand motions and applies a fuzzy min-max neural network to on-line pattern recognition.


AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation

  • Byung Ok Kang;Hyung-Bae Jeon;Yun Kyung Lee
    • ETRI Journal / v.46 no.1 / pp.48-58 / 2024
  • This paper presents the development of language tutoring systems for non-native speakers by leveraging advanced end-to-end automatic speech recognition (ASR) and proficiency evaluation. Given the frequent errors in non-native speech, high-performance spontaneous speech recognition must be applied. Our systems accurately evaluate pronunciation and speaking fluency and provide feedback on errors by relying on precise transcriptions. End-to-end ASR is implemented and enhanced by using diverse non-native speaker speech data for model training. For performance enhancement, we combine semi-supervised and transfer learning techniques using labeled and unlabeled speech data. Automatic proficiency evaluation is performed by a model trained to maximize the statistical correlation between the fluency score manually determined by a human expert and a calculated fluency score. We developed an English tutoring system for Korean elementary students called EBS AI Peng-Talk and a Korean tutoring system for foreigners called KSI Korean AI Tutor. Both systems were deployed by South Korean government agencies.
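
The correlation objective mentioned in the abstract above can be illustrated with a small sketch. The abstract does not say which correlation statistic is used, so Pearson correlation is an assumption here, and the function name `pearson_correlation` and its arguments are hypothetical, not taken from the paper.

```python
import math


def pearson_correlation(human_scores, predicted_scores):
    """Pearson correlation between expert scores and model scores.

    Assumption: the abstract only says "statistical correlation";
    Pearson's r is used here purely for illustration.
    """
    if len(human_scores) != len(predicted_scores) or len(human_scores) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")

    n = len(human_scores)
    mean_h = sum(human_scores) / n
    mean_p = sum(predicted_scores) / n

    # Sums of centered products and centered squares.
    cov = sum((h - mean_h) * (p - mean_p)
              for h, p in zip(human_scores, predicted_scores))
    var_h = sum((h - mean_h) ** 2 for h in human_scores)
    var_p = sum((p - mean_p) ** 2 for p in predicted_scores)

    if var_h == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_h * var_p)
```

A training loop that maximizes this value would typically minimize its negative; how the authors actually set up their objective is not stated in the abstract.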

A Language Model based on VCCV of Sentence Speech Recognition (문장 음성 인식을 위한 VCCV기반의 언어 모델)

  • 박선희;홍광석
    • Proceedings of the IEEK Conference / 2003.07e / pp.2419-2422 / 2003
  • To improve the performance of sentence speech recognition systems, we need to consider the perplexity of the language model and the number of words in the dictionary as vocabulary size increases. In this paper, we propose a language model based on VCCV units for sentence speech recognition. We choose VCCV units as the processing unit of the language model and compare them with clauses and morphemes. Clauses and morphemes have large vocabularies and high perplexity, whereas VCCV units have a small lexicon and a limited vocabulary. An advantage of VCCV units is their low perplexity. We built a bigram language model from a given text and calculated the perplexity of each language processing unit. The perplexity of VCCV units is lower than that of morphemes and clauses.
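
As a rough illustration of the perplexity comparison described above, the sketch below computes the perplexity of a token sequence under a bigram model. The add-alpha smoothing, the function name `bigram_perplexity`, and the `alpha` parameter are assumptions made for this example; the abstract does not state how the authors estimated or smoothed their bigram probabilities.

```python
import math
from collections import Counter


def bigram_perplexity(train_tokens, test_tokens, alpha=1.0):
    """Perplexity of test_tokens under an add-alpha smoothed bigram model.

    Tokens can be any hashable units (clauses, morphemes, VCCV units).
    Smoothing is an assumption of this sketch, not taken from the paper.
    """
    vocab = set(train_tokens) | set(test_tokens)
    # Count each token only where it appears as the first item of a bigram.
    context_counts = Counter(train_tokens[:-1])
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))

    log_prob = 0.0
    n = 0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        p = (bigram_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * len(vocab)
        )
        log_prob += math.log(p)
        n += 1

    if n == 0:
        raise ValueError("test_tokens must contain at least two tokens")
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / n)
```

Running the same function over one text segmented into different units gives one perplexity per unit type, which is the kind of comparison the abstract reports.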


Voice Recognition Softwares: Their implications to second language teaching, learning, and research

  • Park, Chong-won
    • Speech Sciences / v.7 no.3 / pp.69-85 / 2000
  • Recently, Computer Assisted Language Learning (CALL) has received wide attention from diverse audiences. However, to the author's knowledge, relatively little attention has been paid to the educational implications of voice recognition (VR) software for language teaching in general, and for teaching and learning pronunciation in particular. This study explores and extends the applicability of VR software to second language research, addressing how VR software might facilitate the entry of interview data. To aid readers' understanding of this field, the background of classroom interaction research is discussed, along with the rationale for why interview data, and therefore the role of VR software, becomes critical in this area of inquiry. The development of VR software is sketched, with a brief report on the features of up-to-date VR software. Finally, suggestions are offered for future studies investigating the impact of VR software on second language learning, teaching, and research.


Real-time Sign Language Recognition Using an Armband with EMG and IMU Sensors (근전도와 관성센서가 내장된 암밴드를 이용한 실시간 수화 인식)

  • Kim, Seongjung;Lee, Hansoo;Kim, Jongman;Ahn, Soonjae;Kim, Youngho
    • Journal of rehabilitation welfare engineering & assistive technology / v.10 no.4 / pp.329-336 / 2016
  • Deaf people who use sign language experience social inequalities and financial losses due to communication restrictions. In this paper, a real-time pattern recognition algorithm was applied to distinguish American Sign Language signs using an armband sensor (8-channel EMG sensors and one IMU) to enable communication between deaf and hearing people. The validation test was carried out with 11 people. The pattern classifier was trained by gradually increasing the size of the training database. Results showed that the recognition accuracy was over 97% with 20 training samples and over 99% with 30 training samples. The present study shows that sign language recognition using an armband sensor is more convenient and performs well.

Sign language translation using video captioning and sign language recognition using action recognition (비디오 캡셔닝을 적용한 수어 번역 및 행동 인식을 적용한 수어 인식)

  • Gi-Duk Kim;Geun-Hoo Lee
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.317-319 / 2024
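  • In this paper, we propose sign language translation using a video captioning algorithm and sign language recognition using an action recognition algorithm. The video captioning algorithm embeds 40 consecutive input frames through a CNN, feeds the embeddings to a transformer, and outputs a sentence. The action recognition algorithm applies random sampling, embeds 40 consecutive data items taken from 40 indices per video through a CNN, and outputs the recognition result through an RNN model that combines a GRU and a transformer. For sign language translation we obtained a BLEU-4 score of 7.85 and a CIDEr score of 53.12, and for sign language recognition we obtained a recognition accuracy of 96.26%.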


Implementation of Real-time Recognition System for Korean Sign Language (한글 수화의 실시간 인식 시스템의 구현)

  • Han Young-Hwan
    • The Journal of the Korea Contents Association / v.5 no.4 / pp.85-93 / 2005
  • In this paper, we propose a recognition system that tracks the unmarked hand of a person performing sign language against a complex background. First, we measure the entropy of the difference image between consecutive frames. Using color information similar to skin color within candidate regions that have high entropy values, we extract only the hand region from the background image. On the extracted hand region, we detect a contour and recognize sign language by applying an improved centroidal profile method. In experiments on 6 kinds of sign language movements, unlike existing methods, the system stably recognizes sign language under complex backgrounds and illumination changes without markers. It also achieves a recognition rate of more than 95% per person and 90 to 100% per movement at 15 frames/second.
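
The first step of the pipeline above, measuring the entropy of a frame-difference image, might look roughly like the following sketch. It assumes grayscale frames given as equally sized nested lists of integers and uses Shannon entropy over the histogram of absolute differences; the paper may define its entropy measure differently (for example per block), and the function names here are hypothetical.

```python
import math
from collections import Counter


def difference_image(frame_a, frame_b):
    """Absolute per-pixel difference of two equally sized grayscale frames."""
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def shannon_entropy(image):
    """Shannon entropy (in bits) of the pixel-value histogram of an image."""
    counts = Counter(pixel for row in image for pixel in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A static scene gives a difference image of all zeros and therefore zero entropy, while regions with motion produce a wider spread of difference values and higher entropy, which is why high-entropy regions are plausible hand candidates.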


DeNERT: Named Entity Recognition Model using DQN and BERT

  • Yang, Sung-Min;Jeong, Ok-Ran
    • Journal of the Korea Society of Computer and Information / v.25 no.4 / pp.29-35 / 2020
  • In this paper, we propose DeNERT, a new structured entity recognition model. Recently, natural language processing has been actively researched using language representation models pre-trained on large corpora. In particular, named entity recognition, one of the fields of natural language processing, uses supervised learning, which requires a large training dataset and much computation. Reinforcement learning is a method that learns through trial-and-error experience without initial data; it is closer to the process of human learning than other machine learning methodologies, but it is not yet widely applied in natural language processing. It is often used in simulation environments such as Atari games and AlphaGo. BERT is a general-purpose language model developed by Google that is pre-trained on a large corpus with a large amount of computation. It has recently shown high performance in natural language processing research and high accuracy in many downstream tasks. In this paper, we propose a new named entity recognition model, DeNERT, that uses two deep learning models, DQN and BERT. The proposed model is trained by creating a learning environment for the reinforcement learning model based on language representations, which are the advantage of the general-purpose language model. The DeNERT model trained in this way achieves faster inference and higher performance with a small training dataset. We also validate our model's named entity recognition performance through experiments.

Speech Recognition Interface in the Communication Environment (통신환경에서 음성인식 인터페이스)

  • Han, Tai-Kun;Kim, Jong-Keun;Lee, Dong-Wook
    • Proceedings of the KIEE Conference / 2001.07d / pp.2610-2612 / 2001
  • This study examines the recognition of a user's spoken commands based on speech recognition and natural language processing, and develops a natural language interface agent that can analyze the recognized command. The natural language interface agent consists of a speech recognizer and a semantic interpreter. The speech recognizer recognizes a spoken command and transforms it into character strings. The semantic interpreter analyzes the character strings and creates the commands and questions to be passed to the application program. We also consider problems related to the speech recognizer and the semantic interpreter, such as the ambiguity of natural language and the ambiguity and errors introduced by the speech recognizer. This kind of natural language interface agent can be applied to telephony environments involving all kinds of communication media, such as telephone, fax, and e-mail.


Alzheimer's disease recognition from spontaneous speech using large language models

  • Jeong-Uk Bang;Seung-Hoon Han;Byung-Ok Kang
    • ETRI Journal / v.46 no.1 / pp.96-105 / 2024
  • We propose a method to automatically predict Alzheimer's disease from speech data using the ChatGPT large language model. Alzheimer's disease patients often exhibit distinctive characteristics when describing images, such as difficulties in recalling words, grammar errors, repetitive language, and incoherent narratives. For prediction, we first employ a speech recognition system to transcribe participants' speech into text. We then gather opinions by inputting the transcribed text into ChatGPT together with a prompt designed to solicit fluency evaluations. Subsequently, we extract embeddings from the speech, text, and opinions using pretrained models. Finally, we use a classifier consisting of transformer blocks and linear layers to identify participants with this type of dementia. Experiments are conducted using the widely used ADReSSo dataset. The results yield a maximum accuracy of 87.3% when speech, text, and opinions are used in conjunction. This finding suggests the potential of leveraging evaluation feedback from language models to address challenges in Alzheimer's disease recognition.