• Title/Abstract/Keywords: Speech speed

241 results (0.03 s)

Survey of Recent Research in Education based on Artificial Intelligence

  • 전형배;정훈;강병옥;이윤경
    • Electronics and Telecommunications Trends, Vol. 36, No. 1, pp. 71-80, 2021
  • Artificial intelligence (AI) will have a huge impact on future education. We look at the role of AI in education and the changes it brings to schools. Personalized education is currently offered only in limited services, and interactive tutoring services based on speech recognition and dialog technology are under development. In the future, we look forward to fully personalized education for individual students through AI teachers, while human teachers are expected to put more effort into teaching creative thinking, critical thinking, communication, and collaboration. As AI technology develops ever faster, we expect AI-based education to become firmly established around us in the near future. We first describe personalization technology in detail and then discuss the AI-based foreign-language speaking education research conducted by ETRI.

Real-Time Implementation of a SBC Codec Using a NEC 7720 DSP

  • 오수환;이상욱
    • Journal of the Korean Institute of Telematics and Electronics, Vol. 23, No. 4, pp. 429-438, 1986
  • In this paper, we design and implement a real-time, full-duplex sub-band coding (SBC) codec operating at 16 kbps on the NEC 7720, a high-speed digital signal processor. The SBC codec uses a quadrature mirror filter (QMF) bank, built as a tree of two-band analysis-synthesis pairs, to partition the speech signal into four octave bands. Computer simulations were carried out to investigate the effect of the NEC 7720's fixed-point arithmetic, using three performance measures: the conventional signal-to-noise ratio, informal listening tests, and a linear predictive coding (LPC) distance measure. The necessary parameters were optimized through these simulations. The developed hardware and software were tested in real-time operation using a hardware emulator.
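
The tree-structured octave split can be illustrated with a short sketch. It assumes the 2-tap Haar pair h0 = [1, 1]/√2 and h1 = [1, -1]/√2, the simplest filters satisfying the QMF relation h1[n] = (-1)^n h0[n] with perfect reconstruction; a practical 16 kbps codec such as the one described would use longer QMF filters and fixed-point arithmetic, which this floating-point sketch does not model. Applying the two-band split three times to the low branch yields four octave bands. All function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)


def qmf_analysis(x):
    """Two-band analysis: return (low, high) half-band signals, each decimated by 2.

    Uses the Haar pair h0 = [1, 1]/sqrt(2), h1 = [1, -1]/sqrt(2) (an assumption).
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    high = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    return low, high


def qmf_synthesis(low, high):
    """Two-band synthesis: inverts qmf_analysis (up to float rounding)."""
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def octave_analysis(x, levels=3):
    """Split repeatedly on the low branch; return bands from lowest to highest."""
    highs = []
    low = list(x)
    for _ in range(levels):
        low, high = qmf_analysis(low)
        highs.append(high)
    return [low] + highs[::-1]


def octave_synthesis(bands):
    """Rebuild the signal by merging bands from the lowest upward."""
    low = bands[0]
    for high in bands[1:]:
        low = qmf_synthesis(low, high)
    return low
```

With levels=3 and a 64-sample frame, the bands hold 8, 8, 16 and 32 samples, nominally covering 0 to fs/16, fs/16 to fs/8, fs/8 to fs/4 and fs/4 to fs/2 (the Haar filters separate these only loosely).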

Which Agent is More Captivating for Winning the Users' Hearts?: Focusing on Paralanguage Voice and Human-like Face Agent

  • SeoYoung Lee
    • Asia Pacific Journal of Information Systems, Vol. 34, No. 2, pp. 585-619, 2024
  • This paper compares human interactions with AI agents according to the presence or absence of a facial representation, combined with the presence or absence of paralanguage voice elements. The "CASA (Computers Are Social Actors)" paradigm posits that people perceive computers as social actors rather than tools, unconsciously applying human norms and behaviors to them. Paralanguage comprises vocal elements such as pitch, tone, stress, pause, duration, and speed that help convey what a speaker is trying to communicate. The focus is on how these elements together contribute to flow, intimacy, trust, and interactional enjoyment in the user experience. The study uses partial least squares (PLS) analysis to explore the relationships among all variables in the research framework, and it discusses academic and practical implications.

Prosodic Boundary Effects on the V-to-V Lingual Movement in Korean

  • Cho, Tae-Hong;Yoon, Yeo-Min;Kim, Sa-Hyang
    • Phonetics and Speech Sciences, Vol. 2, No. 3, pp. 101-113, 2010
  • The present study investigated how the kinematics of the /a/-to-/i/ tongue movement in Korean are influenced by prosodic boundaries. The /a/-to-/i/ sequence served as 'transboundary' test material occurring across a prosodic boundary, as in /ilnjəʃʰa/ # /minsakwae/ ('일년차#민사과에', 'the first-year worker' # 'dept. of civil affairs'). The study also tested whether the V-to-V tongue movement is further influenced by syllable structure, with /m/ placed either in coda position (/am#i/) or in onset position (/a#mi/). Results of an EMA (electromagnetic articulography) study showed that kinematic parameters such as movement distance (displacement), movement duration, and movement velocity (speed) all varied as a function of boundary strength, showing an articulatory strengthening pattern of "larger, longer and faster" movement. Interestingly, however, the larger, longer and faster pattern associated with boundary marking in Korean has often been observed with stress (prominence) marking in English. It was proposed that language-specific prosodic systems induce different ways in which phonetics and prosody interact: Korean, a language without lexical stress or pitch accent, has more freedom to express prosodic strengthening, whereas languages such as English are constrained, so that some strengthening patterns are reserved for lexical stress. The V-to-V tongue movement was also influenced by the syllable affiliation of the intervening consonant /m/, showing greater preboundary lengthening of the tongue movement when /m/ belonged to the preboundary syllable (/am#i/). Taken together, the results show that fine-grained phonetic details do not simply arise as low-level physical phenomena but reflect higher-level linguistic structures such as syllable and prosodic structure.
The paper also discusses how the boundary-induced kinematic patterns can be accounted for in terms of the task dynamic model and the theory of the prosodic gesture (π-gesture).
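
The three kinematic measures named above (displacement, duration, velocity) can be computed from a sampled articulator trajectory roughly as follows. This sketch rests on assumptions the abstract does not state: a single 1-D position channel (for example, tongue-body height in mm) sampled at fs Hz, a forward-difference velocity, and movement onset/offset placed where speed reaches 20% of its peak, a convention often used in EMA work but not necessarily the one used in this study.

```python
def movement_kinematics(pos, fs, threshold=0.2):
    """Return (displacement, duration_s, peak_speed) for one movement.

    pos: positions of one articulator channel sampled at fs Hz.
    Onset/offset are the first and last samples whose speed reaches
    `threshold` * peak speed (assumed criterion).
    """
    if len(pos) < 2:
        raise ValueError("need at least two samples")
    # Forward-difference velocity: vel[i] spans samples i -> i + 1 (units/s).
    vel = [(pos[i + 1] - pos[i]) * fs for i in range(len(pos) - 1)]
    speed = [abs(v) for v in vel]
    peak = max(speed)
    if peak == 0:
        return 0.0, 0.0, 0.0
    moving = [i for i, s in enumerate(speed) if s >= threshold * peak]
    onset, offset = moving[0], moving[-1] + 1
    displacement = abs(pos[offset] - pos[onset])
    duration = (offset - onset) / fs
    return displacement, duration, peak
```

Comparing these three values for the same token across boundary strengths is one way to check for the "larger, longer and faster" pattern the study reports.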

Multi-layer Speech Processing System for Point-Of-Interest Recognition in the Car Navigation System

  • 방기덕;강철호
    • Journal of Korea Multimedia Society, Vol. 12, No. 1, pp. 16-25, 2009
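  • In the automotive environment, where safety must take top priority, a large-vocabulary isolated-word recognition system for the point-of-interest (POI) domain requires optimal human-machine interface (HMI) technology. However, processing more than 100,000 words with conventional speech recognition methods is impossible on a telematics terminal with very limited computing power and memory. This paper therefore proposes a multi-stage, large-vocabulary isolated-word recognition system for POI recognition on telematics terminals. To improve the performance of this POI recognition system, we propose a phoneme recognizer using per-phoneme Gaussian mixture models (GMMs) and a Levenshtein distance based on a phoneme distance matrix (PDM). The proposed method achieved good recognition performance on a large isolated-word vocabulary even on telematics terminals with low processing speed and little memory. With the proposed multi-stage recognition system, recognition accuracy reached up to 94.8% indoors and up to 92.4% in the vehicle environment.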
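
Matching a recognized phoneme string against the POI lexicon with a Levenshtein distance whose substitution costs come from a phoneme distance matrix can be sketched as follows. The phoneme symbols, the distance values in PDM, and the unit insertion/deletion cost are hypothetical placeholders; the paper's actual matrix and the GMM phoneme recognizer that produces the hypothesis are not reproduced here.

```python
# Hypothetical phoneme distance matrix: similar phonemes are cheaper to confuse.
PDM = {
    frozenset(("p", "b")): 0.3,
    frozenset(("t", "d")): 0.3,
    frozenset(("k", "g")): 0.3,
    frozenset(("a", "e")): 0.5,
}


def sub_cost(a, b):
    """Substitution cost from the PDM; unlisted pairs cost 1.0."""
    return PDM.get(frozenset((a, b)), 1.0)


def pdm_levenshtein(hyp, ref, sub=sub_cost, indel=1.0):
    """Weighted edit distance between two phoneme sequences (dynamic programming)."""
    prev = [j * indel for j in range(len(ref) + 1)]
    for i in range(1, len(hyp) + 1):
        cur = [i * indel] + [0.0] * len(ref)
        for j in range(1, len(ref) + 1):
            a, b = hyp[i - 1], ref[j - 1]
            cost = 0.0 if a == b else sub(a, b)
            cur[j] = min(prev[j] + indel,      # delete hyp[i-1]
                         cur[j - 1] + indel,   # insert ref[j-1]
                         prev[j - 1] + cost)   # match or substitute
        prev = cur
    return prev[-1]


def rank_pois(hyp, lexicon):
    """Return POI names sorted by distance to the recognized phoneme string."""
    return sorted(lexicon, key=lambda name: pdm_levenshtein(hyp, lexicon[name]))
```

Because acoustically similar phonemes cost less to substitute, a recognition error such as /p/ for /b/ penalizes the correct POI less than an arbitrary error would.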

Human-Computer Interaction Based Only on Auditory and Visual Information

  • Sha, Hui;Agah, Arvin
    • Transactions on Control, Automation and Systems Engineering, Vol. 2, No. 4, pp. 285-297, 2000
  • One research objective in multimedia human-computer interaction is applying artificial intelligence and robotics technologies to the development of computer interfaces. This involves using many forms of media, integrating speech input, natural language, graphics, hand-pointing gestures, and other methods for interactive dialogue. Although current human-computer communication relies on keyboards, mice, and other traditional devices, the two basic ways in which people communicate with each other are voice and gesture. This paper reports on the development of an intelligent multimedia interface system modeled on the way people communicate. The work explores interaction between humans and computers based only on the processing of speech (words uttered by the person) and of images (hand-pointing gestures). The purpose of the interface is to aim a pan/tilt camera at a location that the user specifies by speaking and pointing. The system uses a second, stationary camera to capture images of the user's hand and a microphone to capture the user's words. After processing the images and sounds, the system responds by pointing the camera. The interface first uses hand pointing to locate the general position to which the user is referring, and then uses the user's voice commands to fine-tune the location and, if requested, change the camera's zoom. The image of the location is captured by the pan/tilt camera and displayed on a color TV monitor. This type of system has applications in teleconferencing and other remote operations, where the system must respond to commands that the user gives in the same way they would communicate with another person.
The advantage of this approach is that it eliminates the traditional input devices otherwise needed to control a pan/tilt camera, replacing them with more "natural" means of interaction. A number of experiments were performed to evaluate the interface system with respect to its accuracy, efficiency, reliability, and limitations.
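
The two-step control described above (coarse aiming from the pointing gesture, then voice fine-tuning) could be organized as below. Everything here is a hypothetical illustration: the camera-frame convention (x right, y up, z forward), the command vocabulary, the 2-degree step and the zoom factor are assumptions, and the paper's gesture and speech recognizers are replaced by already-recognized inputs.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraState:
    pan_deg: float = 0.0
    tilt_deg: float = 0.0
    zoom: float = 1.0


STEP_DEG = 2.0     # hypothetical fine-tuning step per voice command
ZOOM_FACTOR = 1.5  # hypothetical zoom change per command


def aim_at(x, y, z):
    """Coarse aim: pan/tilt angles toward a target point in the camera frame."""
    pan = math.degrees(math.atan2(x, z))
    tilt = math.degrees(math.atan2(y, math.hypot(x, z)))
    return CameraState(pan_deg=pan, tilt_deg=tilt)


def apply_command(state, word):
    """Fine-tune the camera with one recognized voice command."""
    if word == "left":
        return replace(state, pan_deg=state.pan_deg - STEP_DEG)
    if word == "right":
        return replace(state, pan_deg=state.pan_deg + STEP_DEG)
    if word == "up":
        return replace(state, tilt_deg=state.tilt_deg + STEP_DEG)
    if word == "down":
        return replace(state, tilt_deg=state.tilt_deg - STEP_DEG)
    if word == "zoom in":
        return replace(state, zoom=state.zoom * ZOOM_FACTOR)
    if word == "zoom out":
        return replace(state, zoom=state.zoom / ZOOM_FACTOR)
    return state  # unrecognized words leave the camera unchanged
```

Keeping the state immutable makes each voice command a pure update, which is easy to log or undo.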

Implementation of Korean TTS Service on Android OS

  • 김태권;김봉완;최대림;이용주
    • The Journal of the Korea Contents Association, Vol. 12, No. 1, pp. 9-16, 2012
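  • Android smartphones released in Korea do not include a built-in Korean TTS engine, and Google has not announced official development of Korean TTS technology. As a result, application developers and users of Android smartphones face growing inconvenience. This paper describes the design and implementation of a TTS system that can be served on Android smartphones. For fast and intelligible TTS, the text preprocessing and synthesized-speech generation libraries were implemented with the Android NDK. In addition, TTS response time was minimized by using Java threads and an AudioTrack object in streaming mode. To test the implemented Korean TTS service, an application that reads received text messages aloud was designed and developed. In the evaluation, the system produced natural synthesized speech for arbitrary sentences and allowed real-time listening. Application developers can also easily deliver information by voice using the implemented Korean TTS service. The Korean TTS service implemented in this paper overcomes the drawbacks of existing applications with limited speech synthesis, and it offers usability and convenience to developers and users of voice-based information-delivery applications.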
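
The response-time idea above, synthesizing on a worker thread and streaming each chunk to audio output as soon as it is ready instead of waiting for the whole message, is sketched below in Python rather than Java. `synthesize` and `play` are hypothetical stand-ins: on Android they would correspond to the NDK synthesis library and to writes into an AudioTrack in streaming mode. The sentence splitter is a naive assumption.

```python
import queue
import re
import threading


def split_sentences(text):
    """Naive splitter on '.', '?', '!' followed by whitespace (assumption)."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def stream_tts(text, synthesize, play, max_buffered=4):
    """Synthesize sentence by sentence on a worker thread and pass each PCM
    chunk to `play` as soon as it is ready, so playback starts after the
    first sentence rather than after the whole text."""
    chunks = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream
    errors = []

    def producer():
        try:
            for sentence in split_sentences(text):
                chunks.put(synthesize(sentence))
        except Exception as exc:  # surface synthesis errors to the caller
            errors.append(exc)
        finally:
            chunks.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        chunk = chunks.get()
        if chunk is done:
            break
        play(chunk)
    worker.join()
    if errors:
        raise errors[0]
```

The bounded queue keeps synthesis from running far ahead of playback, which limits memory use on a constrained device.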

Speaker Verification System Using Continuants and Multilayer Perceptrons

  • Lee, Tae-Seung;Park, Sung-Won;Hwang, Byong-Won
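    • Proceedings of the Korea Institute of Information and Communication Engineering Conference, Korea Institute of Maritime Information and Communication Sciences 2003 Fall Conference, pp. 1015-1020, 2003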
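  • Among technologies that use biometric information to protect personal data, speaker verification offers advantages in convenience of use and implementation cost, so it is expected to be widely adopted. Speaker verification must achieve a high level of reliability in verification performance, flexibility in the spoken sentences it accepts, and efficiency in system complexity. Continuants discriminate well between speakers and come in a limited number of types, and the multilayer perceptron (MLP) offers high pattern-recognition rates and fast operation; together they provide a strong means for a speaker verification system to achieve these properties. In this paper, we implement a system that applies continuants and MLPs, and we measure and analyze its performance using a Korean speech database. The experimental results confirm that continuants are effective with respect to all three properties and that the MLP contributes substantially to achieving high reliability and efficiency.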
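
The verification decision can be sketched as scoring each continuant frame with an MLP and accepting the claimed identity when the average score reaches a threshold. This is an illustrative assumption rather than the paper's exact design. The feature vectors, the one-hidden-layer architecture with tanh and sigmoid units, the weights and the 0.5 threshold are all hypothetical, and training is not shown.

```python
import math


def mlp_score(x, w1, b1, w2, b2):
    """One-hidden-layer MLP: tanh hidden units, sigmoid output in (0, 1)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def verify(frames, params, threshold=0.5):
    """Accept if the mean frame score over the continuant frames >= threshold.

    frames: feature vectors taken from continuant segments (assumed given).
    params: (w1, b1, w2, b2) of the claimed speaker's MLP.
    Returns (accepted, mean_score).
    """
    if not frames:
        raise ValueError("no continuant frames to score")
    scores = [mlp_score(f, *params) for f in frames]
    mean = sum(scores) / len(scores)
    return mean >= threshold, mean
```

Averaging over many short continuant frames lets the decision use whatever continuants occur in the utterance, which is one reading of the "flexibility in the spoken sentences" property.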

Simulation of the Loudness Recruitment using Sensorineural Hearing Impairment Modeling

  • 김동욱;박영철;김원기;도원;박선준
    • Proceedings of the KOSOMBE Conference, Korean Society of Medical and Biological Engineering 1997 Fall Conference, pp. 63-66, 1997
  • With the advent of high-speed digital signal processing chips, new digital techniques have been introduced into hearing instruments. This advanced hearing-instrument circuitry has created the need for, and led to the development of, new fitting approaches. A number of fitting approaches have been developed over the past few years, yet there is little agreement on which approach is the "best" or most appropriate. Moreover, developing a new hearing aid, or a new fitting method for one, necessarily involves intensive subject-based clinical testing. In this paper, we present an objective method to evaluate and predict the performance of hearing aids without such subject-based tests. In the hearing impairment simulation (HIS) algorithm, a sensorineural hearing impairment model is built from the auditory test data of the impaired subject being simulated. The HIS system also transposes the abnormal loudness relationships created by recruitment onto the normal dynamic range of hearing. The nonlinear behavior of loudness recruitment is defined using hearing loss functions generated from the measurements. The recruitment simulation was validated in an experiment with two impaired listeners, who compared processed speech presented to the normal ear with unprocessed speech presented to the impaired ear. To assess performance, the HIS algorithm was implemented in real time on a floating-point DSP.
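
One simple way to transpose an impaired listener's loudness range onto the normal range, in the spirit of the recruitment simulation described, is a per-band expansion in the dB domain: a level at the impaired threshold maps to the normal threshold, and the upper limit of hearing stays fixed. This linear-in-dB mapping, the 0 dB normal threshold and the 100 dB upper limit are assumptions for illustration; the paper derives its nonlinearity from measured hearing loss functions instead.

```python
def recruited_level(level_db, threshold_db, ceiling_db=100.0, normal_threshold_db=0.0):
    """Map an input band level to the level a normal ear should hear so that it
    sounds as loud as the input does to an ear with recruitment.

    Levels below the impaired threshold become inaudible (-inf dB); the
    impaired range [threshold_db, ceiling_db] is stretched onto the normal
    range [normal_threshold_db, ceiling_db].
    """
    if threshold_db >= ceiling_db:
        raise ValueError("threshold must be below the ceiling")
    if level_db < threshold_db:
        return float("-inf")
    scale = (ceiling_db - normal_threshold_db) / (ceiling_db - threshold_db)
    return normal_threshold_db + (level_db - threshold_db) * scale


def band_gain_db(level_db, threshold_db, ceiling_db=100.0):
    """Gain (dB) the simulator applies to a band at the given input level."""
    return recruited_level(level_db, threshold_db, ceiling_db) - level_db
```

For a band with a 40 dB threshold, a 70 dB input is attenuated to about 50 dB while a 100 dB input is left unchanged, reproducing the steep loudness growth of recruitment for a normal-hearing listener.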

Effects of Motor Learning Guided Laryngeal Motor Control Therapy for Muscle Misuse Dysphonia

  • 서인효;이옥분;이상준;정필상
    • Phonetics and Speech Sciences, Vol. 3, No. 3, pp. 133-140, 2011
  • Muscle misuse dysphonia (MMD) is defined as a behavioral voice disorder resulting from inappropriate contraction of the intrinsic and/or extrinsic laryngeal muscles. The purpose of this study was to investigate the effect of motor learning guided laryngeal motor control therapy (MLG-LMCT), which is designed to improve on existing laryngeal manual therapy (LMT) and make voice treatment more effective for people with MMD. Forty-six people with MMD (M:F = 16:30) participated. Voice samples were recorded before and after therapy to investigate the effect of MLG-LMCT and were analyzed with electroglottography (EGG); the contact quotient (CQ), speed quotient (SQ), and waveform are reported. In addition, perceptual and acoustic evaluations were conducted to determine how the voice changed after treatment. The experimenter massaged the tense muscles around the neck. To help subjects find more appropriate phonation, the experimenter showed them their EGG waveforms, indicating whether they were moving the vocal folds to the appropriate position, so the EGG waveforms served as a form of visual feedback. Using the waveform, the experimenter helped subjects move the vocal folds and laryngeal muscles toward more appropriate voice production, and the sensory stimuli provided by the experimenter were gradually faded out. A paired t-test revealed significant differences in CQ between pre- and post-therapy. Perceptually, overall severity, roughness, breathiness, strain, and transition were significantly reduced. Acoustically, there were significant differences in F0, jitter, shimmer, and NHR. After MLG-LMCT, most subjects showed improved voice quality. The study concludes that motor learning guided laryngeal motor control therapy (MLG-LMCT) reduces muscle misuse dysphonia.
These results may arise because visual feedback from the EGG waveform helps maintain the muscle-tension reduction achieved by laryngeal manual therapy. For people with MMD whose muscle tension is reduced by LMT but who still do not appropriately position the larynx or adduct the vocal folds, MLG-LMCT might be an alternative therapeutic approach.
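
The two EGG measures reported above can be computed from one glottal cycle as follows. Definitions vary across EGG studies, so this is a sketch under assumptions: the signal is oriented so that greater vocal-fold contact is more positive, contact is detected with a criterion level at 25% of the cycle's peak-to-peak amplitude, CQ is the fraction of the period spent at or above that level, and SQ is the contacting (rising) time divided by the decontacting (falling) time within the contact phase.

```python
def egg_quotients(cycle, criterion=0.25):
    """Return (CQ, SQ) for one EGG cycle given as a list of samples.

    Assumes a single contiguous contact phase within the cycle.
    CQ = samples at or above the criterion level / samples in the period.
    SQ = (peak index - contact onset) / (contact offset - peak index).
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("flat cycle: no contact phase")
    level = lo + criterion * (hi - lo)
    contact = [i for i, v in enumerate(cycle) if v >= level]
    onset, offset = contact[0], contact[-1]
    peak = cycle.index(hi)
    cq = len(contact) / len(cycle)
    falling = offset - peak
    sq = (peak - onset) / falling if falling else float("inf")
    return cq, sq
```

Averaging these values over many cycles of a sustained vowel, before and after therapy, gives the kind of pre/post comparison the study reports for CQ.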
