• Title/Summary/Keyword: Voice-Based Interface

Voice Driven Sound Sketch for Animation Authoring Tools (애니메이션 저작도구를 위한 음성 기반 음향 스케치)

  • Kwon, Soon-Il
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.4
    • /
    • pp.1-9
    • /
    • 2010
  • Authoring tools for sketching the motion of animated characters have been studied, but natural interfaces for sound editing have not received sufficient attention. This paper presents a novel method in which a sound sample is selected by speaking a sound-imitation word (onomatopoeia). An experiment with a method based on statistical models, which are commonly used for pattern recognition, showed recognition accuracy of up to 97%. In addition, to address the difficulty of collecting data for newly enrolled sound samples, a GLR test based on only one sample per sound-imitation word achieved almost the same accuracy as the previous method.
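Single-sample enrollment of the kind the abstract describes can be approximated by nearest-template matching; the sketch below is a simple stand-in for the paper's GLR test (which is statistically more principled), and the feature vectors are made up for illustration:

```python
import numpy as np

# One enrolled example per sound-imitation word (hypothetical feature vectors).
templates = {
    "bang":   np.array([1.0, 0.2]),
    "whoosh": np.array([0.1, 0.9]),
}

def classify(feature: np.ndarray) -> str:
    """Pick the enrolled word whose single template is closest to the input."""
    return min(templates, key=lambda w: np.linalg.norm(templates[w] - feature))

print(classify(np.array([0.9, 0.3])))  # → bang
```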

Implementation of Adaptive Multi Rate (AMR) Vocoder for the Asynchronous IMT-2000 Mobile ASIC (IMT-2000 비동기식 단말기용 ASIC을 위한 적응형 다중 비트율 (AMR) 보코더의 구현)

  • 변경진;최민석;한민수;김경수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.56-61
    • /
    • 2001
  • This paper presents the real-time implementation of an AMR (Adaptive Multi Rate) vocoder included in the asynchronous International Mobile Telecommunication (IMT)-2000 mobile ASIC. The implemented AMR vocoder is a multi-rate coder with 8 modes operating at bit rates from 12.2 kbps down to 4.75 kbps. Not only are the encoder and decoder implemented as the vocoder's basic functions, but VAD (Voice Activity Detection), SCR (Source Controlled Rate) operation, and frame-structuring blocks for the system interface are implemented as well. The DSP used for the AMR vocoder is a 16-bit fixed-point DSP based on the TeakLite core, consisting of a memory block, a serial interface block, register files for the parallel interface with the CPU, and interrupt control logic. Through the implementation, we reduced the maximum operating complexity to 24 MIPS by efficiently managing the memory structure. The AMR vocoder was verified against all the test vectors provided by 3GPP, and stable operation on a real-time testing board was also demonstrated.
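As a rough illustration of the multi-rate operation described above (not code from the paper), each AMR mode's bit budget per 20 ms speech frame follows directly from its bit rate:

```python
# The eight AMR modes and their bit rates in kbit/s (per 3GPP).
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

FRAME_MS = 20  # AMR encodes speech in 20 ms frames

def bits_per_frame(kbps: float) -> int:
    """Bits carried by one 20 ms frame at the given bit rate."""
    return round(kbps * 1000 * FRAME_MS / 1000)

for kbps in AMR_MODES_KBPS:
    print(kbps, bits_per_frame(kbps))  # 12.2 kbps → 244 bits, 4.75 kbps → 95 bits
```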

Implementation of Real-time Vowel Recognition Mouse based on Smartphone (스마트폰 기반의 실시간 모음 인식 마우스 구현)

  • Jang, Taeung;Kim, Hyeonyong;Kim, Byeongman;Chung, Hae
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.8
    • /
    • pp.531-536
    • /
    • 2015
  • Speech recognition is an active research area in human-computer interaction (HCI). The objective of this study is to control digital devices by voice. The mouse is a widely used computer peripheral in graphical user interface (GUI) environments. In this paper, we propose a method for controlling the mouse with the real-time speech recognition function of a smartphone. The processing steps are: extracting the core voice signal after receiving a voice input of suitable length in real time, performing quantization with a learned codebook after feature extraction with mel-frequency cepstral coefficients (MFCC), and finally recognizing the corresponding vowel with a hidden Markov model (HMM). A virtual mouse is then operated by mapping each vowel to a mouse command. Finally, we demonstrate various mouse operations on a desktop PC display with the implemented smartphone application.
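The quantization step in the pipeline above amounts to a nearest-centroid lookup over a learned codebook; here is a minimal numpy sketch (an illustration, not the authors' implementation, with made-up codebook values):

```python
import numpy as np

# Hypothetical 4-entry codebook of 2-dimensional MFCC-like feature vectors.
codebook = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])

def quantize(frame: np.ndarray) -> int:
    """Return the index of the nearest codebook entry (Euclidean distance)."""
    distances = np.linalg.norm(codebook - frame, axis=1)
    return int(np.argmin(distances))

# A frame close to [1, 0] maps to codebook index 1.
print(quantize(np.array([0.9, 0.1])))  # → 1
```

The resulting index sequence is what an HMM-based recognizer would then score against each vowel model.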

Design and Implementation of Server-Based Web Reader kWebAnywhere (서버 기반 웹 리더 kWebAnywhere의 설계 및 구현)

  • Yun, Young-Sun
    • Phonetics and Speech Sciences
    • /
    • v.5 no.4
    • /
    • pp.217-225
    • /
    • 2013
  • This paper describes the design and implementation of the kWebAnywhere system, based on WebAnywhere, which helps people with severe visual impairments and blind users access Internet information through web interfaces. WebAnywhere is a server-based web reader that reads web content aloud using text-to-speech (TTS) technology, without requiring any software to be installed on the client system. It can be used in general web browsers with a built-in audio function, by blind users who cannot afford a screen reader and by web developers designing for web accessibility. However, WebAnywhere supports only a single language and cannot be applied directly to Korean web content. In this paper, we therefore modified WebAnywhere to serve content written in both English and Korean. The modified system is called kWebAnywhere to distinguish it from the original. kWebAnywhere supports the Korean TTS system VoiceText™ and includes a user interface for controlling the TTS system's parameters. Because VoiceText™ does not support the Festival API used by WebAnywhere, we developed a Festival wrapper that translates VoiceText™'s proprietary API to the Festival API so that it can communicate with the WebAnywhere engine. We expect the developed system to help visually impaired and blind users access Internet content easily.
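The Festival wrapper described above is essentially an adapter: it exposes the interface one component expects while delegating to another. A minimal sketch of the idea (all class and method names here are hypothetical, not the paper's code or either system's real API):

```python
class VoiceTextEngine:
    """Stand-in for the proprietary VoiceText API (hypothetical interface)."""
    def synthesize(self, text: str, speed: int = 100) -> bytes:
        return f"<audio:{text}@{speed}>".encode()

class FestivalWrapper:
    """Exposes a Festival-style text-to-wave call, delegating to VoiceText."""
    def __init__(self, engine: VoiceTextEngine):
        self.engine = engine

    def text_to_wave(self, text: str) -> bytes:
        # Translate the Festival-style call into the VoiceText call.
        return self.engine.synthesize(text)

wrapper = FestivalWrapper(VoiceTextEngine())
print(wrapper.text_to_wave("hello"))  # → b'<audio:hello@100>'
```

The calling engine only ever sees the Festival-style method, so the underlying TTS system can be swapped without touching engine code.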

Design and Implementation of CNN-based HMI System using Doppler Radar and Voice Sensor (도플러 레이다 및 음성 센서를 활용한 CNN 기반 HMI 시스템 설계 및 구현)

  • Oh, Seunghyun;Bae, Chanhee;Kim, Seryeong;Cho, Jaechan;Jung, Yunho
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.777-782
    • /
    • 2020
  • In this paper, we propose a CNN-based HMI system using a Doppler radar and a voice sensor, and present hardware design and implementation results. To overcome the limitations of single-sensor monitoring, the proposed HMI system combines data from the two sensors to improve performance. In noisy environments, the proposed system improves accuracy by 3.5% and 12% over classifiers based on a single radar sensor and a single voice sensor, respectively. In addition, hardware accelerating the computationally complex units of the CNN is implemented and verified on an FPGA test system. Performance evaluation shows that the proposed HMI acceleration platform reduces computation time by 95% compared with a software-only design.
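Two-sensor fusion of this kind is often done by joining per-sensor feature vectors before the classifier; a minimal numpy sketch of feature-level fusion (an assumption for illustration, not the paper's architecture):

```python
import numpy as np

def fuse_features(radar_feat: np.ndarray, voice_feat: np.ndarray) -> np.ndarray:
    """Concatenate per-sensor feature vectors into one classifier input."""
    return np.concatenate([radar_feat, voice_feat])

radar = np.array([0.2, 0.7, 0.1])   # e.g. a Doppler spectrogram summary
voice = np.array([0.5, 0.5])        # e.g. a voice-feature summary
fused = fuse_features(radar, voice)
print(fused.shape)  # → (5,)
```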

An Arrangement Method of Voice and Sound Feedback According to the Operation : For Interaction of Domestic Appliance (조작 방식에 따른 음성과 소리 피드백의 할당 방법 가전제품과의 상호작용을 중심으로)

  • Hong, Eun-ji;Hwang, Hae-jeong;Kang, Youn-ah
    • Journal of the HCI Society of Korea
    • /
    • v.11 no.2
    • /
    • pp.15-22
    • /
    • 2016
  • The ways of interacting with digital appliances are becoming more diverse. Users can control appliances with a remote control or a touch screen, and appliances can give feedback in various forms such as sound, voice, and visual signals. However, there is little research on which output method should be used for feedback given the user's input method. In this study, we designed an experiment to identify how to appropriately match the output method (voice or sound) to the user input (voice or button). We built four interaction types from the two input methods and two output methods, and compared their usability, perceived satisfaction, preference, and suitability. The results reveal that the output method affects the ease of use and perceived satisfaction of the input method. Voice input with sound feedback was rated more satisfying than with voice feedback, whereas key input with voice feedback was rated more satisfying than with sound feedback. Key input was more dependent on the output method than voice input. We also found that an appliance's feedback method determines the perceived appropriateness of the interaction.

A Basic Performance Evaluation of the Speech Recognition APP of Standard Language and Dialect using Google, Naver, and Daum KAKAO APIs (구글, 네이버, 다음 카카오 API 활용앱의 표준어 및 방언 음성인식 기초 성능평가)

  • Roh, Hee-Kyung;Lee, Kang-Hee
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.12
    • /
    • pp.819-829
    • /
    • 2017
  • In this paper, we survey the current state of speech recognition technology, outline basic speech recognition algorithms, and explain the code flow of the APIs needed for speech recognition. We use the application programming interfaces (APIs) of Google, Naver, and Daum Kakao, the providers with the best-known search engines among speech recognition APIs, to build a voice recognition app with Android Studio. We then run speech recognition experiments on standard-language words and dialects spoken by people of different genders, ages, and regions, and tabulate the recognition rates. Experiments were conducted in the Gyeongsang-do, Chungcheong-do, and Jeolla-do provinces, where dialects are strong, and comparative experiments were also conducted against standard speech. From the resulting sentences, accuracy is checked with respect to word spacing, final consonants, postpositions, and word choice, and each error type is counted. Based on these results, we aim to show the strengths of each API according to its speech recognition rate and to establish a basic framework for the most efficient use.
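Counting word-level recognition errors, as described above, can be sketched as a comparison of the recognized sentence against the reference (an illustration, not the paper's scoring procedure; real evaluations typically use edit distance):

```python
def count_word_errors(reference: str, hypothesis: str) -> int:
    """Count positions where the recognized word differs from the reference.

    A simple position-wise comparison; each extra or missing trailing
    word also counts as one error.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    errors = sum(1 for r, h in zip(ref_words, hyp_words) if r != h)
    errors += abs(len(ref_words) - len(hyp_words))
    return errors

print(count_word_errors("open the door now", "open a door now"))  # → 1
```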

Design of Gesture based Interfaces for Controlling GUI Applications (GUI 어플리케이션 제어를 위한 제스처 인터페이스 모델 설계)

  • Park, Ki-Chang;Seo, Seong-Chae;Jeong, Seung-Moon;Kang, Im-Cheol;Kim, Byung-Gi
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.1
    • /
    • pp.55-63
    • /
    • 2013
  • NUIs (Natural User Interfaces) evolved from CLIs (Command Line Interfaces) and GUIs (Graphical User Interfaces). An NUI accepts many different input modalities, including multi-touch, motion tracking, voice, and stylus. To adopt an NUI in a legacy GUI application, a developer must add device libraries, modify the relevant source code, and debug it. In this paper, we propose a gesture-based interface model that can be applied without modifying existing event-based GUI applications, and we present an XML schema for specifying the proposed model. We demonstrate how to use the model through a prototype.
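A gesture-to-GUI-event mapping of the kind such a schema might specify can be sketched as follows (the XML vocabulary here is invented for illustration and is not the paper's schema):

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from recognized gestures to GUI events.
MAPPING_XML = """
<gestureMap>
  <gesture name="swipe-left" event="NEXT_PAGE"/>
  <gesture name="swipe-right" event="PREV_PAGE"/>
  <gesture name="circle" event="REFRESH"/>
</gestureMap>
"""

def load_gesture_map(xml_text: str) -> dict:
    """Parse the mapping XML into a gesture-name -> event-name dict."""
    root = ET.fromstring(xml_text)
    return {g.get("name"): g.get("event") for g in root.findall("gesture")}

gesture_map = load_gesture_map(MAPPING_XML)
print(gesture_map["swipe-left"])  # → NEXT_PAGE
```

Because the mapping lives in XML rather than in code, the legacy application's event handlers stay untouched; only the mapping file changes.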

A Multimodal Interface for Telematics based on Multimodal middleware (미들웨어 기반의 텔레매틱스용 멀티모달 인터페이스)

  • Park, Sung-Chan;Ahn, Se-Yeol;Park, Seong-Soo;Koo, Myoung-Wan
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.41-44
    • /
    • 2007
  • In this paper, we introduce a system in which a car navigation scenario is equipped with a multimodal interface based on multimodal middleware. In a map-based system, combining speech and pen input/output modalities offers users greater expressive power. To carry out multimodal tasks in a car environment, we chose SCXML (State Chart XML), a W3C-standard multimodal authoring language, to control modality components such as XHTML, VoiceXML, and GPS. In the network manager, GPS signals from the navigation software are converted into the EMMA meta-language and sent to the Multimodal Interaction Runtime Framework (MMI). The MMI not only handles GPS signals and the user's multimodal inputs and outputs, but also combines them with device information, user preferences, and reasoned RDF to give the user intelligent, personalized services. A self-simulation test showed that the middleware accomplishes a navigational multimodal task for multiple users in a car environment.
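Wrapping a GPS reading in an EMMA annotation, as the network manager above does, might look roughly like this (a hedged sketch following the general shape of a W3C EMMA interpretation, not the paper's exact format; the `position` element is invented):

```python
def gps_to_emma(lat: float, lon: float) -> str:
    """Wrap a GPS fix in a minimal EMMA-style interpretation document."""
    return (
        '<emma:emma version="1.0" '
        'xmlns:emma="http://www.w3.org/2003/04/emma">'
        '<emma:interpretation emma:mode="gps">'
        f'<position lat="{lat}" lon="{lon}"/>'
        "</emma:interpretation>"
        "</emma:emma>"
    )

doc = gps_to_emma(37.5665, 126.9780)
print("emma:interpretation" in doc)  # → True
```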

An Advanced User-friendly Wireless Smart System for Vehicle Safety Monitoring and Accident Prevention (차량 안전 모니터링 및 사고 예방을 위한 친사용자 환경의 첨단 무선 스마트 시스템)

  • Oh, Se-Bin;Chung, Yeon-Ho;Kim, Jong-Jin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.9
    • /
    • pp.1898-1905
    • /
    • 2012
  • This paper presents an On-board Smart Device (OSD) for moving vehicles, based on a smooth integration of Android-based devices and a Micro-control Unit (MCU). The MCU is used to acquire and transmit various vehicle-borne data. The OSD has three functions: Record, Report, and Alarm. Based on these RRA functions, the OSD is a safety- and convenience-oriented smart device: it provides alert services such as accident reporting and rescue, as well as alarms on the vehicle's status. In addition, a voice-activated interface is developed for user convenience. Vehicle data can also be uploaded to a remote server for further access and data manipulation. Unlike conventional black boxes, the developed OSD is thus a user-friendly smart device for vehicle safety: it stores monitoring images while driving, collects vehicle data, reports accidents, and enables subsequent rescue operations. The developed OSD can therefore be considered an essential safety smart device equipped with comprehensive wireless data service, image transfer, and a voice-activated interface.