• Title/Summary/Keyword: Voice-Based Interface

Design and Implementation of Web based Voice Traffic Management System using CDR (CDR을 이용한 웹 기반 음성 트래픽 관리시스템의 설계 및 구현)

  • Kim, Eun-Seong;An, Seong-Jin;Jeong, Jin-Uk
    • The KIPS Transactions:PartC / v.8C no.5 / pp.657-666 / 2001
  • This paper proposes management items for voice traffic based on call detail records (CDRs), so that global carriers can handle and manage the voice traffic for a customer, and defines the computational expressions that produce these items. Based on them, we designed a management system composed of a web interface module, an analysis module, a data collection module, and a database management module, and improved the system's availability and convenience using web technologies. We also tested the items on CDRs collected by a global carrier in real environments in order to verify their validity. The proposed web-based voice traffic management system is expected to provide a global carrier with network information collection, fault detection and troubleshooting, and high quality of service by analyzing the characteristics of subscribers.

A policy study for the voice recognition technology based on elderly health care (음성인식기술의 노인간병 적용을 위한 정책연구)

  • Cho, Byung-Chul;Cheon, Sooyoung;Kim, Kab-Nyun;Yuk, Hyun-Seung
    • Journal of Digital Convergence / v.16 no.2 / pp.9-17 / 2018
  • The purpose of this study is to find out how voice recognition technology can be used to address the problems of Korea's rapidly aging population. Public support services and private nursing services for the elderly are expected to expand in Korea. In this context, voice recognition technology can be used in various ways for elderly people who are not familiar with media interfaces. To this end, our researchers visited Japan and examined what voice recognition technology has achieved in elderly care there. In particular, caregivers greatly reduced their working hours by replacing handwritten reports with reports produced using voice recognition technology. This method can be easily implemented in Korea. In addition, the social cost of supporting the elderly can be gradually reduced through the development of robots equipped with voice recognition technology. Consequently, we find that combining voice recognition technology with artificial intelligence programs opens up various emotion recognition functions as well as various policy possibilities.

A Fuzzy-Neural Network Based Human-Machine Interface for Voice Controlled Robots Trained by a Particle Swarm Optimization

  • Watanabe, Keigo;Chatterjee, Amitava;Pulasinghe, Koliya;Izumi, Kiyotaka;Kiguchi, Kazuo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.411-414 / 2003
  • Particle swarm optimization (PSO) is employed to train fuzzy-neural networks (FNNs), which can serve as an important building block in real-life robot systems controlled by voice-based commands. The FNN is also trained to capture the user's spoken directive in the context of the robot system's present performance. The system has been successfully employed in a real-life situation for the navigation of a mobile robot.

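
As a rough illustration of how PSO can train a parameter vector, here is a minimal Python sketch. It is not the authors' method: the abstract does not give the FNN structure or the PSO settings, so the function `pso_minimize`, the toy `linear_fit_loss` (a two-parameter line fit standing in for a network loss), and the values of `inertia`, `cognitive`, and `social` are all assumptions made for illustration.

```python
import random


def pso_minimize(loss, dim, n_particles=20, iterations=200, seed=0,
                 inertia=0.7, cognitive=1.5, social=1.5):
    """Minimize loss(params) with a basic global-best particle swarm."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_loss = [loss(p) for p in positions]
    best_index = min(range(n_particles), key=personal_loss.__getitem__)
    global_best = personal_best[best_index][:]
    global_loss = personal_loss[best_index]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            current = loss(positions[i])
            if current < personal_loss[i]:
                personal_best[i], personal_loss[i] = positions[i][:], current
                if current < global_loss:
                    global_best, global_loss = positions[i][:], current
    return global_best, global_loss


# Hypothetical stand-in for a network loss: fit y = w * x + b to four points.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def linear_fit_loss(params):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in DATA)


best_params, best_loss = pso_minimize(linear_fit_loss, dim=2)
```

In a real FNN the parameter vector would hold membership-function and weight parameters, and the loss would measure how well the network maps spoken commands to robot actions; neither is specified in the abstract.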

Usability Analysis and Improvement Plan for Intelligent Speakers in the 4th Industrial Revolution Environment

  • Seong-Hoon Lee;Dong-Woo Lee
    • International journal of advanced smart convergence / v.12 no.4 / pp.119-125 / 2023
  • A smart home in the 4th industrial revolution environment is one where all devices in the home are connected to each other to provide the optimal living environment desired by the user. Artificial intelligence speakers are being used as a way to manage and control all devices in this environment. The function of an artificial intelligence speaker ranges from simple music playback to serving as an interface that controls and manages all devices in a smart home space. In this study, we investigated and analyzed the usability of artificial intelligence speakers based on the current status of domestic and overseas markets and on surveys by two organizations, the Korea Consumer Agency and the Korea Information and Communication Policy Institute (KISDI). Based on the user responses reported by the two organizations, we identified major problems and described major improvement measures, such as discovering new functions and improving voice recognition performance.

Design of Specialized User Interface for Mobile Ubiquitous Devices Based on Using Patterns (사용자의 사용 방식에 근거한 이동형 유비쿼터스 단말기의 사용자 인터페이스 환경 설계)

  • Na, SangYeob;Yoo, HeeYong
    • The Journal of Korean Association of Computer Education / v.9 no.6 / pp.79-87 / 2006
  • Ubiquitous environments have been developed to allow users to use information more easily. These environments are based on advances in mobile ubiquitous hardware. Currently, various graphical and voice-based user interfaces are being developed for mobile ubiquitous devices. In this paper, we propose a specialized graphical user interface based on analysis of a user profile. This user interface can provide a suitable interface for individual users, using XML information, on the small screen of mobile ubiquitous devices.

Accelerometer-based Gesture Recognition for Robot Interface (로봇 인터페이스 활용을 위한 가속도 센서 기반 제스처 인식)

  • Jang, Min-Su;Cho, Yong-Suk;Kim, Jae-Hong;Sohn, Joo-Chan
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.53-69 / 2011
  • Vision and voice-based technologies are commonly used for human-robot interaction. However, it is widely recognized that the performance of vision and voice-based interaction systems deteriorates by a large margin in real-world situations due to environmental and user variances. Human users need to be very cooperative to get reasonable performance, which significantly limits the usability of vision and voice-based human-robot interaction technologies. As a result, touch screens are still the major medium of human-robot interaction in real-world applications. To improve the usability of robots for various services, alternative interaction technologies should be developed to compensate for the problems of vision and voice-based technologies. In this paper, we propose an accelerometer-based gesture interface as one such alternative, because accelerometers are effective in detecting movements of the human body, and their performance is not limited by environmental contexts such as lighting conditions or a camera's field of view. Moreover, accelerometers are now widely available in many mobile devices. We tackle the problem of classifying acceleration signal patterns of the 26 letters of the English alphabet, which is one of the essential repertoires for realizing robot-based education services. Recognizing 26 English handwriting patterns with accelerometers is a very difficult task because of the large number of pattern classes and the complexity of each pattern. The most difficult comparable problem undertaken so far was recognizing acceleration signal patterns of 10 handwritten digits. Most previous studies dealt with pattern sets of 8 to 10 simple and easily distinguishable gestures that are useful for controlling home appliances, computer applications, robots, etc. Good features are essential for the success of pattern recognition. 
To improve discriminative power on complex English alphabet patterns, we extracted 'motion trajectories' from the input acceleration signal and used them as the main feature. Investigative experiments showed that trajectory-based classifiers performed 3% to 5% better than those using raw features, e.g. the acceleration signal itself or statistical figures. To minimize the distortion of trajectories, we applied a simple but effective set of smoothing filters and band-pass filters. It is well known that acceleration patterns for the same gesture differ greatly among performers. To tackle this problem, online incremental learning is applied so that our system adapts to each user's distinctive motion properties. Our system is based on instance-based learning (IBL), where each training sample is memorized as a reference pattern. Brute-force incremental learning in IBL continuously accumulates reference patterns, which is a problem because it not only slows down classification but also degrades recall performance. Regarding the latter phenomenon, we observed a tendency that, as the number of reference patterns grows, some reference patterns contribute more to false positive classification. Thus, we devised an algorithm for optimizing the reference pattern set based on the positive and negative contribution of each reference pattern. The algorithm is run periodically to remove reference patterns that have a very low positive contribution or a high negative contribution. Experiments were performed on 6500 gesture patterns collected from 50 adults aged 30 to 50. Each letter was performed 5 times per participant using a Nintendo® Wii™ remote. The acceleration signal was sampled at 100 Hz on 3 axes. The mean recall rate over all letters was 95.48%. Some letters recorded a very low recall rate and exhibited a very high pairwise confusion rate. Major confusion pairs are D (88%) and P (74%), I (81%) and U (75%), and N (88%) and W (100%). 
Though W was recalled perfectly, it contributed much to the false positive classification of N. Compared with major previous results from VTT (96% for 8 control gestures), CMU (97% for 10 control gestures) and Samsung Electronics (97% for 10 digits and a control gesture), we find that the performance of our system is superior with regard to the number of pattern classes and the complexity of patterns. Using our gesture interaction system, we conducted 2 case studies of robot-based edutainment services. The services were implemented on various robot platforms and mobile devices, including the iPhone™. The participating children exhibited improved concentration and active reaction to the service with our gesture interface. To demonstrate the effectiveness of our gesture interface, the children took a test after experiencing an English teaching service. The test results showed that those who played with the gesture interface-based robot content scored 10% better than those who received conventional teaching. We conclude that the accelerometer-based gesture interface is a promising technology for advancing real-world robot-based services and content by complementing the limits of today's conventional interfaces, e.g. touch screen, vision and voice.
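
The reference-pattern pruning described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: nearest-neighbour matching on a plain feature tuple, the `Reference` class, the `age` grace period for new patterns, and the threshold values `min_positive`, `max_negative`, and `min_age` are all assumptions introduced here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Reference:
    features: tuple      # hypothetical fixed-length feature vector
    label: str
    positive: int = 0    # times this reference produced a correct match
    negative: int = 0    # times this reference produced a wrong match
    age: int = 0         # samples seen since this reference was stored


def classify_and_learn(refs, features, true_label):
    """Classify one labelled sample, update contributions, memorize it."""
    predicted = None
    if refs:
        best = min(refs, key=lambda r: math.dist(r.features, features))
        predicted = best.label
        if predicted == true_label:
            best.positive += 1
        else:
            best.negative += 1
    for r in refs:
        r.age += 1
    # Instance-based learning: every sample becomes a reference pattern.
    refs.append(Reference(tuple(features), true_label))
    return predicted


def prune(refs, min_positive=1, max_negative=2, min_age=5):
    """Drop old references with too few correct or too many wrong matches."""
    return [
        r for r in refs
        if r.age < min_age
        or (r.positive >= min_positive and r.negative <= max_negative)
    ]
```

Calling `prune` every so often keeps the reference set from growing without bound; how often to call it and where to set the thresholds would have to be tuned on real gesture data.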

Singing Voice Synthesis Using HMM Based TTS and MusicXML (HMM 기반 TTS와 MusicXML을 이용한 노래음 합성)

  • Khan, Najeeb Ullah;Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.20 no.5 / pp.53-63 / 2015
  • Singing voice synthesis is the generation of a song by computer from its lyrics and musical notes. Hidden Markov models (HMMs) have proved to be the models of choice for text-to-speech synthesis. HMMs have also been used in singing voice synthesis research; however, a huge database is needed to train HMMs for singing voice synthesis. In addition, commercially available singing voice synthesis systems, which use piano-roll music notation, need to adopt easy-to-read standard music notation to be suitable for singing-learning applications. To overcome these problems, we use a speech database to train context-dependent HMMs for singing voice synthesis. Pitch and duration control methods have been devised to modify the parameters of the HMMs trained on speech so that they can be used as the synthesis units for the singing voice. This work describes a singing voice synthesis system that uses a MusicXML-based music score editor as the front-end interface for entering the notes and lyrics to be synthesized, and an HMM-based text-to-speech synthesis system as the back-end synthesizer. A perceptual test shows the feasibility of the proposed system.
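
To make the front-end step concrete, the following Python sketch shows one way the target pitch and duration of a note could be read from a MusicXML `<note>` element. It is an illustration only, not the authors' pitch and duration control method: the function name `note_targets`, the equal-temperament tuning with A4 = 440 Hz, and the restriction to pitched notes (no rests or ties) are assumptions.

```python
import xml.etree.ElementTree as ET

# Semitone offsets of the natural notes within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_targets(note, divisions, tempo_bpm):
    """Return (frequency_hz, seconds, lyric) for one pitched <note> element.

    divisions is the MusicXML number of duration units per quarter note;
    tempo_bpm is the tempo in quarter notes per minute.
    """
    step = note.findtext("pitch/step")
    octave = int(note.findtext("pitch/octave"))
    alter = float(note.findtext("pitch/alter", default="0"))
    # MIDI note number, where A4 is 69.
    midi = 12 * (octave + 1) + SEMITONES[step] + alter
    frequency_hz = 440.0 * 2 ** ((midi - 69) / 12)
    quarter_notes = int(note.findtext("duration")) / divisions
    seconds = quarter_notes * 60.0 / tempo_bpm
    lyric = note.findtext("lyric/text", default="")
    return frequency_hz, seconds, lyric


example = ET.fromstring(
    "<note><pitch><step>A</step><octave>4</octave></pitch>"
    "<duration>2</duration><lyric><text>la</text></lyric></note>"
)
frequency_hz, seconds, lyric = note_targets(example, divisions=1, tempo_bpm=120)
```

A synthesizer back end could then use the frequency as the pitch target and the duration as the length of the syllable carried by the lyric text.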

An Android Application for Speech Communication of People with Speech Disorders (언어장애인을 위한 안드로이드 기반 의사소통보조 어플리케이션)

  • Choi, Yoonjung;Hong, Ki-Hyung
    • Phonetics and Speech Sciences / v.6 no.4 / pp.141-148 / 2014
  • Voice is the most common means of communication, but some people have difficulty producing voice due to congenital or acquired disorders. Individuals with speech disorders might lose their speaking ability due to hearing impairment, encephalopathy or cerebral palsy accompanied by motor skill impairments, or autism caused by mental problems. However, they still need to communicate, so some of them use various types of AAC (Augmentative & Alternative Communication) devices to meet their communication needs. In this paper, a mobile application for literate people with speech disorders was designed and implemented by developing accurate and fast sentence-completion functions for efficient user interaction. From a user study and a previous study on Korean text-based communication for adults who have difficulty with speech communication, we identified functionality and usability requirements. Specifically, the user interface with scanning features was designed by considering the users' motor skills in using the touch screen of a mobile device. Finally, we conducted a usability test of the application. The results show that the application is easy to learn and efficient to use in communication with people with speech disorders.

Design of 3-Dimensional Remote Monitoring System Using Telephone Line and Internet (전화선자 인터텟을 이용한 3차원 원격 모니터링 시스템의 설계)

  • 양필수;김주환;김성호
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2000.10a / pp.47-47 / 2000
  • Most measuring devices are equipped with an RS-232 or GPIB interface for exchanging data with computers. If the measuring devices can be accessed by a server computer, the valuable information from the devices can be effectively shared with other computers via the internet. However, if the measuring devices and the server computer are too far apart, it is difficult to connect them directly through an RS-232 interface. The PSTN (Public Switched Telephone Network) is the world's collection of interconnected voice-oriented public telephone networks. A measuring computer system equipped with an RS-232 interface and a PSTN modem can be introduced to overcome this distance problem. In this work, an internet-based remote monitoring system that uses the PSTN and VRML for a 3-dimensional GUI is proposed.

Design and Implementation of Multimodal Middleware for Mobile Environments (모바일 환경을 위한 멀티모달 미들웨어의 설계 및 구현)

  • Park, Seong-Soo;Ahn, Se-Yeol;Kim, Won-Woo;Koo, Myoung-Wan;Park, Sung-Chan
    • MALSORI / no.60 / pp.125-144 / 2006
  • The W3C announced a standard software architecture for multimodal context-aware middleware that emphasizes modularity and separates structure, content, and presentation. We implemented a distributed multimodal interface system that follows the W3C architecture and is based on SCXML. SCXML uses parallel states to invoke both XHTML and VoiceXML content, as well as to gather composite or sequential multimodal inputs through man-machine interactions. We also employ a Delivery Context Interface (DCI) module and an external service bundle that enable the middleware to support context-awareness services in real-world environments. The provision of personalized user interfaces for mobile devices is expected to be used across different devices with a wide variety of capabilities and interaction modalities. Through experiments, we demonstrated that the implemented middleware can maintain multimodal scenarios in a clear, concise and consistent manner.
