• Title/Summary/Keyword: Voice command

Interface Modeling for Digital Device Control According to Disability Type in Web

  • Park, Joo Hyun;Lee, Jongwoo;Lim, Soon-Bum
    • Journal of Multimedia Information System / v.7 no.4 / pp.249-256 / 2020
  • Learning methods using various assistive and smart devices have been developed to enable independent learning by people with disabilities. Pointer control is the most important consideration for disabled users when controlling a device and the contents of an existing graphical user interface (GUI) environment; however, difficulties in using a pointer arise depending on the disability type. Although the specific difficulties differ among blind, low-vision, and upper-limb-disabled users, all share problems with the accuracy of object selection and execution. A multimodal interface pilot solution is presented that enables people with various disability types to control web interactions more easily. First, we classify the types of web interaction performed with digital devices and derive the essential interactions among them. Second, to solve the problems that occur when performing these interactions, we present the technology required for the characteristics of each disability type. Finally, a pilot solution for a multimodal interface for each disability type is proposed. We identified three disability types and developed a solution for each: a remote-control voice interface for blind users, a voice output interface applying a selective focusing technique for low-vision users, and a gaze-tracking and voice-command interface for GUI operations for users with upper-limb disabilities.
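
A minimal Python sketch of the per-disability-type dispatch this abstract describes follows: the same essential web interactions are served by different modality combinations. The type names and technique labels come from the abstract; the dictionary and function are illustrative assumptions, not the authors' implementation.

    # Sketch: map each disability type in the abstract to the interface
    # techniques proposed for it (illustrative assumption).
    INTERFACE_BY_TYPE = {
        "blind": ["remote-control voice interface"],
        "low_vision": ["voice output with selective focusing"],
        "upper_limb": ["gaze tracking", "voice commands for GUI operations"],
    }

    def select_interface(disability_type):
        """Return the interaction techniques suggested for a disability type."""
        return INTERFACE_BY_TYPE.get(disability_type, ["standard pointer/GUI"])

    print(select_interface("upper_limb"))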

The Performance Experiments on the Tactical Data Communication over the Legacy Radio Systems (기존 전술 무전기를 이용한 전술 데이터 통신 성능 실험)

  • Sim, Dong-Sub;Kang, Kyeong-Sung;Kim, Ki-Hyung
    • Journal of the Korea Institute of Military Science and Technology / v.13 no.2 / pp.243-251 / 2010
  • The military has been putting great effort into adding data communication to the existing voice communication systems used in NCW (Network Centric Warfare). Compared with legacy voice communications, data communication is an effective way to shorten the kill chain when tactical units conduct their missions. However, replacing the large number of legacy communication systems with new ones would require an enormous budget. As a cost-effective alternative, tactical data communication over the conventional radio systems, rather than the development of new radios, has been proposed. It is nevertheless mandatory to ensure QoS while carrying data over the legacy radios already in use. This paper focuses on performance experiments and analysis of tactical data communication over legacy radio systems as a first step toward guaranteed QoS. We conducted various experiments in both hopping and non-hopping modes: transmission error rates for selected tactical messages, evaluation of redundant transfers, the relationship between transmission frame size and error rate, identification of error positions within the transmission frame, and techniques to reduce errors. The results indicate that an adaptive communication module should be designed that selects either redundant transmission or Forward Error Correction (FEC) by analyzing the channel status and the current transmission mode (hopping/non-hopping) of the legacy radio. The experiments recommend FEC with a 20-byte frame size in non-hopping mode and redundant transmission with a 10-byte frame size in hopping mode.
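
A small Python sketch of the adaptive decision the conclusions call for: choose the error-control technique and frame size from the radio's current mode. The 20-byte and 10-byte frame sizes come from the abstract; the function interface itself is an assumption made for illustration.

    def select_transmission_scheme(hopping: bool):
        """Return the recommended (error-control technique, frame size in bytes)."""
        if hopping:
            # hopping mode: redundant transmission with 10-byte frames was recommended
            return "redundant transmission", 10
        # non-hopping mode: FEC with 20-byte frames was recommended
        return "forward error correction (FEC)", 20

    print(select_transmission_scheme(hopping=True))
    print(select_transmission_scheme(hopping=False))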

Human-Computer Interaction Based Only on Auditory and Visual Information

  • Sha, Hui;Agah, Arvin
    • Transactions on Control, Automation and Systems Engineering / v.2 no.4 / pp.285-297 / 2000
  • One of the research objectives in the area of multimedia human-computer interaction is the application of artificial intelligence and robotics technologies to the development of computer interfaces. This involves utilizing many forms of media and integrating speech input, natural language, graphics, hand-pointing gestures, and other methods for interactive dialogues. Although current human-computer communication methods include computer keyboards, mice, and other traditional devices, the two basic ways by which people communicate with each other are voice and gesture. This paper reports on research focusing on the development of an intelligent multimedia interface system modeled on the manner in which people communicate. The work explores interaction between humans and computers based only on the processing of speech (words uttered by the person) and the processing of images (hand-pointing gestures). The purpose of the interface is to control a pan/tilt camera so that it points to a location specified by the user through spoken words and hand pointing. The system uses a second, stationary camera to capture images of the user's hand and a microphone to capture the user's words. After processing the images and sounds, the system responds by pointing the camera. The interface first uses hand pointing to locate the general position the user is referring to, and then uses the user's voice commands to fine-tune the location and, if requested, change the camera's zoom. The image of the location is captured by the pan/tilt camera and displayed on a color TV monitor. This type of system has applications in teleconferencing and other remote operations, where the system must respond to the user's commands much as another person would. The advantage of this approach is the elimination of the traditional input devices the user would otherwise need to control a pan/tilt camera, replacing them with more "natural" means of interaction. A number of experiments were performed to evaluate the interface system with respect to its accuracy, efficiency, reliability, and limitations.
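
The two-stage control loop described above can be sketched as follows: a hand-pointing estimate sets a coarse pan/tilt target, after which voice commands refine the position and adjust zoom. The command vocabulary, step sizes, and data layout are assumptions for illustration, not details taken from the paper.

    # Conceptual sketch of coarse pointing followed by voice fine-tuning.
    camera = {"pan": 0.0, "tilt": 0.0, "zoom": 1.0}
    STEP_DEG = 2.0  # assumed fine-tuning step

    def point_from_gesture(azimuth_deg, elevation_deg):
        # azimuth/elevation would come from processing the stationary camera's
        # image of the user's hand (estimation not shown here).
        camera["pan"], camera["tilt"] = azimuth_deg, elevation_deg

    def apply_voice_command(word):
        if word == "left":
            camera["pan"] -= STEP_DEG
        elif word == "right":
            camera["pan"] += STEP_DEG
        elif word == "up":
            camera["tilt"] += STEP_DEG
        elif word == "down":
            camera["tilt"] -= STEP_DEG
        elif word == "zoom in":
            camera["zoom"] *= 1.2
        elif word == "zoom out":
            camera["zoom"] /= 1.2
        # the pan/tilt camera would then be driven to these values

    point_from_gesture(30.0, -5.0)
    apply_voice_command("left")
    print(camera)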

Speech Recognition Model Based on CNN using Spectrogram (스펙트로그램을 이용한 CNN 음성인식 모델)

  • Won-Seog Jeong;Haeng-Woo Lee
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.4 / pp.685-692 / 2024
  • In this paper, we propose a new CNN model to improve the recognition performance of command voice signals. The method obtains a spectrogram image by applying a short-time Fourier transform (STFT) to the input signal and improves command recognition through supervised learning with a CNN model. After Fourier-transforming each short-time segment of the input signal, a spectrogram image is obtained and multi-class learning is performed using a CNN deep learning model. Converting the time-domain voice signal to the frequency domain represents its characteristics well, and training the deep network on the resulting spectrogram images allows the commands to be classified effectively. To verify the performance of the proposed speech recognition system, a simulation program was written using the TensorFlow and Keras libraries and simulation experiments were performed. The experiments confirmed that an accuracy of 92.5% could be obtained with the proposed deep learning algorithm.
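
A minimal sketch of the STFT-spectrogram-plus-CNN pipeline described above, using the TensorFlow/Keras libraries mentioned in the abstract. The frame parameters, number of command classes, and network depth are illustrative assumptions rather than the authors' exact settings.

    import tensorflow as tf

    NUM_CLASSES = 10      # assumed number of command words
    SAMPLE_LEN = 16000    # assumed 1 s of audio at 16 kHz

    def waveform_to_spectrogram(waveform):
        # Short-time Fourier transform, then magnitude -> spectrogram "image"
        stft = tf.signal.stft(waveform, frame_length=255, frame_step=128)
        spectrogram = tf.abs(stft)
        return spectrogram[..., tf.newaxis]   # add channel axis for the CNN

    def build_model(input_shape, num_classes):
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(64, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])

    spec_shape = tuple(waveform_to_spectrogram(tf.zeros([SAMPLE_LEN])).shape)
    model = build_model(spec_shape, NUM_CLASSES)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_spectrograms, train_labels, epochs=..., validation_data=...)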

Performance Of Adaptive and Fixed Step Size Power Control Schemes Accommodating Integrated Voice/Video/Data in Wireless Cellular Systems (무선 셀룰라 시스템의 통합된 서비스를 수용하기 위한 적응 및 고정 스텝 크기 전력제어 방법의 성능분석)

  • Kim Jeong-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.1A / pp.9-17 / 2004
  • Adaptive and fixed step size PC (power control) schemes for accommodating voice, video, and data are evaluated for different PC command rates, and their effects on integrated voice/video/data services are investigated. The required minimum power levels are derived as PC thresholds, and the effects of PC errors on channel quality and radio link capacity are examined. Services with high bit rates and low bit-error-rate requirements can significantly affect the radio link quality of the other traffic types. The results show that the adaptive step size PC scheme for voice/video/data services achieves more capacity and causes less interference to the radio channels, because a smaller minimum PIL (Power Increment Level) is required for the specified radio link outage probability.
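
The contrast between fixed and adaptive step sizes can be illustrated with a generic closed-loop power control sketch: each PC command steps the transmit power up or down, and the adaptive variant enlarges the step while consecutive commands keep the same direction. The step values and adaptation rule below are assumptions, not the paper's exact algorithm.

    def run_power_control(commands, adaptive=False, base_step_db=0.5, max_step_db=2.0):
        """Apply a sequence of PC commands (+1 = up, -1 = down) and return the power trace."""
        power_db, step, prev, trace = 0.0, base_step_db, 0, []
        for cmd in commands:
            if adaptive:
                # grow the step on repeated same-direction commands, reset otherwise
                step = min(step * 2, max_step_db) if cmd == prev else base_step_db
            power_db += cmd * step
            prev = cmd
            trace.append(power_db)
        return trace

    cmds = [+1, +1, +1, -1, -1, +1]
    print(run_power_control(cmds, adaptive=False))
    print(run_power_control(cmds, adaptive=True))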

A Study on the Practical Methodology of Engineering Education through the Making of Smart Mirror (스마트 거울의 제작을 통해 이루어진 공학 교육 실천 방법론에 관한 연구)

  • Seo, Myeong-Deok;Kwon, Ji-Young;Chang, Eun-Young
    • Journal of Practical Engineering Education / v.10 no.1 / pp.9-15 / 2018
  • A digital signage system is constructed using a speech-recognition API, and VRSM (Voice Recognition Smart Mirror), which retrieves information such as weather, maps, exercise information, schedules, and images in response to the user's voice commands, is proposed to differentiate it from other commercial products. The project provided an effective method of engineering education through the process of being evaluated as the outcome of an independent graduation certification system, and gave a team of two students in the major the opportunity to design and produce the work over three semesters. Through this comprehensive capstone design, the students experienced an engineering approach and opportunities for creative thinking. They won a best paper award by presenting the interim results at the institute's conference and also received a prize in a contest at another conference. The practical skills gained through this process proved beneficial for self-confidence and for job-seeking, leading to actual employment.

Proposal of Hostile Command Attack Method Using Audible Frequency Band for Smart Speaker (스마트 스피커 대상 가청 주파수 대역을 활용한 적대적 명령어 공격 방법 제안)

  • Park, Tae-jun;Moon, Jongsub
    • Journal of Internet Computing and Services / v.23 no.4 / pp.1-9 / 2022
  • Recently, the functions of smart speakers have diversified and their penetration rate has increased. As they become more widespread, various techniques for inducing anomalous behavior in smart speakers have been proposed. DolphinAttack, which induces anomalous behavior in a Voice Controllable System (VCS), is a representative method: a third party controls the VCS through the ultrasonic band (f > 20 kHz) without the user's awareness. However, because this method uses the ultrasonic band, it requires installing an ultrasonic speaker or a dedicated device capable of outputting ultrasonic signals. In this paper, a smart speaker is controlled without any additional ultrasonic device by generating an audio signal modulated at a frequency (18 to 20 kHz) that lies within the human audible band but is difficult for people to hear. With the proposed method, humans could not recognize the voice commands even though they were in the audible band, while the smart speaker could be controlled with a probability of 82 to 96%.
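
One plausible way to realize the audible-band idea sketched above is to amplitude-modulate a recorded command onto a carrier inside the 18-20 kHz band, as in the Python sketch below. The exact modulation scheme, carrier frequency, modulation depth, and input file name are assumptions; the paper's parameters are not reproduced here.

    import numpy as np
    from scipy.io import wavfile

    FS = 44100            # assumed playback sample rate
    CARRIER_HZ = 19000    # carrier inside the 18-20 kHz band

    rate, command = wavfile.read("voice_command.wav")   # hypothetical input file
    command = command.astype(np.float64)
    if command.ndim > 1:
        command = command.mean(axis=1)                   # downmix to mono
    peak = np.max(np.abs(command))
    if peak > 0:
        command /= peak                                  # normalize to [-1, 1]
    # Resampling to FS is omitted for brevity; assume the file is already at FS.

    t = np.arange(len(command)) / FS
    carrier = np.sin(2.0 * np.pi * CARRIER_HZ * t)
    modulated = (1.0 + 0.8 * command) * carrier          # classic AM

    out = (modulated / np.max(np.abs(modulated)) * 32767).astype(np.int16)
    wavfile.write("modulated_command.wav", FS, out)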

Design and Implementation of Order Settlement System Using Artificial Intelligence Speaker (인공지능 스피커를 활용한 주문결제 시스템의 설계 및 구현)

  • Kim, Dong-Hyun;Choi, Byung-Hyun;Ban, Chae-Hoon
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.6 / pp.1181-1186 / 2019
  • Recently, kiosks have made it possible to order and pay quickly even at fast-food restaurants and small private restaurants and cafes. However, it is difficult for people with disabilities who have limited use of their arms or who use wheelchairs to operate kiosks by pressing graphical buttons, and older people also find kiosks uncomfortable because the ability to take in new information declines with age. To solve this problem, we design and implement an order and payment system that adds the voice-command element of an AI speaker to the visual elements of the kiosk interaction.

Design of Gesture based Interfaces for Controlling GUI Applications (GUI 어플리케이션 제어를 위한 제스처 인터페이스 모델 설계)

  • Park, Ki-Chang;Seo, Seong-Chae;Jeong, Seung-Moon;Kang, Im-Cheol;Kim, Byung-Gi
    • The Journal of the Korea Contents Association / v.13 no.1 / pp.55-63 / 2013
  • NUI (Natural User Interfaces) have evolved from CLI (Command Line Interfaces) through GUI (Graphical User Interfaces). NUI uses many different input modalities, including multi-touch, motion tracking, voice, and stylus. To adopt NUI in a legacy GUI application, developers must add device libraries, modify the relevant source code, and debug it. In this paper, we propose a gesture-based interface model that can be applied without modifying existing event-based GUI applications, and we present an XML schema for specifying the proposed model. The paper demonstrates the use of the model through a prototype.
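
A minimal Python sketch of the idea of mapping recognized gestures onto ordinary GUI events so that a legacy application never needs to be modified. The XML fragment below is an illustrative stand-in, not the schema actually proposed in the paper, and inject_event represents whatever event-injection facility the platform provides.

    import xml.etree.ElementTree as ET

    GESTURE_MAP_XML = """
    <gestureMap>
      <gesture name="swipe_left"  event="KEY_PRESS" value="PAGE_DOWN"/>
      <gesture name="swipe_right" event="KEY_PRESS" value="PAGE_UP"/>
      <gesture name="push"        event="MOUSE_CLICK" value="LEFT"/>
    </gestureMap>
    """

    def load_gesture_map(xml_text):
        root = ET.fromstring(xml_text)
        return {g.get("name"): (g.get("event"), g.get("value"))
                for g in root.findall("gesture")}

    def dispatch(gesture_name, gesture_map, inject_event):
        # inject_event stands in for a platform event-injection facility, so the
        # legacy application only ever sees normal GUI events.
        if gesture_name in gesture_map:
            event, value = gesture_map[gesture_name]
            inject_event(event, value)

    gesture_map = load_gesture_map(GESTURE_MAP_XML)
    dispatch("swipe_left", gesture_map, lambda e, v: print(f"inject {e} {v}"))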

Implementation of a Refusable Human-Robot Interaction Task with Humanoid Robot by Connecting Soar and ROS (Soar (State Operator and Result)와 ROS 연계를 통해 거절가능 HRI 태스크의 휴머노이드로봇 구현)

  • Dang, Chien Van;Tran, Tin Trung;Pham, Trung Xuan;Gil, Ki-Jong;Shin, Yong-Bin;Kim, Jong-Wook
    • The Journal of Korea Robotics Society / v.12 no.1 / pp.55-64 / 2017
  • This paper proposes a combination of the cognitive agent architecture Soar (State, operator, and result) and ROS (Robot Operating System), which can serve as a basic framework for a robot agent to interact with and cope with its environment more intelligently and appropriately. The proposed Soar-ROS human-robot interaction (HRI) agent, implemented on a humanoid robot, understands a set of human commands through voice recognition and chooses how to react to each command according to the symbol detected by image recognition. The robotic agent is allowed to refuse an inappropriate command such as "go" after it has seen the symbol 'X', which indicates that an abnormal or immoral situation has occurred. This simple but meaningful HRI task was successfully demonstrated on the proposed Soar-ROS platform with a small humanoid robot, which implies that extending the present hybrid platform to an artificial moral agent is possible.
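
The "refusable" decision described above can be illustrated with a plain Python stand-in: the agent follows voice commands unless the vision side has reported the symbol 'X', in which case a command such as "go" is refused. The real system encodes this logic as Soar rules connected to ROS; the class below is only an assumed illustration.

    class RefusableHriAgent:
        def __init__(self):
            self.forbidden_symbol_seen = False

        def on_symbol_detected(self, symbol):
            # called by the image-recognition side (e.g., a ROS subscriber callback)
            if symbol == "X":
                self.forbidden_symbol_seen = True

        def on_voice_command(self, command):
            if command == "go" and self.forbidden_symbol_seen:
                return "refuse"        # abnormal/immoral situation flagged
            return f"execute:{command}"

    agent = RefusableHriAgent()
    print(agent.on_voice_command("go"))   # execute:go
    agent.on_symbol_detected("X")
    print(agent.on_voice_command("go"))   # refuse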