• Title/Summary/Keyword: Voice Interaction


A Novel Computer Human Interface to Remotely Pick up Moving Human's Voice Clearly by Integrating Real-time Face Tracking and Microphones Array

  • Hiroshi Mizoguchi;Takaomi Shigehara;Yoshiyasu Goto;Ken-ichi Hidai;Taketoshi Mishima
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference
    • /
    • 1998.10a
    • /
    • pp.75-80
    • /
    • 1998
  • This paper proposes a novel computer human interface, named Virtual Wireless Microphone (VWM), which utilizes computer vision and signal processing. It integrates real-time face tracking and sound signal processing. VWM is intended as a speech signal input method for human-computer interaction, especially for an autonomous intelligent agent that interacts with humans, such as a digital secretary. Utilizing VWM, the agent can clearly hear the human master's voice remotely, as if a wireless microphone were placed just in front of the master.
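The VWM concept above steers a microphone array toward the direction reported by face tracking. A minimal delay-and-sum beamformer sketch illustrates the alignment step; the array geometry, function name, and FFT-based fractional delay are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air

def delay_and_sum(signals, mic_positions, target_direction, fs):
    """Align and average array channels for a source in `target_direction`.

    signals: (n_mics, n_samples) recordings, one row per microphone
    mic_positions: (n_mics, 3) microphone coordinates in meters
    target_direction: unit vector from the array toward the tracked face
    fs: sampling rate in Hz
    """
    n_mics, n_samples = signals.shape
    # A mic farther along the look direction hears the wavefront earlier;
    # delay each channel accordingly so all channels line up.
    delays = mic_positions @ target_direction / SPEED_OF_SOUND
    delays -= delays.min()
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        spectrum = np.fft.rfft(signals[m])
        spectrum *= np.exp(-2j * np.pi * freqs * delays[m])  # fractional delay
        out += np.fft.irfft(spectrum, n=n_samples)
    return out / n_mics
```

Summing the aligned channels reinforces the tracked speaker's voice while uncorrelated sound from other directions averages down, which is the "virtual wireless microphone" effect the abstract describes.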


Design of Metaverse for Two-Way Video Conferencing Platform Based on Virtual Reality

  • Yoon, Dongeon;Oh, Amsuk
    • Journal of information and communication convergence engineering
    • /
    • v.20 no.3
    • /
    • pp.189-194
    • /
    • 2022
  • As non-face-to-face activities have become commonplace, online video conferencing platforms have become popular collaboration tools. However, existing video conferencing platforms have a structure in which one side unilaterally transmits information, potentially increasing the fatigue of meeting participants. In this study, we designed a video conferencing platform utilizing virtual reality (VR), a metaverse technology, to enable various interactions. A virtual conferencing space and a support system for authoring realistic VR video conferencing content were designed using Meta's Oculus Quest 2 hardware, the Unity engine, and 3D Max software. With the Photon software development kit, voice recognition was designed to perform automatic text translation through the Watson application programming interface, allowing online video conferencing participants to communicate smoothly even when using different languages. It is expected that the proposed video conferencing platform will enable conference participants to interact and improve their work efficiency.

Accelerometer-based Gesture Recognition for Robot Interface (로봇 인터페이스 활용을 위한 가속도 센서 기반 제스처 인식)

  • Jang, Min-Su;Cho, Yong-Suk;Kim, Jae-Hong;Sohn, Joo-Chan
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.53-69
    • /
    • 2011
  • Vision- and voice-based technologies are commonly utilized for human-robot interaction. But it is widely recognized that the performance of vision- and voice-based interaction systems deteriorates by a large margin in real-world situations due to environmental and user variances. Human users need to be very cooperative to get reasonable performance, which significantly limits the usability of vision- and voice-based human-robot interaction technologies. As a result, touch screens are still the major medium of human-robot interaction in real-world applications. To improve the usability of robots for various services, alternative interaction technologies should be developed to complement the problems of vision- and voice-based technologies. In this paper, we propose an accelerometer-based gesture interface as one such alternative, because accelerometers are effective in detecting the movements of the human body, while their performance is not limited by environmental contexts such as lighting conditions or a camera's field of view. Moreover, accelerometers are widely available nowadays in many mobile devices. We tackle the problem of classifying the acceleration signal patterns of the 26 English alphabet letters, which is one of the essential repertoires for realizing robot-based education services. Recognizing 26 English handwriting patterns from accelerometers is a very difficult task to take on because of the large number of pattern classes and the complexity of each pattern. The most difficult comparable problem previously undertaken was recognizing the acceleration signal patterns of 10 handwritten digits. Most previous studies dealt with sets of 8-10 simple and easily distinguishable gestures useful for controlling home appliances, computer applications, robots, etc. Good features are essential for the success of pattern recognition.
To improve discriminative power on the complex English alphabet patterns, we extracted 'motion trajectories' from the input acceleration signal and used them as the main feature. Investigative experiments showed that trajectory-based classifiers performed 3-5 % better than those using raw features, e.g., the acceleration signal itself or statistical figures. To minimize the distortion of trajectories, we applied a simple but effective set of smoothing and band-pass filters. It is well known that acceleration patterns for the same gesture differ greatly among performers. To tackle this problem, online incremental learning is applied to make our system adaptive to each user's distinctive motion properties. Our system is based on instance-based learning (IBL), in which each training sample is memorized as a reference pattern. Brute-force incremental learning in IBL continuously accumulates reference patterns, which is a problem because it not only slows down classification but also degrades recall performance. Regarding the latter phenomenon, we observed that as the number of reference patterns grows, some reference patterns contribute more to false positive classifications. Thus, we devised an algorithm for optimizing the reference pattern set based on the positive and negative contribution of each reference pattern. The algorithm is performed periodically to remove reference patterns that have a very low positive contribution or a high negative contribution. Experiments were performed on 6,500 gesture patterns collected from 50 adults aged 30-50. Each letter was performed 5 times per participant using a Nintendo® Wii™ remote. The acceleration signal was sampled at 100 Hz on 3 axes. The mean recall rate over all letters was 95.48 %. Some letters recorded a very low recall rate and exhibited a very high pairwise confusion rate. Major confusion pairs were D (88 %) and P (74 %), I (81 %) and U (75 %), and N (88 %) and W (100 %).
Though W was recalled perfectly, it contributed much to the false positive classification of N. Comparing with major previous results from VTT (96 % for 8 control gestures), CMU (97 % for 10 control gestures), and Samsung Electronics (97 % for 10 digits and a control gesture), the performance of our system is superior considering the number of pattern classes and the complexity of the patterns. Using our gesture interaction system, we conducted 2 case studies of robot-based edutainment services. The services were implemented on various robot platforms and mobile devices, including the iPhone™. The participating children exhibited improved concentration and active reaction to the service with our gesture interface. To prove the effectiveness of our gesture interface, a test was taken by the children after they experienced an English teaching service. The test results showed that those who played with the gesture interface-based robot content scored 10 % better than those taught conventionally. We conclude that the accelerometer-based gesture interface is a promising technology for flourishing real-world robot-based services and content by complementing the limits of today's conventional interfaces, e.g., touch screens, vision, and voice.
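The reference-pattern optimization described above can be sketched as a 1-NN instance-based learner that credits or blames each reference pattern and periodically prunes poor contributors. The class name, scoring rule, and thresholds below are assumptions, not the paper's algorithm:

```python
import numpy as np

class PrunedIBLClassifier:
    """1-NN instance-based learner that periodically prunes reference
    patterns with a low positive or high negative contribution, loosely
    following the idea in the abstract (thresholds are assumptions)."""

    def __init__(self, prune_every=100, min_score=-2):
        self.refs = []        # list of (feature_vector, label) pairs
        self.scores = []      # +1 when a ref supports a correct vote,
                              # -1 when it causes a misclassification
        self.seen = 0
        self.prune_every = prune_every
        self.min_score = min_score

    def classify(self, x):
        if not self.refs:
            return None
        dists = [np.linalg.norm(x - r) for r, _ in self.refs]
        return self.refs[int(np.argmin(dists))][1]

    def learn(self, x, label):
        if self.refs:
            dists = [np.linalg.norm(x - r) for r, _ in self.refs]
            nearest = int(np.argmin(dists))
            # Credit or blame the reference pattern that would have voted
            self.scores[nearest] += 1 if self.refs[nearest][1] == label else -1
        self.refs.append((x, label))
        self.scores.append(0)
        self.seen += 1
        if self.seen % self.prune_every == 0:
            self._prune()

    def _prune(self):
        kept = [(r, s) for r, s in zip(self.refs, self.scores)
                if s > self.min_score]
        if kept:  # never empty the reference set entirely
            self.refs, self.scores = map(list, zip(*kept))
```

Pruning keeps the reference set small (faster classification) while removing patterns that mostly trigger false positives, which is the recall-degradation problem the abstract reports.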

The effect of the human voice that is consistent with context and the mechanical melody on user's subjective experience in mobile phones (휴대전화 상황에서 맥락과 일치하는 사람음과 단순 기계음이 사용자의 주관적 경험에 미치는 영향)

  • Cho, Yu-Suk;Eom, Ki-Min;Joo, Hyo-Min;Suk, Ji-He;Han, Kwang-Hee
    • Science of Emotion and Sensibility
    • /
    • v.12 no.4
    • /
    • pp.531-544
    • /
    • 2009
  • In the past, objective usability was one of the most important aspects when users used a system. But nowadays, users' subjective experiences are a more critical element than objective usability in HCI (human-computer interaction). Most people own a mobile phone and use it frequently these days. It is especially important to make users' subjective experiences more positive when using devices, like mobile phones, that people frequently carry and interact with. This study investigates whether interfaces that express emotion give more positive experiences to users. The researchers created mobile phone prototypes to compare the effect of mechanical melody feedback (the major auditory feedback on mobile phones) and emotional voice feedback (a recorded human voice). Participants experienced four kinds of mobile phone prototypes (no feedback, mechanical melody feedback, emotional voice feedback, and dual feedback) and evaluated their experienced usability, hedonic quality, and preference. The results suggest that perceived fun and hedonic quality increased with the emotional voice feedback compared to the mechanical melody feedback. Nevertheless, preference was rated lower in the emotional voice feedback condition than in the others.


Design and Implementation of Visual/Control Communication Protocol for Home Automated Robot Interaction and Control (홈오토메이션을 위한 영상/로봇제어 시스템의 설계와 구현)

  • Cho, Myung-Ji;Kim, Seong-Whan
    • Journal of Internet Computing and Services
    • /
    • v.10 no.6
    • /
    • pp.27-36
    • /
    • 2009
  • PSTN (public switched telephone network) provides voice communication service, whereas IP networks provide data-oriented services, and we can use an IP network for multimedia transport (e.g., voice-over-IP service) at an economical price. In this paper, we propose an RoIP (robot on IP) service scenario, signaling call flow, and implementation to provide home automation and monitoring services for remote-site users. In our scheme, we used an extended SIP (session initiation protocol) as the signaling protocol between remote-site users and home robots. For bearer transport control, we implemented an H.263 video codec over RTP (real-time transport protocol) and, additionally, DTMF (dual tone multi-frequency) transport for robot actuator control. We implemented our scheme on home robots and experimented on the KTF operator network; it shows good communication quality (average MOS = 9.15) and flexible robot control.
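Since the design carries actuator commands as DTMF riding the RTP voice path, the tone side is easy to sketch. The frequency pairs below are the standard ITU-T keypad assignments; the function name and any digit-to-command mapping are illustrative, not from the paper:

```python
import numpy as np

# Standard DTMF frequency pairs (ITU-T Q.23): (row Hz, column Hz) per key
DTMF = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}

def dtmf_tone(key, fs=8000, duration=0.1):
    """Synthesize the dual-tone signal for one keypad key.

    In an RoIP-style design, such tones travel over the RTP audio
    stream and the robot decodes them into actuator commands.
    """
    low, high = DTMF[key]
    t = np.arange(int(fs * duration)) / fs
    return 0.5 * (np.sin(2 * np.pi * low * t) + np.sin(2 * np.pi * high * t))
```

Using the in-band voice path for control means no extra signaling channel is needed: any endpoint that can send touch tones can drive the robot.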


The Role of the Electroglottography on the Laryngeal Articulation of Speech (전기 Glottography(EGG)를 이용한 후두구음역학적 특성)

  • 홍기환;박병암;양윤수;서수영;김현기
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.8 no.1
    • /
    • pp.18-26
    • /
    • 1997
  • There are two types of phonetic study, acoustic and physiologic, for differentiating the three manner categories of Korean stop consonants. Among physiologic studies, there are endoscopic, electromyographic (EMG), electroglottographic (EGG), and aerodynamic studies. In this study, I investigated general features of Korean stops using EGG for the open quotient of the vocal folds and baseline shift during speech, and aerodynamic characteristics for the subglottal air pressure, airflow, and glottal resistance at consonants. In the aerodynamic study, the glottalized and aspirated stops may be characterized by increased subglottal pressure compared with the lenis stop at consonants. The airflow is largest in the aspirated stops, followed by the lenis and the glottalized. The glottal airway resistance (GAR) was highest in the glottalized, followed by the lenis, and lowest in the aspirated during the production of consonants; it was highest in the aspirated but low in the glottalized and lenis during the production of vowels. The glottal resistance at consonants showed significant differences among consonants and a significant interaction between subject and type of consonant. The glottal resistance at vowels showed significant differences among consonants, and an interaction occurred between subject and type of consonant. Electroglottography (EGG) has been used for investigating the functioning of the vocal folds during vibration. The EGG should be related to the patterns of vocal fold vibration during phonation, characterizing the temporal patterns of each vibratory cycle. The purpose of this study is to investigate the dynamic change of EGG waveforms during continuous speech.
The dynamic changes of EGG waveforms for the three-way distinction of Korean stops were characterized as follows: the aspirated stop appears to be characterized by the largest open quotient and smallest glottal contact area of the vocal folds in the initial portion of vocal fold vibration; the lenis stop by a moderate open quotient and glottal contact area; and the glottalized stop by the smallest open quotient and largest glottal contact area. There may be a close relationship between the OQ (open quotient) at the initial voice onset and the glottal width at the time of consonant production: a larger glottal width just before vocal fold vibration results in a smaller OQ of the vocal fold vibration at the initial voice onset. The EGG baseline shifts during continuous speech production showed different patterns for the three types of Korean consonants. A small and less stiff change of baseline shift was found for the lenis and the glottalized, and the largest and stiffest change was found for the aspirated. The baseline shifts at the initial voice onset showed patterns quite similar to those for consonant production, with larger changes for the aspirated; for the lenis and the glottalized during the initial voice onset, the three subjects differed from one another. I suggest that these characteristics are strongly related to the articulatory activity of the vocal tract for the production of consonants, especially for the aspirated stop. The factors suspected to affect EGG waveforms are the glottal width, vertical laryngeal movement, and the intrapharyngeal pressure on neighboring tissue during connected speech.
Thus the EGG may be a useful method for describing laryngeal activity and classifying pulsing conditions of the larynx during speech production, and EGG recordings can serve as controls for monitoring vocal tract articulation, although the above factors affecting the EGG may have played a potential role in the vocal fold vibratory behavior obtained during consonant production.
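The open quotient that the EGG analysis relies on can be estimated from a contact signal by criterion-level thresholding. The sketch below assumes a normalized signal and a 50 % criterion level, which is a common convention rather than the paper's stated method:

```python
import numpy as np

def open_quotient(egg, level=0.5):
    """Estimate the open quotient (OQ) per glottal cycle from an EGG
    signal: the fraction of each cycle that the normalized contact
    signal spends below the criterion `level` (folds apart = open)."""
    x = egg - egg.min()
    x = x / x.max()                     # normalize contact signal to [0, 1]
    closed = x >= level                 # True where vocal-fold contact is high
    # Cycle boundaries: upward crossings of the criterion level
    onsets = np.flatnonzero(~closed[:-1] & closed[1:]) + 1
    oqs = []
    for start, end in zip(onsets[:-1], onsets[1:]):
        cycle = closed[start:end]
        oqs.append(1.0 - cycle.mean())  # open fraction of the cycle
    return np.array(oqs)
```

Tracking this value cycle by cycle is what lets the study compare the initial voice onset of the aspirated, lenis, and glottalized stops.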


Distant-talking of Speech Interface for Humanoid Robots (휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구)

  • Lee, Hyub-Woo;Yook, Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.39-40
    • /
    • 2007
  • For efficient interaction between humans and robots, the speech interface is a core problem, especially in noisy and reverberant conditions. This paper analyzes the main issues of spoken language interfaces for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.
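Of the issues listed, voice activity detection is the simplest to illustrate. Below is a toy short-time-energy detector; a real distant-talking front end of the kind the paper discusses would need far more robustness to noise and reverberation, and all names and thresholds here are assumptions:

```python
import numpy as np

def voice_activity(signal, fs, frame_ms=20, threshold_db=-30):
    """Flag frames likely to contain speech using short-time energy
    relative to the loudest frame (a crude energy-based VAD)."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    energy_db = 10 * np.log10(energy + 1e-12)   # avoid log(0) on silence
    # A frame is "speech" if within threshold_db of the loudest frame
    return energy_db > energy_db.max() + threshold_db
```

In a distant-talking pipeline, VAD output typically gates both the recognizer and the sound source localizer, so that localization only runs on frames that actually contain voice.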


The Effect of AI Agent's Multi Modal Interaction on the Driver Experience in the Semi-autonomous Driving Context : With a Focus on the Existence of Visual Character (반자율주행 맥락에서 AI 에이전트의 멀티모달 인터랙션이 운전자 경험에 미치는 효과 : 시각적 캐릭터 유무를 중심으로)

  • Suh, Min-soo;Hong, Seung-Hye;Lee, Jeong-Myeong
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.8
    • /
    • pp.92-101
    • /
    • 2018
  • As interactive AI speakers become popular, voice recognition is regarded as an important vehicle-driver interaction method for autonomous driving situations. The purpose of this study is to confirm whether multimodal interaction, in which feedback is delivered through both the auditory channel and a visual AI character on screen, is more effective for optimizing user experience than the auditory mode alone. Participants performed interaction tasks for music selection and adjustment through the AI speaker while driving, and we measured information and system quality, presence, perceived usefulness and ease of use, and continuance intention. As a result of the analysis, the multimodal effect of the visual character was not observed for most user experience factors, nor for continuance intention. Rather, the auditory single mode was more effective than the multimodal one for the information quality factor. In the semi-autonomous driving stage, which requires the driver's cognitive effort, multimodal interaction is not effective for optimizing user experience compared with single-mode interaction.

Design and Implementation of the Image Creation System based on User-Media Interaction (사용자와 미디어 사이의 상호작용 기능 제공 기반 영상 창작 시스템 설계 및 구현)

  • Song, Bok Deuk;Kim, Sang Yun;Kim, Chae Kyu
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.5
    • /
    • pp.932-938
    • /
    • 2016
  • Recently, interactive media, which maximizes audience engagement by bringing the audience onto the stage in a digital media environment, has become more widely distributed. In fact, there has been an active movement to develop and promote a new participatory media genre with higher immersion by applying this interactive media concept to advertisement, film, games, and e-learning. In conventional interactive media, digital media had to be enjoyed in a particular environment where diverse sensors were installed, or through a specific device, to recognize a user's motion and voice. This study designed and implemented an image creation system that provides interaction between a user and media in a widely accessible web environment, on PCs and smart devices, to minimize the constraints between image producers and users.

Liquid chromatography-tandem mass spectrometric analysis of oleracone D and its application to pharmacokinetic study in mice

  • Lim, Dong Yu;Lee, Tae Yeon;Lee, Jaehyeok;Song, Im-Sook;Han, Young Taek;Choi, Min-Koo
    • Analytical Science and Technology
    • /
    • v.34 no.5
    • /
    • pp.193-201
    • /
    • 2021
  • We have demonstrated a sensitive analytical method for measuring oleracone D in mouse plasma using liquid chromatography-tandem mass spectrometry (LC-MS/MS). Oleracone D and oleracone F (internal standard) in mouse plasma samples were processed using a liquid-liquid extraction method with methyl tert-butyl ether, resulting in high and reproducible extraction recovery (80.19-82.49 %). No interfering peaks around the elution times of oleracone D and oleracone F were observed. The standard calibration curves for oleracone D ranged from 0.5 to 100 ng/mL and were linear with r² of 0.992. The inter- and intra-day accuracy and precision and the stability fell within the acceptance criteria. The pharmacokinetics of oleracone D following intravenous and oral administration at doses of 5 mg/kg and 30 mg/kg, respectively, were investigated. When oleracone D was intravenously injected, it showed first-order elimination kinetics with high clearance and volume of distribution values. The absolute oral bioavailability of this compound was calculated as 0.95 %, with multi-exponential kinetics. The low aqueous solubility and high oral dose of oleracone D may explain its different elimination kinetics between intravenous and oral administration. Collectively, this newly developed sensitive LC-MS/MS method for oleracone D could be successfully utilized for investigating the pharmacokinetic properties of this compound and could be used in future studies for its lead optimization and biopharmaceutic investigation.
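The absolute oral bioavailability figure reported above (0.95 %) is conventionally the dose-normalized AUC ratio between oral and intravenous administration. A one-line sketch of that calculation follows; the AUC values used in the example are made up for illustration, not the paper's data:

```python
def absolute_bioavailability(auc_oral, dose_oral, auc_iv, dose_iv):
    """Absolute oral bioavailability F (%): the dose-normalized oral
    AUC divided by the dose-normalized intravenous AUC, times 100."""
    return 100.0 * (auc_oral / dose_oral) / (auc_iv / dose_iv)

# Hypothetical AUCs (ng·h/mL); the paper's doses were 30 (oral) and 5 (IV) mg/kg
f_percent = absolute_bioavailability(auc_oral=10.0, dose_oral=30.0,
                                     auc_iv=50.0, dose_iv=5.0)
```

Dose normalization is what makes the comparison fair when, as here, the oral dose (30 mg/kg) is much larger than the intravenous dose (5 mg/kg).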