Research on Developing a Conversational AI Callbot Solution for Medical Counselling

  • Won Ro LEE; Jeong Hyon CHOI; Min Soo KANG
    • Korean Journal of Artificial Intelligence / v.11 no.4 / pp.9-13 / 2023
  • In this study, we explored the potential of integrating interactive AI callbot technology into the medical consultation domain as part of a broader service development initiative. With the goal of improving patient satisfaction, the AI callbot was designed to handle queries efficiently from hospitals' primary users, especially elderly patients and people who rely on telephone services. When an AI-driven callbot was incorporated into the hospital's customer service center, routine tasks such as appointment changes and cancellations were handled by the AI Callbot Agent, while tasks requiring more detailed attention or specialized knowledge were handled by Human Agents, ensuring a balanced, collaborative approach. The deep learning model for speech recognition was based on the Transformer architecture and was fine-tuned for the medical field starting from a pre-trained model. Existing call recordings were converted into training data, and a self-supervised learning (SSL) model was implemented. An artificial neural network (ANN) model was used to analyze voice signals and transcribe them as text, and after deployment, intent recognition was refined through reinforcement learning to continuously improve accuracy. For TTS (Text-to-Speech), the Transformer model was applied to the text analysis, acoustic model, and vocoder stages, and Google's Natural Language API was used for intent recognition. Several challenges remain, such as interoperability between different EMR providers, handling of doctors' time slots, patients with appointments at two or more hospitals, and issues with patient use. However, because reservations are a specialized task that is comparatively easy to automate, the callbot service appears to be applicable to hospitals immediately.
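The division of labor described above, with routine requests going to the AI Callbot Agent and everything else to Human Agents, can be sketched as a simple router. This is a hypothetical illustration, not the authors' system: the intent labels, the `Recognized` type, and the 0.8 confidence threshold are assumptions. In the study, intents come from Google's Natural Language API; here the recognized intent and its confidence are simply passed in.

```python
from dataclasses import dataclass

# Hypothetical intent labels for routine requests the AI agent may complete.
ROUTINE_INTENTS = {"change_appointment", "cancel_appointment"}

@dataclass
class Recognized:
    intent: str        # intent label produced upstream (speech recognition + NLU)
    confidence: float  # recognizer score in [0, 1]

def route(call: Recognized, min_confidence: float = 0.8) -> str:
    """Send confidently recognized routine requests to the AI agent and
    everything else (unknown intents, low confidence) to a human agent."""
    if call.intent in ROUTINE_INTENTS and call.confidence >= min_confidence:
        return "ai_agent"
    return "human_agent"

print(route(Recognized("cancel_appointment", 0.93)))   # ai_agent
print(route(Recognized("medication_question", 0.97)))  # human_agent
```

Falling back to a human on low confidence is a design choice in this sketch; it keeps misrecognized requests from being completed automatically.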

Optimizing Wavelet in Noise Canceler by Deep Learning Based on DWT (DWT 기반 딥러닝 잡음소거기에서 웨이블릿 최적화)

  • Won-Seog Jeong; Haeng-Woo Lee
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.19 no.1 / pp.113-118 / 2024
  • In this paper, we propose an optimal wavelet for a system that cancels background noise in acoustic signals. The system applies the Discrete Wavelet Transform (DWT) instead of the conventional Short-Time Fourier Transform (STFT), and then improves noise-cancellation performance through a deep learning process. The DWT acts as a multi-resolution band-pass filter bank: its coefficients are obtained at each level using wavelets that are time-shifted and scaled versions of the mother wavelet. To select the mother wavelet best suited to speech analysis, the noise-cancellation performance of several wavelets was tested. To verify the performance of the system for different wavelets, a simulation program was written with the TensorFlow and Keras libraries, and simulation experiments were performed for the four most commonly used wavelets. In the experiments, the Haar and Daubechies wavelets gave the best noise-cancellation performance, with a significantly lower mean squared error (MSE) than the other wavelets.
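As a hedged illustration of the multi-resolution split mentioned above, the sketch below implements the Haar wavelet (the simplest Daubechies wavelet, db1) in plain Python. Each level halves the band, so each list of detail coefficients covers roughly one octave. The deep learning stage that operates on these coefficients in the paper is not reproduced, and the function names are hypothetical; the coefficient ordering follows the `[cA_n, cD_n, ..., cD_1]` convention of PyWavelets' `wavedec`, which a real project would more likely use.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One analysis level: split x (even length) into approximation
    (low-pass) and detail (high-pass) coefficients, each half as long."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail

def haar_dwt(x, levels):
    """Multi-level Haar DWT. Returns [cA_L, cD_L, ..., cD_1]: the coarsest
    approximation followed by detail bands from low to high frequency."""
    if levels < 1 or len(x) % (2 ** levels):
        raise ValueError("len(x) must be divisible by 2**levels")
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuild the signal level by level."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = []
        for a, d in zip(approx, detail):
            out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        approx = out
    return approx

def mse(a, b):
    """Mean squared error, the figure of merit reported in the study."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
coeffs = haar_dwt(x, 3)                 # band lengths 1, 1, 2, 4
print(mse(x, haar_idwt(coeffs)))        # ~0: perfect reconstruction
```

Because the Haar transform is orthonormal, it preserves signal energy and reconstructs exactly. A denoiser would modify the detail bands before calling `haar_idwt`.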

A Study about the Users' Preferred Playing Speeds on Categorized Video Content using WSOLA method (WSOLA를 이용한 동영상 미세배속 재생 서비스에 대한 콘텐츠별 배속 선호도 분석 연구)

  • Kim, I-Gil
    • Journal of Digital Contents Society / v.16 no.2 / pp.291-298 / 2015
  • In a fast-paced information technology environment, video consumption is shifting from one-way television viewing to VOD (Video on Demand) played anywhere, anytime, on any device. This trend makes fine speed control of playback more important, in addition to the existing strengths of digital video. Many video players now provide a fine speed-control function that can speed up a video to skip a boring part or slow it down to focus on an exciting scene. Audio information is just as important as visual information for understanding speed-controlled video, so a number of fine-speed-control playback algorithms have been proposed to avoid pitch distortion in the audio. In this study, WSOLA (Waveform-Similarity-Based Overlap-Add), a well-known technique for prosodic modification of speech signals, was applied to analyze users' needs for fine-speed-control video playback. By surveying users' preferred playback speeds for categorized video content and analyzing the results, this paper argues that a range of fine speed adjustments is needed to accommodate users' preferred ways of consuming video.
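A minimal sketch of generic WSOLA time-scale modification is shown below, assuming a mono signal as a list of floats. It is not the study's player implementation, and `wsola`, `frame_len`, and `tolerance` are illustrative names. Output frames are placed at a fixed synthesis hop; for each one, the input is searched within `tolerance` samples of the nominal read position for the segment most similar (by cross-correlation) to the natural continuation of the previous frame. Copying waveform segments instead of resampling them is what keeps the pitch unchanged.

```python
import math

def periodic_hann(n):
    # Periodic Hann window: copies overlapped at hop n/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def wsola(x, rate, frame_len=512, tolerance=128):
    """Minimal WSOLA sketch. rate > 1 shortens (speeds up) x, rate < 1
    lengthens it; pitch is kept because frames are copied, not resampled."""
    if rate <= 0 or frame_len % 2 or len(x) < frame_len:
        raise ValueError("need rate > 0, even frame_len, len(x) >= frame_len")
    hop_s = frame_len // 2           # synthesis hop: fixed 50% overlap
    hop_a = hop_s * rate             # nominal analysis hop in the input
    last_start = len(x) - frame_len  # last valid frame start in the input
    positions = [0]                  # chosen input start of each output frame
    k = 1
    while True:
        natural = positions[-1] + hop_s  # where the previous frame "continues"
        nominal = round(k * hop_a)       # where the time scale says to read
        if natural > last_start or nominal > last_start:
            break
        template = x[natural:natural + frame_len]
        # Search +/- tolerance around the nominal position for the segment
        # most similar (max cross-correlation) to the natural continuation.
        best_pos, best_score = nominal, -math.inf
        for pos in range(max(0, nominal - tolerance),
                         min(last_start, nominal + tolerance) + 1):
            score = sum(a * b for a, b in zip(template, x[pos:pos + frame_len]))
            if score > best_score:
                best_pos, best_score = pos, score
        positions.append(best_pos)
        k += 1
    # Overlap-add the windowed frames at the fixed synthesis hop.
    win = periodic_hann(frame_len)
    out = [0.0] * ((len(positions) - 1) * hop_s + frame_len)
    for j, pos in enumerate(positions):
        for i in range(frame_len):
            out[j * hop_s + i] += win[i] * x[pos + i]
    return out

# A 20-sample-period tone played about 2x faster keeps its 20-sample period.
x = [math.sin(2 * math.pi * n / 20) for n in range(4000)]
y = wsola(x, 2.0, frame_len=128, tolerance=32)
```

The tolerance should cover at least one pitch period of the signal so that a well-aligned segment is always available; the brute-force correlation search is written for clarity, not speed.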

Physiologic Phonetics for Korean Stop Production (한국어 자음생성의 생리음성학적 특성)

  • Hong, Ki-Hwan; Yang, Yoon-Soo
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.17 no.2 / pp.89-97 / 2006
  • The stop consonants in Korean are classified by manner of articulation into three types: unaspirated (UA), slightly aspirated (SA), and heavily aspirated (HA) stops. The UA and HA types are always voiceless in any environment. The voice onset time (VOT) can generally be measured on a spectrogram as the interval from the release of the consonant burst to the onset of the following vowel. The VOT of the UA type is within 20 msec of the burst, about 40-50 msec for the SA type, and 50-70 msec for the HA type. There have been many efforts to clarify the properties that distinguish these manner categories. Umeda et al.$^{1)}$ found that the fundamental frequency at voice onset after both the UA and HA consonants was higher than after the SA consonants, and that the VOT was longest for the HA, followed by the SA and UA. Han et al.$^{2)}$ reported in their speech synthesis and perception studies that the SA and UA stops differed primarily in a gradual versus a relatively rapid intensity build-up of the following vowel after the stop release. Lee et al.$^{3)}$ measured both intraoral and subglottal air pressure and found that the subglottal pressure was higher for the HA stop than for the other two stops. They also compared the dynamic pattern of the subglottal pressure slope for the three categories and found that the HA stop showed the most rapid increase in subglottal pressure immediately before the stop release. Kagaya$^{4)}$ reported fiberscopic and acoustic studies of the Korean stops. He suggested that the UA type may be characterized by completely adducted, stiffened vocal folds and an abrupt decrease in stiffness near voice onset, whereas the HA type may be characterized by widely abducted vocal folds and heightened subglottal pressure. None of these positive gestures is observed for the SA type.
Hong et al.$^{5)}$ studied the electromyographic activity of the thyroarytenoid and posterior cricoarytenoid (PCA) muscles during stop production. They reported marked, early activation of the PCA muscle together with a steep reactivation of the thyroarytenoid muscle before voice onset in the production of the HA consonants. The UA consonants were characterized by little or no PCA activation and the earliest and most marked reactivation of the thyroarytenoid muscle. For the SA consonants, they reported moderate PCA activation, greater than for the UA consonants, and the weakest and latest reactivation of the thyroarytenoid muscle. Hong et al.$^{6)}$ observed the vibratory movements of the vocal fold edges in relation to the laryngeal gestures of the different types of stop consonants, using high-speed digital images. Electroglottographic (EGG) signals and acoustic waveforms were also analyzed and related to the vibratory movements of the vocal fold edges during stop production.

An Arrangement Method of Voice and Sound Feedback According to the Operation: For Interaction of Domestic Appliance (조작 방식에 따른 음성과 소리 피드백의 할당 방법 가전제품과의 상호작용을 중심으로)

  • Hong, Eun-ji; Hwang, Hae-jeong; Kang, Youn-ah
    • Journal of the HCI Society of Korea / v.11 no.2 / pp.15-22 / 2016
  • The ways to interact with digital appliances are becoming more diverse. Users can control appliances with a remote control or a touch screen, and appliances can give users feedback through sound, voice, and visual signals. However, there is little research on which output method should be used for feedback depending on the user's input method. In this study, we designed an experiment to identify how to match the output method (voice or sound) to the user's input method (voice or button). We created four interaction types by combining the two input methods with the two output methods, and compared them in terms of usability, perceived satisfaction, preference, and suitability. The results reveal that the output method affects the ease of use and perceived satisfaction of the input method. Voice input was rated more satisfying with sound feedback than with voice feedback, whereas button input was rated more satisfying with voice feedback than with sound feedback. Button input depended more on the output method than voice input did. We also found that an appliance's feedback method determines the perceived appropriateness of the interaction.

Design and Implementation of a Real-time Bio-signal Obtaining, Transmitting, Compressing and Storing System for Telemedicine (원격 진료를 위한 실시간 생체 신호 취득, 전송 및 압축, 저장 시스템의 설계 및 구현)

  • Jung, In-Kyo; Kim, Young-Joon; Park, In-Su; Lee, In-Sung
    • Journal of the Institute of Electronics Engineers of Korea SC / v.45 no.4 / pp.42-50 / 2008
  • A real-time bio-signal monitoring system based on ZigBee and SIP/RTP has previously been proposed and implemented for telemedicine, but it has stability problems when transmitting bio-signals from the sensors to the remote side. In this paper, we designed and implemented a real-time bio-signal monitoring system that focuses on reliable and efficient real-time transmission of bio-signals. The system was designed with an enhanced architecture and improved performance for the ubiquitous sensor network, SIP/RTP real-time transmission, and database management. A Bluetooth network is combined with the ZigBee network to distribute the traffic of the ECG and other bio-signals. Modified, multiple RTP sessions are used to ensure real-time transmission of ECG, other bio-signals, and speech over the Internet. A modified ECG compression method based on DWLT and MSVQ is used to reduce the data rate for storing ECG in the database. As a result, we implemented a system with improved performance in transmitting bio-signals from the sensors to the monitoring console and the database. The implemented system makes it possible to build various applications for u-health care services.
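The MSVQ stage of the compression method can be illustrated with a minimal multi-stage vector quantizer: the first stage quantizes the input vector, each later stage quantizes the residual left by the previous one, and only the stage indices are stored. This is a sketch under assumptions: the 2-D codebooks are hypothetical and tiny, and the DWLT that produces the vectors in the paper is not reproduced.

```python
# Minimal multi-stage VQ (MSVQ) sketch; codebooks below are hypothetical.

def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def msvq_encode(v, stages):
    """Quantize v stage by stage; each stage codes the previous residual."""
    indices, residual = [], list(v)
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(indices, stages):
    """Reconstruction is the sum of the selected codewords of all stages."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

# Hypothetical 2-D codebooks: a coarse first stage and a fine second stage.
STAGES = [
    [[0, 0], [4, 4], [-4, -4], [4, -4]],
    [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]],
]

idx = msvq_encode([5.1, 3.9], STAGES)   # -> [1, 1]
approx = msvq_decode(idx, STAGES)       # -> [5.0, 4.0]
```

Two stages with 4 and 5 codewords offer 4 × 5 = 20 reproduction vectors while the encoder searches only 4 + 5 = 9 codewords, which is what makes MSVQ attractive for storage and search cost. The price is that the greedy stage-by-stage search is not guaranteed to find the best of the 20 combinations.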

Derivation of Asymptotic Formulas for the Signal-to-Noise Ratio of Mismatched Optimal Laplacian Quantizers (불일치된 최적 라플라스 양자기의 신호대잡음비 점근식의 유도)

  • Na, Sang-Sin
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.5C / pp.413-421 / 2008
  • The paper derives asymptotic formulas for the MSE distortion and the signal-to-noise ratio of a mismatched fixed-rate minimum-MSE Laplacian quantizer. These closed-form formulas are expressed in terms of the number $N$ of quantization points, the mean displacement $\mu$, and the ratio $\rho$ of the standard deviation of the source to that for which the quantizer is optimally designed. Numerical results show that the principal formula is accurate: for rates $R = \log_2 N \geq 6$, it predicts signal-to-noise ratios within 1% of the true values over a wide range of $\mu$ and $\rho$. The new findings include the fact that, for heavy variance mismatch ($\rho > 3/2$), the signal-to-noise ratio increases at a rate of $9/\rho$ dB/bit, which is slower than the usual 6 dB/bit, and the fact that an optimal uniform quantizer, although optimally designed, is slightly more than critically mismatched to the source. The signal-to-noise ratio loss due to $\mu$ is also found to be moderate. The derived formulas can be useful in the quantization of speech or music signals, which are well modeled as Laplacian sources with time-varying short-term variances.
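A short arithmetic check puts the $9/\rho$ figure in context. For a quantizer matched to its source, high-resolution (Panter-Dite) theory gives a distortion proportional to $N^{-2}$, so each extra bit (doubling $N$) cuts the MSE by a factor of 4, which is the usual 6 dB/bit. The $9/\rho$ slope below is taken from the abstract as stated, not derived here; evaluating it shows that it equals 6 dB/bit exactly at the boundary $\rho = 3/2$ and falls below it as the mismatch grows. The condition $R = \log_2 N \geq 6$ corresponds to $N \geq 64$ quantization points.

$$
D(N) \approx \frac{1}{12N^{2}}\left(\int f^{1/3}(x)\,dx\right)^{3}
\quad\Rightarrow\quad
10\log_{10}\frac{D(N)}{D(2N)} = 10\log_{10}4 \approx 6.02\ \text{dB/bit}
$$

$$
\frac{9}{\rho}\Big|_{\rho=3/2} = 6,\qquad
\frac{9}{\rho}\Big|_{\rho=2} = 4.5,\qquad
\frac{9}{\rho}\Big|_{\rho=3} = 3\ \ \text{dB/bit}
$$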

Modeling of Sensorineural Hearing Loss for the Evaluation of Digital Hearing Aid Algorithms (디지털 보청기 알고리즘 평가를 위한 감음신경성 난청의 모델링)

  • Kim, Dong-Wook; Park, Young-Cheol
    • Journal of Biomedical Engineering Research / v.19 no.1 / pp.59-68 / 1998
  • Digital hearing aids offer many advantages over conventional analog hearing aids. With the advent of high-speed digital signal processing chips, new digital techniques have been introduced into digital hearing aids. However, evaluating new ideas in hearing aids necessarily involves intensive subject-based clinical tests, which require considerable time and cost. In this paper, we present an objective method to evaluate and predict the performance of hearing aid systems without such subject-based tests. In the hearing impairment simulation (HIS) algorithm, a sensorineural hearing impairment model is built from the auditory test data of the impaired subject being simulated, and the nonlinear behavior of loudness recruitment is defined using hearing loss functions generated from these measurements. To transform natural input sound into impaired sound, a frequency sampling filter is designed, and the filter is continuously updated with the level-dependent frequency response provided by the impairment model. To assess performance, the HIS algorithm was implemented in real time on a floating-point DSP. Signals processed by the real-time system were presented to normal-hearing subjects, and their auditory responses to the modified signals were measured, so that the sensorineural hearing impairment was simulated and tested. Hearing-threshold and speech-discrimination tests demonstrated the effectiveness of the system for hearing impairment simulation. Using the HIS system, we evaluated three typical hearing aid algorithms.
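The frequency sampling filter mentioned above can be illustrated with the classic type-I linear-phase design: specify the desired magnitude at $N$ equally spaced frequencies and obtain the taps from an inverse DFT written as a cosine sum. This is a sketch under assumptions. The attenuation profile is hypothetical, and in the paper the response is level-dependent (loudness recruitment), so `gains` would be recomputed from the impairment model for each block's input level, which is not shown.

```python
import cmath
import math

def freq_sampling_fir(mags):
    """Type-I linear-phase FIR by frequency sampling.

    mags[k] is the desired magnitude at w_k = 2*pi*k/N, k = 0..M, and the
    filter has N = 2*M + 1 taps centred on tap M."""
    m = len(mags) - 1
    n_taps = 2 * m + 1
    h = []
    for n in range(n_taps):
        acc = mags[0]
        for k in range(1, m + 1):
            acc += 2 * mags[k] * math.cos(2 * math.pi * k * (n - m) / n_taps)
        h.append(acc / n_taps)
    return h

def magnitude_at(h, w):
    """|H(e^{jw})| evaluated directly from the FIR's DTFT."""
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))

# Hypothetical loss profile in dB, from 0 Hz up to just below Nyquist.
loss_db = [0, 0, 10, 20, 30, 40, 40, 40]
gains = [10 ** (-d / 20) for d in loss_db]
h = freq_sampling_fir(gains)  # 15 symmetric taps
```

The designed filter matches the requested magnitudes exactly at the sampled frequencies and interpolates between them; more samples (taps) give a closer fit to a smooth audiogram-like curve, at the cost of more delay.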
