• Title/Summary/Keyword: Speech analysis and conversion


A Wavelet based Adaptive Algorithm using New Fast Running FIR Filter Structure (새로운 Fast running FIR filter구조를 이용한 웨이블렛 기반 적응 알고리즘에 관한 연구)

  • Lee, Jae-Kyun; Park, Jae-Hoon; Lee, Chae-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.1C / pp.1-8 / 2007
  • The LMS (Least Mean Square) algorithm, which is based on the steepest descent method, is widely used in adaptive signal processing because its update equation is simple and its computational complexity is low. However, the eigenvalues of the input signal are widely spread in the time domain, so convergence is slow. In this paper, we propose a new fast running FIR filter structure that improves the convergence speed of adaptive signal processing and achieves the same performance as the existing fast wavelet transform algorithm with less computational complexity. The proposed filter structure is applied to a wavelet-based adaptive algorithm. Simulation results show better performance than the existing algorithm.
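For readers unfamiliar with the baseline, the following is a minimal sketch of the plain time-domain LMS update that the abstract starts from. It is not the proposed fast running FIR or wavelet-based structure. The function name `lms_identify`, the step size `mu`, and the three-tap system `h` are illustrative assumptions.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Plain time-domain LMS: adapt FIR weights w so the filter output tracks d[n]."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # filter output
        e = dn - y                                  # estimation error
        # Steepest-descent step using the instantaneous gradient estimate
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# Hypothetical unknown FIR system to identify (assumption, not from the paper)
random.seed(0)
h = [0.5, -0.3, 0.2]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]

w, errors = lms_identify(x, d, num_taps=3, mu=0.05)
```

With white input, as used here, the weights settle quickly. With strongly correlated input the same loop converges more slowly, which is the limitation the abstract describes.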

The Implementation Directions and an Analysis of Assistive Devices and Alternative Formats to Improve Accessibility for Disabled People (장애인 접근성 향상을 위한 보조기기 및 대체자료 분석과 구현 방향)

  • Rim, Myunghwan; Gil, Younhee; Jeon, Gwangil
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.664-673 / 2015
  • Assistive devices for disabled people are gaining attention, including from an industrial perspective, through policies and support for disabled people, the enactment of regulations to improve accessibility, technological innovation, and product development. Recently, products of ICT development such as screen readers for visually impaired people, braille displays, screen magnifiers, and text converters have made it more convenient to access the internet through touch and hearing, use electronic publishing content, and send e-mail. Even so, in the rapidly changing era of digital media and smart devices, accessibility for visually impaired people remains poor, and assistive devices and alternative formats need improvement. Therefore, from the perspective of research and development innovation, this study analyzes the current situation and structure of alternative formats and assistive devices for visually impaired people and proposes implementation directions for improving accessibility. As a result, various types of digital information are expected to be converted into customized and realistic forms and distributed through dedicated products for disabled people or smart devices.

AI Advisor for Response of Disaster Safety in Risk Society (위험사회 재난 안전 분야 대응을 위한 AI 조력자)

  • Lee, Yong-Hak; Kang, Yunhee; Lee, Min-Ho; Park, Seong-Ho; Kang, Myung-Ju
    • Journal of Platform Technology / v.8 no.3 / pp.22-29 / 2020
  • The 4th industrial revolution is progressing in each country as a megatrend that began with simple manufacturing innovation and now drives technological convergence across social and economic fields. Epidemics of infectious diseases such as COVID-19 are shifting economic activity toward digital-centered, non-face-to-face business, and the use of AI and big data technology for personalized services is essential to the spread of online services. In this paper, we analyze cases that apply artificial intelligence technology, a key technology for the effective implementation of the Digital New Deal promoted by the government, together with the major technological characteristics of the 4th industrial revolution, and describe use cases in the field of disaster response. In the disaster response use case, an AI assistant suggests appropriate countermeasures according to the status of the reporter in an emergency call. To this end, the AI assistant analyzes speech recognition data and classifies the disaster type from the converted text to support an adaptive response.

Deep neural networks for speaker verification with short speech utterances (짧은 음성을 대상으로 하는 화자 확인을 위한 심층 신경망)

  • Yang, IL-Ho; Heo, Hee-Soo; Yoon, Sung-Hyun; Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.35 no.6 / pp.501-509 / 2016
  • We propose a method to improve the robustness of speaker verification on short test utterances. The accuracy of state-of-the-art i-vector/probabilistic linear discriminant analysis systems can degrade when test utterances are short. The proposed method compensates for utterance variations of short test feature vectors using deep neural networks. We design three different types of DNN (Deep Neural Network) structures, which are trained with different target output vectors. Each DNN is trained to minimize the discrepancy between the feed-forward output for a given short utterance feature and the corresponding original long utterance feature. We use the short 2-10 s condition of the NIST (National Institute of Standards and Technology, U.S.) 2008 SRE (Speaker Recognition Evaluation) corpus to evaluate the method. The experimental results show that the proposed method reduces the minimum detection cost relative to the baseline system.

Prototype Design and Development of Online Recruitment System Based on Social Media and Video Interview Analysis (소셜미디어 및 면접 영상 분석 기반 온라인 채용지원시스템 프로토타입 설계 및 구현)

  • Cho, Jinhyung; Kang, Hwansoo; Yoo, Woochang; Park, Kyutae
    • Journal of Digital Convergence / v.19 no.3 / pp.203-209 / 2021
  • This study proposes a prototype design model for an online recruitment system that uses multi-dimensional data crawling and social media analysis to validate text information and video interviews in the job application process. The study includes a comparative analysis through text mining to verify the authenticity of job application documents and to hire and allocate workers effectively based on their potential job capability. Using the prototype system, we conducted performance tests and analyzed the results for key performance indicators such as text mining accuracy and the recognition rate of the interview STT (speech-to-text) function. If commercialized based on the design specifications and prototype development results of this study, the system is expected to serve as intelligent online recruitment technology for the public and private recruitment markets.

Amplification and Howling Suppression of Telephonic Speech for the Hearing-Impaired Person (난청인을 위한 전화기 음성증폭 및 하울링 억제)

  • Lee, Sang-Min
    • Journal of Biomedical Engineering Research / v.19 no.6 / pp.623-629 / 1998
  • Hearing-impaired persons (HIPs), who have great difficulty communicating with others over an ordinary telephone, need high amplification to receive sufficient sound. However, high amplification can cause howling as a side effect. In this study, we developed a new technique for high amplification without howling, and built and evaluated a new hearing aid telephone. Telephone speech is divided into three frequency bands, each of which is amplified separately and fitted to the HIP's hearing ability. The frequency of the telephone speech is monitored by a counter in the time domain. To monitor howling, a comparator converts the sinusoidal sound into a rectangular wave, and the counter counts the number of rectangular pulses in a fixed time period, which gives the frequency. Because the telephone's microphone and speaker are fixed in a rigid structure and the frequency band of telephone sound is limited, howling occurs within a limited frequency band. When the counter detects howling conditions, a microprocessor quickly decreases the gain of the affected frequency band. Tests of the new hearing aid telephone showed that the sound can be amplified by as much as 40 dB, a meaningful level for many HIPs, and that HIPs' perception ability increased from 20% to 60.8% in a one-syllable test and from 28.9% to 78% in a two-syllable test.
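The comparator-and-counter idea in this abstract can be sketched in software as follows. This is only an illustration of the principle, not the authors' hardware design: the howling band (800 to 1200 Hz), the 6 dB gain step, the 8 kHz sample rate, and all function names are assumptions.

```python
import math


def count_rising_edges(samples, threshold=0.0):
    """Comparator + counter: treat each sample as high/low against a
    threshold and count low-to-high transitions."""
    count = 0
    prev_high = samples[0] > threshold
    for s in samples[1:]:
        high = s > threshold
        if high and not prev_high:
            count += 1
        prev_high = high
    return count


def estimate_frequency(samples, sample_rate):
    """Pulses counted in the window divided by the window length in seconds."""
    return count_rising_edges(samples) * sample_rate / len(samples)


def adjust_band_gain(gain_db, freq_hz, band, step_db=6.0):
    """Lower the band gain if the measured frequency falls inside the
    (hypothetical) howling band; otherwise leave it unchanged."""
    low, high = band
    return gain_db - step_db if low <= freq_hz < high else gain_db


# One second of a 1 kHz tone standing in for a howling signal (assumption)
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(sample_rate)]
freq = estimate_frequency(tone, sample_rate)
gain = adjust_band_gain(40.0, freq, band=(800.0, 1200.0))
```

Counting edges over a fixed window needs no spectral transform, which is consistent with the abstract's choice of a time-domain counter driven by a microprocessor.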

Automatic Electronic Medical Record Generation System using Speech Recognition and Natural Language Processing Deep Learning (음성인식과 자연어 처리 딥러닝을 통한 전자의무기록자동 생성 시스템)

  • Hyeon-kon Son; Gi-hwan Ryu
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.731-736 / 2023
  • Recently, the medical field has been adopting mandatory Electronic Medical Record (EMR) and Electronic Health Record (EHR) systems that computerize and manage medical records, and deploying them across the medical industry so that patients' past medical records can be used in further medical procedures. However, the conversations between medical professionals and patients during general consultations and counseling sessions are not separately recorded or stored, so additional important patient information cannot be used efficiently. Therefore, we propose an electronic medical record system that uses speech recognition and natural language processing deep learning to store conversations between medical professionals and patients in text form, automatically extract and summarize important consultation information, and generate electronic medical records. The system acquires text by applying speech recognition to the consultation between medical professionals and patients. The acquired text is then divided into sentences, and the importance of the keywords in those sentences is calculated. Based on the calculated importance, the system ranks the sentences and summarizes them to create the final electronic medical record data. The proposed system's performance is verified to be excellent through quantitative analysis.
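The split, score, and rank steps described in this abstract can be sketched as a small extractive summarizer. The abstract does not say how keyword importance is computed, so this sketch assumes plain term frequency over the transcript. The function name `summarize`, the word-length filter, and the sample transcript are all hypothetical.

```python
import re
from collections import Counter


def summarize(text, top_n=2):
    """Split text into sentences, score each by keyword importance,
    and return the top_n sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    # Keyword importance: term frequency of words longer than 3 letters
    # (an assumption; the paper's importance measure is not specified)
    importance = Counter(w for words in tokenized for w in words if len(w) > 3)
    scores = [sum(importance[w] for w in set(words)) for words in tokenized]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]


# Hypothetical transcript as it might come out of a speech recognizer
transcript = (
    "The patient reports chest pain since Monday. "
    "The weather was nice. "
    "Chest pain worsens when the patient climbs stairs. "
    "Thanks."
)
summary = summarize(transcript, top_n=2)
```

Returning the selected sentences in their original order keeps the summary readable as a record of the consultation.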

Design of QPSK Ultrasonic Transceiver For Underwater Communication (수중 통신을 위한 QPSK 초음파 송수신기의 설계)

  • Cho Nai-Hyun; Kim Duk-Yung; Kim Yong-Deuk; Chung Yun-Mo
    • Journal of the Institute of Electronics Engineers of Korea SC / v.43 no.3 s.309 / pp.51-59 / 2006
  • In this paper, we propose an ultrasonic transceiver system based on a QPSK modulation technique for underwater communication. The transmitter sends a still image at a level of 187 dB re $1\,\mu\mathrm{Pa/V}$ at 1 m by driving an ultrasonic sensor through a power amplifier. The receiver digitizes the signal at a 100 kHz sampling frequency, then demodulates and decodes the image sent by the transmitter over the underwater channel. We show that the image processed at the receiver is almost the same as the original. The maximum detection distance of the proposed system is approximately 1.17 km. To cope with transmission loss, this paper proposes, implements, and analyzes the important parameters of the sensors and circuits used in the system. Most underwater communication work has focused on transmitting audio signals, whereas this paper presents an efficient underwater communication system for still image transmission.
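As background for the modulation scheme named in this abstract, here is a minimal baseband sketch of QPSK mapping and nearest-point demodulation. The abstract does not give the paper's bit-to-symbol mapping, so the Gray-coded constellation below is an assumed common convention. Carrier modulation, the ultrasonic channel, and image coding are omitted.

```python
import math

# Gray-coded QPSK constellation with unit symbol energy
# (assumed convention; the paper's mapping is not specified)
A = 1 / math.sqrt(2)
CONSTELLATION = {
    (0, 0): complex(A, A),
    (0, 1): complex(-A, A),
    (1, 1): complex(-A, -A),
    (1, 0): complex(A, -A),
}


def modulate(bits):
    """Map each pair of bits to one complex baseband symbol."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [CONSTELLATION[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def demodulate(symbols):
    """Decide each received symbol by its nearest constellation point."""
    bits = []
    for s in symbols:
        nearest = min(CONSTELLATION, key=lambda pair: abs(s - CONSTELLATION[pair]))
        bits.extend(nearest)
    return bits


tx_bits = [0, 0, 0, 1, 1, 1, 1, 0]
tx_symbols = modulate(tx_bits)
# Small fixed offset standing in for channel distortion (illustrative only)
rx_symbols = [s + complex(0.1, -0.1) for s in tx_symbols]
rx_bits = demodulate(rx_symbols)
```

Each symbol carries two bits, so a still image of N bits needs N/2 symbols.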

Analysis of deep learning-based deep clustering method (딥러닝 기반의 딥 클러스터링 방법에 대한 분석)

  • Hyun Kwon; Jun Lee
    • Convergence Security Journal / v.23 no.4 / pp.61-70 / 2023
  • Clustering is an unsupervised learning method that groups data based on features such as distance metrics, using data without known labels or ground truth values. It has the advantage of being applicable to various types of data, including images, text, and audio, without the need for labeling. Traditional clustering techniques apply dimensionality reduction or extract specific features before clustering. With the advancement of deep learning models, however, research has emerged on deep clustering techniques that use models such as autoencoders and generative adversarial networks to represent input data as latent vectors. In this study, we propose a deep clustering technique based on deep learning. In this approach, we use an autoencoder to transform the input data into latent vectors, construct a vector space according to the cluster structure, and perform k-means clustering. We conducted experiments on the MNIST and Fashion-MNIST datasets, using the PyTorch machine learning library as the experimental environment. The model is a convolutional neural network-based autoencoder. The experimental results show an accuracy of 89.42% for MNIST and 56.64% for Fashion-MNIST when k is set to 10.
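The final step of the pipeline in this abstract, k-means over latent vectors, can be sketched in plain Python as below. The autoencoder itself is omitted: the 2-D `latent` vectors are hard-coded stand-ins for encoder outputs, and the first-k initialization is a simplifying assumption rather than the paper's setup.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    centers = [list(p) for p in points[:k]]  # naive deterministic init (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical latent vectors standing in for autoencoder encoder outputs
latent = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centers = kmeans(latent, k=2)
```

In the setting the abstract describes, k would be 10 and the latent vectors would come from the convolutional autoencoder.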

Design requirements of mediating device for total physical response - A protocol analysis of preschool children's behavioral patterns (체감형 학습을 위한 매개 디바이스의 디자인 요구사항 - 프로토콜 분석법을 통한 미취학 아동의 행동 패턴 분석)

  • Kim, Yun-Kyung; Kim, Hyun-Jeong; Kim, Myung-Suk
    • Science of Emotion and Sensibility / v.13 no.1 / pp.103-110 / 2010
  • TPR (Total Physical Response) is a representative new learning method for children's education. Current approaches to TPR focus on signals from the user, which become input data in human-computer interaction, but sensing of body signals (e.g., motion and voice) is not accurate enough to be applied easily to an education system. To overcome these limits, we suggest a mediating interface device that detects the user's motion using precise numerical values such as acceleration and angular speed. In addition, we suggest new design requirements for the mediating device by analyzing children's behavior as human factors through ethnographic research and protocol analysis. We found that children are unskilled in physical control when they use objects, that they tend to lean on an object unconsciously through touch, and that their behaviors are restricted when they use objects. Therefore, a mediating device should satisfy new design requirements: it should compensate for unskilled handling and support familiar and natural physical activity.
