• Title/Summary/Keyword: ASR system

Building robust Korean speech recognition model by fine-tuning large pretrained model (대형 사전훈련 모델의 파인튜닝을 통한 강건한 한국어 음성인식 모델 구축)

  • Changhan Oh;Cheongbin Kim;Kiyoung Park
    • Phonetics and Speech Sciences
    • /
    • v.15 no.3
    • /
    • pp.75-82
    • /
    • 2023
  • Automatic speech recognition (ASR) has been revolutionized by deep learning-based approaches, among which self-supervised learning methods have proven particularly effective. In this study, we aim to enhance the performance of OpenAI's Whisper model, a multilingual ASR system, on the Korean language. Whisper was pretrained on a large corpus (around 680,000 hours) of web speech data and has demonstrated strong recognition performance for major languages. However, it struggles with languages such as Korean, which was not a major language in its training data. We address this issue by fine-tuning the Whisper model on an additional dataset comprising about 1,000 hours of Korean speech. We also compare its performance against a Transformer model trained from scratch on the same dataset. Our results indicate that fine-tuning the Whisper model significantly improved its Korean speech recognition performance as measured by character error rate (CER). Specifically, the performance improved with increasing model size. However, the Whisper model's performance on English deteriorated after fine-tuning, emphasizing the need for further research to develop robust multilingual models. Our study demonstrates the potential of utilizing a fine-tuned Whisper model for Korean ASR applications. Future work will focus on multilingual recognition and optimization for real-time inference.
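
The abstract above reports results as character error rate (CER). As a rough illustration of that metric, and not the authors' evaluation script, the sketch below computes CER as the character-level Levenshtein distance divided by the reference length. The function names and the choice to ignore spaces are assumptions made for this example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str, ignore_spaces: bool = True) -> float:
    """Character error rate: edit distance divided by reference length.

    ignore_spaces is an assumption of this sketch; evaluation setups differ
    on whether spacing errors count.
    """
    if ignore_spaces:
        ref = ref.replace(" ", "")
        hyp = hyp.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)
```

For example, a five-syllable Korean reference with one wrong syllable in the hypothesis gives a CER of 0.2 under this definition.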

Cathode Properties of Sm-Sr-(Co,Fe,Ni)-O System with Perovskite and Spinel Structures for Solid Oxide Fuel Cell (고체산화물 연료전지의 페로브스카이트와 스피넬 구조를 갖는 Sm-Sr-(Co,Fe,Ni)-O 시스템의 공기극 특성)

  • Baek, Seung-Wook;Kim, Jung-Hyun;Baek, Seung-Whan;Bae, Joong-Myeon
    • 한국신재생에너지학회:학술대회논문집
    • /
    • 2007.06a
    • /
    • pp.133-136
    • /
    • 2007
  • Perovskite-structured samarium strontium cobaltite (SSC), a mixed ionic-electronic conductor (MIEC), is considered a promising cathode material for intermediate-temperature solid oxide fuel cells (SOFCs) owing to its high electrocatalytic activity. However, cathode materials containing cobalt (Co) are unstable at high temperature and have a relatively high thermal expansion. In this paper, the Sm-Sr-(Co,Fe,Ni)-O system with perovskite and spinel structures was investigated in terms of its electrochemical and thermal expansion properties. Area specific resistance (ASR) was measured by AC impedance spectroscopy to investigate the electrochemical property of the cathode, and the thermal expansion coefficient (TEC) was measured using a dilatometer. The microstructure of the cathode was observed by scanning electron microscopy. Perovskite-structured $Sm_{0.5}Sr_{0.5}CoO_{3-\delta}$ showed an ASR of $0.87{\Omega}/cm^{2}$, and $Sm_{0.5}Sr_{0.5}NiO_{3-\delta}$, which actually has a spinel structure, showed the lowest TEC value of $13.3{\times}10^{-6}/K$.

Study on the Improvement of Speech Recognizer by Using Time Scale Modification (시간축 변환을 이용한 음성 인식기의 성능 향상에 관한 연구)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.6
    • /
    • pp.462-472
    • /
    • 2004
  • In this paper, a method is proposed for compensating for the performance degradation of automatic speech recognition (ASR) that is mainly caused by speaking rate variation. Before the new method is presented, a quantitative analysis of the performance of an HMM-based ASR system according to speaking rate is performed. From this analysis, significant performance degradation was often observed for rapidly spoken speech signals. A quantitative measure that represents speaking rate is then introduced. Time scale modification (TSM) is employed to compensate for the speaking rate difference between input speech signals and training speech signals. Finally, a method for compensating for the performance degradation caused by speaking rate variation is proposed, in which TSM is selectively employed according to speaking rate. ASR experiments on 10-digit mobile phone numbers confirmed that the error rate was reduced by 15.5% when the proposed method was applied to speech signals with a high speaking rate.
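
The selective use of TSM described above can be pictured as a simple decision rule. The sketch below covers only that selection step under stated assumptions: the abstract does not specify the speaking-rate measure, the threshold, or the TSM algorithm, so `speaking_rate`, `tsm_stretch_factor`, the units-per-second measure, and the 1.2 threshold are all hypothetical.

```python
def speaking_rate(num_units: int, duration_s: float) -> float:
    """Hypothetical rate measure: linguistic units (e.g. syllables) per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return num_units / duration_s


def tsm_stretch_factor(rate: float, train_rate: float,
                       threshold: float = 1.2) -> float:
    """Return a time-stretch factor for the input utterance.

    A factor above 1.0 means the signal should be lengthened (slowed down) by
    that ratio; 1.0 means TSM is skipped. The threshold is an assumed value.
    """
    if rate <= 0 or train_rate <= 0:
        raise ValueError("rates must be positive")
    ratio = rate / train_rate
    # Apply TSM only when the input is clearly faster than the training speech.
    return ratio if ratio > threshold else 1.0
```

With an assumed training rate of 4 units per second, an utterance at 6 units per second would be stretched by a factor of 1.5, while one at 4.2 units per second would be left untouched.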

Bone Marrow Cell Proliferation Activity through Intestinal Immune System by the Components of Atractylodes lancea DC. (창출 성분의 장관면역 자극을 통한 골수세포 증식활성)

  • Yu, Kwang-Won;Shin, Kwang-Soon
    • Korean Journal of Food Science and Technology
    • /
    • v.33 no.1
    • /
    • pp.135-141
    • /
    • 2001
  • Of the hot-water extracts prepared from the 10 herbal components of Sip-Jeon-Dae-Bo-Tang, Atractylodes lancea DC. (ALR) and Panax ginseng C.A. Meyer (PG) showed the most potent bone marrow cell proliferation activity through the intestinal immune system, whereas the other extracts showed no activity except for Astragalus membranaceus Bunge (ASR) and Angelica acutiloba Kitagawa (AR), which had low activity. In particular, ALR showed potent activity irrespective of its class, place of production, and breeding conditions. In addition, we found that the hot-water extract from Atractylodes lancea DC. rhizomes (ALR-0) was the main contributor to the Peyer's patch cell-mediated hematopoietic response of Sip-Jeon-Dae-Bo-Tang. ALR-0 was further fractionated into a MeOH-soluble fraction (ALR-1), a MeOH-insoluble and EtOH-soluble fraction (ALR-2), and a crude polysaccharide fraction (ALR-3). Among these fractions, only ALR-3 showed potent, dose-dependent stimulating activity for the proliferation of bone marrow cells mediated by Peyer's patch cells. Treatment of ALR-3 with $NaIO_4$, $NaClO_2$, or pronase significantly reduced its intestinal immune system modulating activity in every case, and $NaIO_4$ oxidation had a particularly strong effect. These results indicate that macromolecules such as polysaccharides, rather than low-molecular-weight substances, are the potent intestinal immune system modulating compounds of ALR.

Childhood Cancer Incidence and Survival 1985-2009, Khon Kaen, Thailand

  • Wiangnon, Surapon;Jetsrisuparb, Arunee;Komvilaisak, Patcharee;Suwanrungruang, Krittika
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.18
    • /
    • pp.7989-7993
    • /
    • 2014
  • Background: The Khon Kaen Cancer Registry (KKCR) was established in 1984. Previous population-based incidences and survivals of childhood cancer in Thailand were determined using a short cancer registration period. Materials and Methods: Data were retrieved for all children residing in Khon Kaen, aged 0-15 years, diagnosed as having cancer and registered in the KKCR (1985-2009). The follow-up censoring date was December 31, 2012. The childhood cancers were classified into 12 diagnostic groups, according to the International Classification of Childhood Cancer. The incidence was calculated by the standard method. Survival of childhood cancer was investigated using the KKCR population-based registration data, and overall survival was calculated using the Kaplan-Meier method. Results: In the study period, 912 newly diagnosed cases of childhood cancer were registered. The respective mean and median age was 6.4 (SD=4.6) and 6 (0-14) years. The age-peak for incidence was 0-4 years. The age-standardized rate (ASR) was 83 per million. Leukemia was the most common cancer (N=360, ASR 33.8) followed by neoplasms of the central nervous system (CNS, N=150, ASR 12.8) and lymphoma (N=79, ASR 7.0). The follow-up duration totaled 101,250 months. The death rate was 1.11 per 100 person-months (95%CI: 1.02-1.20). The 5-year overall survival was 52% (95%CI: 53-56.9) for all cancers. The respective 5-year overall survival for (1) acute lymphoblastic leukemia (ALL), (2) acute non-lymphoblastic leukemia (ANLL), (3) lymphoma, (4) germ cell tumors, (5) renal tumors, (6) retinoblastoma, (7) soft tissue tumors, (8) CNS tumors, (9) bone tumors, (10) liver tumors, and (11) neuroblastoma was (1) 51%, (2) 37%, (3) 63%, (4) 74%, (5) 67%, (6) 55%, (7) 46%, (8) 44%, (9) 36%, (10) 34%, and (11) 25%. Conclusions: The incidence of childhood cancer is lower than that in western countries. Respective overall survival for ALL, lymphoma, renal tumors, liver tumors, retinoblastoma, and soft tissue tumors is lower than that reported in developed countries, while survival for CNS tumors, neuroblastoma, and germ cell tumors is comparable.
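
The overall survival figures above were calculated with the Kaplan-Meier method. The following is a minimal sketch of that estimator, not the registry's analysis code: the function name `kaplan_meier` and the toy follow-up data in the usage note are illustrative assumptions, and no registry data are reproduced here.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time per subject (e.g. months).
    events:    1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each observed death time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)       # subjects leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Subjects censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

With six hypothetical subjects followed for 6, 12, 12, 24, 60, and 60 months, of whom the first, the second, and the fourth died, the estimate steps down to 5/6, 2/3, and 4/9 at months 6, 12, and 24.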

Irregular Pronunciation Detection for Korean Point-of-Interest Data Using Prosodic Word

  • Kim Sun-Hee;Jeon Je-Hun;Na Min-Soo;Chung Min-Hwa
    • MALSORI
    • /
    • no.57
    • /
    • pp.123-137
    • /
    • 2006
  • This paper proposes a method of detecting irregular pronunciations in Korean point-of-interest (POI) data, adopting the notion of the Prosodic Word based on Prosodic Phonology (Selkirk 1984, Nespor and Vogel 1986) and Intonational Phonology (Jun 1996). To evaluate the proposed method, a detection experiment was conducted on 250,000 POI entries. When all the data were used for training, 99.99% of the exceptional prosodic words were detected, which shows the stability of the system. The results show that a similar ratio of exceptional prosodic words (22.4% on average) was detected at each stage where a certain amount of training data was added. Intended as an example of an interdisciplinary study of linguistics and computer science, this study will, on the one hand, provide an understanding of the Korean language from a phonological point of view and, on the other hand, enable the systematic development of a multiple-pronunciation lexicon for high-performance Korean TTS or ASR systems.

Development of HPCI Prediction Model for Concrete Pavement Using Expressway PMS Database (고속도로 PMS D/B를 활용한 콘크리트 포장 상태지수(HPCI) 예측모델 개발 연구)

  • Suh, Young-Chan;Kwon, Sang-Hyun;Jung, Dong-Hyuk;Jeong, Jin-Hoon;Kang, Min-Soo
    • International Journal of Highway Engineering
    • /
    • v.19 no.6
    • /
    • pp.83-95
    • /
    • 2017
  • PURPOSES: The purpose of this study is to develop a regression model to predict the International Roughness Index (IRI) and Surface Distress (SD) for the estimation of HPCI using the Expressway Pavement Management System (PMS). METHODS: To develop an HPCI prediction model, prediction models of IRI and SD were developed in advance. The independent variables considered in the models were pavement age, Annual Average Daily Traffic Volume (AADT), the amount of deicing salt used, the severity of Alkali Silica Reaction (ASR), average temperature, annual temperature difference, number of days of precipitation, number of days of snowfall, number of days below zero temperature, and so on. RESULTS: The present IRI, age, AADT, annual temperature differential, number of days of precipitation, and ASR severity were chosen as independent variables for the IRI prediction model. In addition, the present IRI, present SD, amount of deicing chemical used, and annual temperature differential were chosen as independent variables for the SD prediction model. CONCLUSIONS: The models for predicting IRI and SD were developed. The predicted HPCI can be calculated from the HPCI equation using the predicted IRI and SD.

Fast offline transformer-based end-to-end automatic speech recognition for real-world applications

  • Oh, Yoo Rhee;Park, Kiyoung;Park, Jeon Gue
    • ETRI Journal
    • /
    • v.44 no.3
    • /
    • pp.476-490
    • /
    • 2022
  • With recent advances in technology, automatic speech recognition (ASR) has been widely used in real-world applications. Converting large amounts of speech into text accurately and efficiently with limited resources has become more vital than ever. In this study, we propose a method to rapidly recognize a large speech database via a transformer-based end-to-end model. Transformers have improved the state-of-the-art performance in many fields. However, they are not easy to use for long sequences. Various techniques to accelerate the recognition of real-world speech are therefore proposed and tested, including decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments are conducted with the Librispeech dataset and real-world Korean ASR tasks to verify the proposed methods. In the experiments, the proposed system converts 8 h of speech recorded at real-world meetings into text in less than 3 min with a 10.73% character error rate, a relative reduction of 27.1% compared with conventional systems.
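
As a quick reading aid for the figures in this abstract, the arithmetic below derives two quantities that the abstract does not state directly: the baseline error rate implied by a 27.1% relative reduction, and the minimum speed-up over real time. It assumes the 27.1% refers to relative reduction in character error rate; the function names are chosen for this example.

```python
def implied_baseline(cer_new: float, relative_reduction: float) -> float:
    """Baseline CER implied by cer_new = baseline * (1 - relative_reduction)."""
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return cer_new / (1.0 - relative_reduction)


def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than real time the audio was processed."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


# Figures taken from the abstract: 10.73% CER, 27.1% relative reduction,
# 8 h of audio processed in under 3 min.
baseline_cer = implied_baseline(10.73, 0.271)             # about 14.72 (% CER)
min_speedup = speedup_over_real_time(8 * 3600, 3 * 60)    # 160.0
```

Because processing took less than 3 min, 160 times real time is a lower bound on the speed-up rather than an exact figure.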

Knowledge Transfer Using User-Generated Data within Real-Time Cloud Services

  • Zhang, Jing;Pan, Jianhan;Cai, Zhicheng;Li, Min;Cui, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.1
    • /
    • pp.77-92
    • /
    • 2020
  • When automatic speech recognition (ASR) is provided as a cloud service, it is easy to collect voice and application domain data from users. Harnessing these data will facilitate the provision of more personalized services. In this paper, we demonstrate our transfer learning-based knowledge service, which is built with the user-generated data collected through our novel system that delivers a personalized ASR service. First, we discuss the motivation, challenges, and prospects of building such a knowledge-based, service-oriented system. Second, we present a Quadruple Transfer Learning (QTL) method that can learn a classification model from a source domain and transfer it to a target domain. Third, we provide an architectural overview of our novel system, which collects voice data from mobile users, labels the data via crowdsourcing, utilizes the collected user-generated data to train different machine learning models, and delivers personalized real-time cloud services. Finally, we use the E-Book data collected from our system to train classification models and apply them in the smart TV domain. The experimental results show that our QTL method is effective in two classification tasks, which confirms that knowledge transfer provides a value-added service for upper-layer mobile applications in different domains.

Selection of Customized ELV (End-of-Life Vehicle) Dismantling System for Different Countries by Utilizing Fuzzy Theory and Modified QFD (국가 맞춤형 폐자동차 해체시스템 선정 방법에 대한 연구)

  • Yi, Hwa-Cho;Park, Jung Whan;Hwang, Seon;Park, Sung-Su
    • Clean Technology
    • /
    • v.23 no.1
    • /
    • pp.15-26
    • /
    • 2017
  • The recycling process of ELVs consists of three phases: dismantling, shredding, and ASR treatment. Dismantling, the collection of reusable parts, is the most important phase. Dismantling systems are diverse, and each country has different characteristics. Therefore, the selection of a suitable ELV dismantling system for a target country depends on the characteristics of that country. However, the data on country characteristics change every year and are insufficient and ambiguous. In this study, fuzzy inference and modified QFD (quality function deployment) methods are utilized to solve these problems. The fuzzification of the characteristics data for each country, customized rules, and the decision process of the modified QFD matrix are developed and applied to sample countries.
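
The fuzzification step mentioned above can be illustrated with triangular membership functions, a common choice in fuzzy inference. This is a generic sketch only: the abstract does not say which membership functions, variables, or scales the authors used, so `triangular`, `fuzzify`, and the `LABOR_COST_SETS` scale are hypothetical.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if not a < b < c:
        raise ValueError("require a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x: float, sets: dict) -> dict:
    """Map a crisp value to its membership degree in each labeled fuzzy set."""
    return {label: triangular(x, *abc) for label, abc in sets.items()}


# Hypothetical fuzzy sets for a country characteristic normalized to 0..1.
LABOR_COST_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
```

Under these assumed sets, a normalized value of 0.25 belongs to "low" and "medium" with degree 0.5 each and to "high" with degree 0, which is how an ambiguous data point can feed several inference rules at once.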