• Title/Summary/Keyword: speech analysis

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in the speaker's utterance pattern, and voice detection errors. Such outliers can severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector of reduced dimension is obtained by robust PCA derived from M-estimation. The robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the covariance matrix of the feature vectors. Second, a GMM with diagonal covariance matrix is obtained from these transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method. We compared the proposed method (RPCA-GMM), which uses the transformed feature vectors, with PCA and with the conventional GMM with diagonal covariance matrix. For every 2% increase in the proportion of outliers, the proposed method maintains almost the same speaker identification rate, degrading by only 0.03%, while the conventional GMM and PCA degrade by 0.65% and 0.55%, respectively. This means that our method is more robust to outliers.
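
The projection step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which M-estimator was used, so Huber-type weights on the Mahalanobis distance are assumed here, the data are a made-up 2-D toy set, and the GMM training step is left out. All function names are hypothetical.

```python
import math

def weighted_mean_cov(points, weights):
    """Weighted mean and 2x2 covariance (weights w for the mean, w**2 for the scatter)."""
    sw = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, points)) / sw
    sw2 = sum(w * w for w in weights)
    sxx = sum(w * w * (x - mx) ** 2 for w, (x, _) in zip(weights, points)) / sw2
    syy = sum(w * w * (y - my) ** 2 for w, (_, y) in zip(weights, points)) / sw2
    sxy = sum(w * w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points)) / sw2
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, mean, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return math.sqrt((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)

def leading_direction(cov):
    """Unit eigenvector of the largest eigenvalue of a symmetric 2x2 matrix."""
    sxx, sxy, syy = cov
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def robust_pca_direction(points, c=1.5, iterations=50):
    """Iteratively reweighted estimate; the Huber-type weight min(1, c/d) is an assumption."""
    weights = [1.0] * len(points)
    for _ in range(iterations):
        mean, cov = weighted_mean_cov(points, weights)
        weights = [min(1.0, c / max(mahalanobis(p, mean, cov), 1e-12)) for p in points]
    mean, cov = weighted_mean_cov(points, weights)
    return mean, leading_direction(cov)

def project(point, mean, direction):
    """Reduce a 2-D point to one coordinate along the leading direction."""
    return (point[0] - mean[0]) * direction[0] + (point[1] - mean[1]) * direction[1]

# Toy data: inliers spread along the x-axis plus one gross outlier in y.
inliers = [(float(t), 0.1 if t % 2 == 0 else -0.1) for t in range(-5, 6)]
data = inliers + [(0.0, 30.0)]

_, plain_cov = weighted_mean_cov(data, [1.0] * len(data))
plain_direction = leading_direction(plain_cov)        # pulled toward the outlier
robust_mean, robust_direction = robust_pca_direction(data)
reduced = [project(p, robust_mean, robust_direction) for p in data]
```

On this toy set the unweighted covariance is dominated by the single outlier, while the reweighted estimate recovers the axis along which the inliers vary. A diagonal-covariance GMM would then be trained on the `reduced` features.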

Performance Analysis of a Statistical Packet Voice/Data Multiplexer (통계적 패킷 음성 / 데이터 다중화기의 성능 해석)

  • 신병철;은종관
    • The Journal of Korean Institute of Communications and Information Sciences / v.11 no.3 / pp.179-196 / 1986
  • In this paper, the performance of a statistical packet voice/data multiplexer is studied. We assume that the packet voice/data multiplexer uses two separate finite queues for voice and data traffic, and that voice traffic has priority over data. For the performance analysis, we divide the output link of the multiplexer into a sequence of time slots. The voice signal is modeled as an (M+1)-state Markov process, M being the packet generation period in slots. The data traffic is modeled by a simple Poisson process. In our discrete-time analysis, the queueing behavior of voice traffic is little affected by the data traffic, since voice has priority over data. Therefore, we first analyze the queueing behavior of voice traffic and then use the result to study the queueing behavior of data traffic. For the packet voice multiplexer, the input state and the voice buffer occupancy are formulated as a two-dimensional Markov chain. For the integrated voice/data multiplexer, we use a three-dimensional Markov chain that represents the input voice state and the buffer occupancies of voice and data. With these models, numerical results for the performance have been obtained by the Gauss-Seidel iteration method. The analytical results have been verified by computer simulation. From the results we have found that there are tradeoffs among the number of voice users, output link capacity, voice queue size, and overflow probability for the voice traffic, and also tradeoffs among traffic load, data queue size, and overflow probability for the data traffic. There is also a tradeoff between the performance of voice and data traffic for given input traffic and link capacity. In addition, it has been found that the average queueing delay of data traffic is longer than the maximum buffer size when the gain of time assignment speech interpolation (TASI) is more than two and the number of voice users is small.
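
As a rough illustration of how Gauss-Seidel iteration yields the steady-state probabilities of a Markov chain, the sketch below solves pi = pi P for a small transition matrix. The matrices and the function name are hypothetical and far smaller than the two- and three-dimensional chains in the paper, which the abstract does not specify.

```python
def stationary_gauss_seidel(P, tol=1e-12, max_iter=10_000):
    """Solve pi = pi * P for a row-stochastic matrix P by Gauss-Seidel sweeps.

    Assumes an irreducible chain with more than one state, so P[j][j] < 1.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        previous = pi[:]
        for j in range(n):
            # Balance equation for state j: pi_j * (1 - P_jj) = sum_{i != j} pi_i * P_ij.
            # Entries pi_i with i < j were already updated in this sweep.
            inflow = sum(pi[i] * P[i][j] for i in range(n) if i != j)
            pi[j] = inflow / (1.0 - P[j][j])
        total = sum(pi)
        pi = [p / total for p in pi]  # renormalize to a probability vector
        if max(abs(a - b) for a, b in zip(pi, previous)) < tol:
            break
    return pi

# Hypothetical two-state voice source: state 0 = silence, state 1 = talkspurt.
voice_source = [[0.9, 0.1],
                [0.4, 0.6]]
pi_voice = stationary_gauss_seidel(voice_source)

# Hypothetical three-state birth-death chain, e.g. a tiny buffer occupancy model.
buffer_chain = [[0.50, 0.50, 0.00],
                [0.25, 0.50, 0.25],
                [0.00, 0.50, 0.50]]
pi_buffer = stationary_gauss_seidel(buffer_chain)
```

Quantities such as overflow probability or mean delay would then be computed as sums over the resulting steady-state vector.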

A Study on The Excessive Liver-Symptoms(肝實證) in The Analysis of Five Visceral Symptoms By The Five Pathogenic Factors(五邪) (오장변증중(五臟辨證中) 간실증(肝實證)의 오사(五邪)에 의한 연구)

  • Kim, Jae-Hong;Kim, Tae-Hee
    • The Journal of Internal Korean Medicine / v.15 no.1 / pp.176-209 / 1994
  • 1. Eleven symptoms of the Excessive Liver-Symptoms belong to Jung-Sa(正邪): blue face, blue thin fingernails, anger, fancy of a large body, dizziness, eye flame, Bell's palsy, hard swelling pain in the breast, side pain extending from the side to the belly, side pain and movement at the left side. 2. Four symptoms of the Excessive Liver-Symptoms belong to Mi-Sa(微邪): meat in the eye, edema in the cheek, lack of appetite, and diarrhea. 3. Only one symptom of the Excessive Liver-Symptoms belongs to Juk-Sa(賊邪): nosebleed. 4. Three symptoms of the Excessive Liver-Symptoms belong to Hu-Sa(虛邪): scrotum contraction, strain in the belly, and constipation. 5. Twenty-eight symptoms of the Excessive Liver-Symptoms belong to Sil-Sa(實邪): red eye, raised eyes(兩眼上?), spitting blood, sternocostal turgid pain, turgidity in the belly, drooping testis, vomiting acid water, sickening, belching, confusion, impatience, frequent forgetfulness, headache, giddiness, eye pain, deafness, ringing in the ear, feeling inverse, dry mouth, stuffiness sensation in the chest, chest pain, stuffiness sensation in the belly, bellyache, quadriplegia, spasm of extremities, tremor, alternate spells of fever and chills, high fever and strain in muscle. 6. The following symptoms have no connection with the Excessive Liver-Symptoms(肝實證): red corner of the eye, red face, swelling on the forehead, stiff neck, stiff back, opisthotonos, contracture of the limbs, vomiting yellow bitter water, speech impediment, epilepsy, depression, stiff tongue, foreign-body sensation in the throat, fullness and distention of the gastric region, feeling sick, and tenesmus. 7. The pathology of the Excessive Liver-Symptoms(肝實證) is connected with ganjabyoung(肝自病) and Hwa(火); because Mock(木) is excessive and Mock-Saeng-Hwa(木生火) applies, symptoms of ganjabyoung(肝自病) and Sil-Sa(實邪) are the most numerous. 8. Sixteen symptoms fall outside the Excessive Liver-Symptoms(肝實證), presumably because medical scholars included the union syndromes(合病), the combined syndromes(兼病), and the analysis of symptoms(辨證) in the Analysis of Five Visceral Symptoms. 9. Among the symptoms considered above, many are caused by Gan-Pung(肝風), which makes it difficult to distinguish the Excessive Liver-Symptoms(肝實證) from C.V.A. (cerebral vascular attack). Whereas NaeKyung(內經) distinguished between the Excessive Liver-Symptoms(肝實證) and C.V.A., later medical specialists connected the two. 10. The manifestation of Sang-Hwa(相火) possessed by the liver is classified as a manifestation of Hwa(火); further study of this point is needed. 11. The causes of each syndrome consist of the origin of the syndrome, its pathology, and the location where it appears; I consider that there are various ways to judge the syndromes besides the Five Pathogenic Factors(五邪). 12. If further study is carried out, a new definition of the Excessive and Deficient Five Visceral Syndromes(五臟虛實證) can be established; I consider that this will provide foundational data for the study of Oriental medicine and important data for clinical judgment standards.

A Study of Changes in Consumption Values Shown in Women's Magazines - Focus on Advertisement Content in Women's Magazines from 1955 to 2008 - (여성잡지광고에 나타난 소비가치의 변화와 광고소구방법 및 문장표현방법 분석연구 - 1955~2008년 여성잡지광고내용 분석을 중심으로 -)

  • Ko, Eun-Ju;Do, Hyun-Ji;Kim, Seon-Sook
    • Journal of the Korean Society of Clothing and Textiles / v.34 no.2 / pp.226-241 / 2010
  • This study details the history and characteristics of consumption values, text styles, and appeal types expressed in magazine advertisements from 1955 to 2008. It analyzes the level of the social structure of advertising expression for each period. Consumption values were classified, based on the categories of Sheth (1991), through an analysis of all advertisements. Sentence endings, sentence types, and rhetorical figures were analyzed with a focus on headline text and text style. Appeal types comprised rational, emotional, and ethical appeals. Crosstab analysis and the chi-square test in SPSS were used for the analysis. The results are as follows. Seven values were identified: functional, social, emotional, conditional, epistemic, fashionable, and indistinct. The proportion of emotional value was the highest, followed by functional, epistemic, conditional, fashionable, social, and indistinct values. Emotional, social, conditional, fashionable, and epistemic values, which focus on consumer emotion, increased, while functional value decreased. Sentences using narrative styles, hyperbole, and metaphors that increase reader interest were dominant in the headline texts. In sentence expression, declarative sentences were the most frequent sentence type, arousing curiosity was the most frequent expression method, and hyperbole and figures of speech were the most frequent rhetorical expressions. Across all advertisements, emotional appeal was used almost twice as often as rational appeal. At the sub-level, rational appeal consisted of information about product function. Interest and expression (such as pleasure and achievement) were used most often for emotional appeal. These results show that emotional value is the most important issue in understanding the consumer. Marketing managers should nevertheless be aware of functional value as well as emotional value.
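
The crosstab and chi-square analysis mentioned in this abstract was run in SPSS. The statistic itself is simple to reproduce, as in the sketch below. The counts are invented for illustration and are not data from the study.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical crosstab: rows = two periods, columns = (rational, emotional) appeals.
appeal_by_period = [[30, 10],
                    [20, 40]]
statistic, dof = chi_square(appeal_by_period)
```

The statistic is then compared against the chi-square distribution with `dof` degrees of freedom to obtain a p-value, which a statistics package reports directly.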

Effective Feature Vector for Isolated-Word Recognizer using Vocal Cord Signal (성대신호 기반의 명령어인식기를 위한 특징벡터 연구)

  • Jung, Young-Giu;Han, Mun-Sung;Lee, Sang-Jo
    • Journal of KIISE: Software and Applications / v.34 no.3 / pp.226-234 / 2007
  • In this paper, we develop a speech recognition system using a throat microphone. This kind of microphone minimizes the impact of environmental noise. However, because of the absence of high frequencies and the partial loss of formant frequencies, previous systems developed with such devices have shown lower recognition rates than systems that use standard microphone signals. This problem has led researchers to use throat microphone signals only as supplementary data supporting standard microphone signals. In this paper, we present a high-performance ASR system that uses only a throat microphone, taking advantage of Korean Phonological Feature Theory and a detailed throat signal analysis. Analyzing the spectrum and the FFT result of the throat microphone signal, we find that the conventional MFCC feature vector that uses a critical pass filter does not characterize throat microphone signals well. We also describe the conditions a feature extraction algorithm should meet to be well suited to throat microphone signal analysis: (1) a sensitive band-pass filter and (2) a feature vector suitable for voice/non-voice classification. We show experimentally that the ZCPA algorithm, designed to meet these conditions, improves the recognizer's performance by approximately 16%. We also find that an additional noise-canceling algorithm such as RASTA yields a further 2% improvement.
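
The idea behind ZCPA (zero crossings with peak amplitudes) can be sketched for a single band as follows. This is a simplified illustration, not the system described in the abstract: the band-pass filter bank and framing are omitted, and the function name, bin edges, and test signal are all assumptions.

```python
import math

def zcpa_histogram(signal, sample_rate, bin_edges):
    """Accumulate log-compressed peak amplitudes into frequency bins.

    Each interval between successive upward zero crossings gives a frequency
    estimate, and the peak inside that interval gives the weight added to the bin.
    """
    histogram = [0.0] * (len(bin_edges) - 1)
    # Sample indices where the signal goes from negative to non-negative.
    crossings = [i for i in range(1, len(signal)) if signal[i - 1] < 0.0 <= signal[i]]
    for start, end in zip(crossings, crossings[1:]):
        frequency = sample_rate / (end - start)
        peak = max(signal[start:end])
        for b in range(len(histogram)):
            if bin_edges[b] <= frequency < bin_edges[b + 1]:
                histogram[b] += math.log(1.0 + peak)
                break
    return histogram

# Hypothetical test signal: 0.1 s of a 100 Hz sine sampled at 8 kHz.
# The phase offset keeps samples away from exact zeros.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate + 0.3) for n in range(800)]
edges = [0.0, 50.0, 150.0, 400.0, 1000.0]  # assumed bin edges in Hz
zcpa = zcpa_histogram(tone, sample_rate, edges)
```

In a full ZCPA front end this histogram would be computed per filter-bank channel and per frame, then summed across channels to form the feature vector.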

Acoustic Analysis and Auditory-Perceptual Assessment for Diagnosis of Functional Dysphonia (기능성 음성장애의 진단을 위한 음향학적, 청지각적 평가)

  • Kim, Geun-Hyo;Lee, Yeon-Yoo;Bae, In-Ho;Lee, Jae-Seok;Lee, Chang-Yoon;Park, Hee-June;Lee, Byung-Joo;Kwon, Soon-Bok
    • Journal of Clinical Otolaryngology Head and Neck Surgery / v.29 no.2 / pp.212-222 / 2018
  • Background and Objectives: The purpose of this study was to compare acoustic and auditory-perceptual assessment measures between normal and functional dysphonia (FD) groups. Materials and Methods: A total of 102 subjects with FD and 59 subjects with normal voices participated in this study. The mid-vowel portion of the sustained vowel /a/ and two sentences of 'Sanchaek' were edited, concatenated, and analyzed with a Praat script. Auditory-perceptual (AP) rating was then completed by three listeners. Results: The FD group showed higher values of the acoustic voice quality index version 2.02 and version 3.01 (AVQIv2 and AVQIv3), slope, Hammarberg index (HAM), grade (G), and overall severity (OS) than the normal group. In addition, smoothed cepstral peak prominence in Praat (PraatCPPS), tilt, low-to-high spectral band energy ratio (L/H ratio), and long-term average spectrum (LTAS) were lower in the FD group than in the normal voice group. The correlations among the measured values ranged from -0.250 to 0.960. In ROC curve analysis, the cutoff values of AVQIv2, AVQIv3, PraatCPPS, slope, tilt, L/H ratio, HAM, and LTAS were 3.270, 2.013, 13.838, -22.286, -9.754, 369.043, 27.912, and 34.523, respectively, and the AUC was over 0.890 for AVQIv2, AVQIv3, and PraatCPPS, over 0.731 for HAM, tilt, and slope, and over 0.605 for LTAS and L/H ratio. Conclusions: AVQI and CPPS showed the highest predictive power for distinguishing between the normal and FD groups. Acoustic analyses and AP rating, as noninvasive examinations, can reinforce screening for FD and help establish an efficient diagnosis and treatment plan for FD.
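
The ROC analysis reported here comes down to two computations: the area under the curve and a cutoff value. The sketch below shows one common way to obtain both. The abstract does not say how the cutoffs were chosen, so the Youden index is an assumption, and the scores are invented rather than taken from the study.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a positive case outscores a negative one (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

def youden_cutoff(positive_scores, negative_scores):
    """Threshold t maximizing sensitivity + specificity - 1, with score >= t called positive."""
    best_threshold, best_j = None, -1.0
    for t in sorted(set(positive_scores) | set(negative_scores)):
        sensitivity = sum(s >= t for s in positive_scores) / len(positive_scores)
        specificity = sum(s < t for s in negative_scores) / len(negative_scores)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # keep the lowest threshold among ties
            best_threshold, best_j = t, j
    return best_threshold, best_j

# Hypothetical index values where higher means more dysphonic.
fd_group = [3.5, 4.1, 2.9, 5.0]
normal_group = [1.2, 2.0, 3.1, 1.8]
auc = roc_auc(fd_group, normal_group)
cutoff, j_index = youden_cutoff(fd_group, normal_group)
```

For measures that are lower in the FD group, such as CPPS, the scores can be negated before calling these functions.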

Korean Morphological Analysis Method Based on BERT-Fused Transformer Model (BERT-Fused Transformer 모델에 기반한 한국어 형태소 분석 기법)

  • Lee, Changjae;Ra, Dongyul
    • KIPS Transactions on Software and Data Engineering / v.11 no.4 / pp.169-178 / 2022
  • Morphemes are the most primitive units of a language: they lose their original meaning when segmented into smaller parts. In Korean, a sentence is a sequence of eojeols (words) separated by spaces, and each eojeol comprises one or more morphemes. Korean morphological analysis (KMA) divides the eojeols of a given Korean sentence into morpheme units and assigns appropriate part-of-speech (POS) tags to the resulting morphemes. KMA is one of the most important tasks in Korean natural language processing (NLP), and improving KMA performance is closely tied to improving the performance of other Korean NLP tasks. Recent research on KMA has begun to adopt machine translation (MT) models. MT converts a sequence (sentence) of units in one domain into a sequence (sentence) of units in another domain. Neural machine translation (NMT) refers to MT approaches that exploit neural network models. From the MT perspective, KMA transforms an input sequence of units in the eojeol domain into a sequence of units in the morpheme domain. In this paper, we propose a deep learning model for KMA. The backbone of our model is the BERT-fused model, which has been shown to achieve high performance on NMT. The BERT-fused model combines Transformer, a representative NMT model, with BERT, a language representation model that has enabled significant advances in NLP. The experimental results show that our model achieves an F1-score of 98.24.
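
To make the reported metric concrete, the sketch below computes an F1-score over (morpheme, POS tag) pairs for one sentence. The abstract does not describe the exact scoring protocol, so multiset matching of pairs is an assumption, and the example analysis is made up for illustration.

```python
from collections import Counter

def morpheme_f1(gold, predicted):
    """Precision, recall, and F1 over (morpheme, tag) pairs, matched as multisets."""
    gold_counts, predicted_counts = Counter(gold), Counter(predicted)
    true_positives = sum((gold_counts & predicted_counts).values())
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / sum(predicted_counts.values())
    recall = true_positives / sum(gold_counts.values())
    return precision, recall, 2.0 * precision * recall / (precision + recall)

# Hypothetical analysis of "나는 학교에 간다" with Sejong-style tags.
gold = [("나", "NP"), ("는", "JX"), ("학교", "NNG"),
        ("에", "JKB"), ("가", "VV"), ("ㄴ다", "EF")]
# A system output that failed to split the final eojeol into stem and ending.
predicted = [("나", "NP"), ("는", "JX"), ("학교", "NNG"),
             ("에", "JKB"), ("간다", "VV")]

precision, recall, f1 = morpheme_f1(gold, predicted)
```

A corpus-level score would sum the true positive, predicted, and gold counts over all sentences before computing the ratios.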

Development of a Web-based Presentation Attitude Correction Program Centered on Analyzing Facial Features of Videos through Coordinate Calculation (좌표계산을 통해 동영상의 안면 특징점 분석을 중심으로 한 웹 기반 발표 태도 교정 프로그램 개발)

  • Kwon, Kihyeon;An, Suho;Park, Chan Jung
    • The Journal of the Korea Contents Association / v.22 no.2 / pp.10-21 / 2022
  • Few automated methods exist for improving attitude in formal presentations, such as job interviews or project result presentations at a company, other than observation by colleagues or professors. Previous studies have reported that a speaker's stable speech and gaze handling affect delivery in a presentation. Other studies show that proper feedback on one's presentation increases the presenter's ability to present. In this paper, considering the positive aspects of correction, we developed a program that intelligently corrects the poor presentation habits and attitudes of college students through facial analysis of videos, and we analyzed the program's performance. The proposed program is web-based: it checks for the use of redundant words, performs facial recognition, and transcribes the presentation contents into text. To this end, an artificial intelligence model for classification was developed, and after extracting the video object, facial feature points were recognized based on their coordinates. Then, using 4,000 facial data samples, the performance of the proposed algorithm was compared with that of facial recognition using Teachable Machine. The program can help presenters by correcting their presentation attitude.

Analysis of Generative AI Technology Trends Based on Patent Data (특허 데이터 기반 생성형 AI 기술 동향 분석)

  • Seongmu Ryu;Taewon Song;Minjeong Lee;Yoonju Choi;Soonuk Seol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.1 / pp.1-9 / 2024
  • This paper analyzes the trends in generative AI technology based on patent application documents. To achieve this, we selected 5,433 generative AI-related patents filed in South Korea, the United States, and Europe from 2003 to 2023, and analyzed the data by country, technology category, year, and applicant, presenting it visually to find insights and understand the flow of technology. The analysis shows that patents in the image category account for 36.9%, the largest share, with a continuous increase in filings, while filings in the text/document and music/speech categories have either decreased or remained stable since 2019. Although the company with the highest number of filings is a South Korean company, four out of the top five filers are U.S. companies, and all companies have filed the majority of their patents in the U.S., indicating that generative AI is growing and competing centered around the U.S. market. The findings of this paper are expected to be useful for future research and development in generative AI, as well as for formulating strategies for acquiring intellectual property.

Acoustic Analysis and Melodization of Korean Intonation for Language Rehabilitation (언어재활을 위한 한국어의 음향적 분석과 선율화)

  • Choi, Jin Hee;Park, Jeong Mi
    • Journal of Music and Human Behavior / v.21 no.1 / pp.49-68 / 2024
  • This study aims to acoustically analyze characteristics of the Korean language and convert the findings into musical elements, providing foundational data for evidence-based music-language rehabilitation. We collected voice data from thirty men and thirty women aged 19-25, each providing six-syllable prosodic units composed of two accentual phrases, including both declarative and interrogative sentences. Analyzing these data with Praat, we extracted syllabic acoustic properties and conducted statistical analyses based on acoustic properties, sentence type, gender, and particle presence. Significant differences were found in syllable frequency and duration based on accentual phrases and prosodic units (p < .001), with interrogatives showing higher frequencies and declaratives longer durations (p < .001). Female frequencies were significantly higher than male frequencies (p < .001), with longer durations observed (p < .001). Particle syllables also showed significantly stronger intensities (p < .001). Finally, we presented melodies converted from these acoustic properties into musical scores based on pitch, duration, and accent. The insights from this analysis of six-syllable Korean sentences will guide further research on developing a system for melodizing large-scale Korean speech data, which is expected to be crucial in music-based language rehabilitation.
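
One building block of melodization is mapping a measured syllable frequency to a musical pitch. The sketch below shows the standard equal-temperament conversion. The abstract does not describe the study's own conversion rule, so the A4 = 440 Hz reference, the rounding to the nearest semitone, and the sample frequencies are all assumptions.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(frequency_hz, a4_hz=440.0):
    """Nearest MIDI note number in 12-tone equal temperament (A4 = MIDI 69)."""
    return round(69 + 12 * math.log2(frequency_hz / a4_hz))

def midi_to_name(midi_note):
    """Scientific pitch notation, with MIDI 60 written as C4."""
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"

# Hypothetical mean F0 values (Hz) for three syllables of one accentual phrase.
syllable_f0 = [220.0, 246.9, 261.6]
melody = [midi_to_name(hz_to_midi(f0)) for f0 in syllable_f0]
```

Duration and accent would be mapped separately, for example to note lengths and dynamic markings, to complete a score.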