• Title/Summary/Keyword: 화자의 성별 (speaker gender)

Improving the Performance of a Speech Recognition System in a Vehicle by Distinguishing Male/Female Voice (성별 구별방법에 의한 자동차 내 음성 인식 성능 향상)

  • Yang, Jin-Woo;Kim, Sun-Hyeop
    • Journal of KIISE:Software and Applications / v.27 no.12 / pp.1174-1182 / 2000
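  • To ensure both driver safety and convenience in a moving vehicle, this paper proposes a system that supports continuous speech input and output without any auxiliary switch operation. To obtain noise-robust thresholds, the reference energy and zero-crossing rate are updated every 1.5 seconds, and endpoint detection is performed automatically and accurately in real time in two stages (primary and secondary) using a band-pass filter. Male and female speakers are distinguished by pitch detection to select the model, and, to use the most suitable model for the current driving speed, separate male and female models are used for three speed ranges: idle to 40 km/h, 40-80 km/h, and 80-100 km/h. The speech feature vectors are 13th-order PLP coefficients, and the recognition algorithm is OSDP (One-Stage Dynamic Programming). Speaker-independent recognition experiments, conducted separately by speed on roads in downtown Seoul and on the Inner Circular Road, yielded recognition rates of 96.8% for males and 95.1% for females at 40-80 km/h, and 91.6% for males and 90.6% for females at 80-100 km/h. Speaker-dependent experiments yielded high recognition rates of 98% for males and 96% for females at 40-80 km/h, and 96% for males and 94% for females at 80-100 km/h, demonstrating the effectiveness of the system.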
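
The pitch-based model selection described above can be sketched in Python as follows. This is a minimal sketch, assuming a plain autocorrelation pitch estimator and a hypothetical 160 Hz male/female threshold; the abstract does not state which pitch detector or threshold the authors used, how speed-range boundaries were handled, and the names `autocorr_pitch`, `speed_band`, and `select_model` are illustrative only.

```python
import math

# Hypothetical decision threshold between typical male and female F0 (not from the paper).
GENDER_THRESHOLD_HZ = 160.0


def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the F0 (Hz) of one voiced frame from its autocorrelation peak."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    lag_min = int(sample_rate / fmax)              # shortest period considered
    lag_max = min(int(sample_rate / fmin), n - 1)  # longest period considered
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


def speed_band(speed_kmh):
    """Map vehicle speed to the three ranges named in the abstract.

    Boundary handling, and lumping speeds above 100 km/h into the top band, are assumptions."""
    if speed_kmh < 40:
        return "idle-40"
    if speed_kmh < 80:
        return "40-80"
    return "80-100"


def select_model(frame, sample_rate, speed_kmh):
    """Return the (gender, speed band) key of the acoustic model to use."""
    f0 = autocorr_pitch(frame, sample_rate)
    gender = "male" if f0 < GENDER_THRESHOLD_HZ else "female"
    return gender, speed_band(speed_kmh)


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 120 * i / sr) for i in range(400)]  # synthetic 120 Hz frame
    print(select_model(tone, sr, 60))  # ('male', '40-80')
```

Keying the models by (gender, speed band) mirrors the six male/female by speed-range models the abstract describes; a real system would also need voicing detection so that only voiced frames reach the pitch estimator.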

A Study of the Giving and Receiving Verbs in TOUSEISYOUSEIKATAGI (『当世書生気質』에 나타난 수수동사에 관한 고찰 - 'やる·あげる·さしあげる'와 'くれる·くださる'를 중심으로)

  • Yang, Jung Soon
    • Cross-Cultural Studies / v.19 / pp.271-293 / 2010
  • Japanese giving and receiving verbs are divided into 'YARU', 'MORAU', and 'KURERU'. Their use is influenced by the subject, the speaker's viewpoint, and meaning, and the three verbs are used differently depending on who is the giver and who is the receiver. This study analyzes the 'YARU' and 'KURERU' verbs used in TOUSEISYOUSEIKATAGI, focusing on politeness, gender, and meaning when the verbs are combined with 'TE'. In terms of politeness, 'yaru' means giving to a person of lower social status or to an animal or plant, and 'ageru' is nowadays used for giving to an equal or to a person of lower social status. In TOUSEISYOUSEIKATAGI, however, 'ageru', regarded as refined language, remained an expression of respect, and 'yaru' was used when the receiver was of lower or equal social status. 'Kureru' was used when the receiver was of lower or equal social status, and 'kudasaru' when a person of higher social status gave the speaker something. Female speakers use 'oyarinasai', 'oyariyo', 'ageru', and 'okureru', while male speakers use 'yaru' and 'kureru'; 'kuretamae' and 'kurenka' are speech patterns peculiar to men. When joined to 'TE', the verbs acquire abstract meanings beyond the physical transfer of things and express a modality toward the action of the preceding verb. This modality conveys goodwill, goodness, benefit, kindness, hope, expectation, disadvantage, injury, ill will, and sarcasm. In addition, 'TE YARU' expresses the speaker's strong will, and 'TE KURERU' expresses the speaker's request.

Changes in fundamental frequency depending on language, context, and language proficiency for bilinguals (한국어-영어 이중언어 화자의 사용 언어, 문맥, 언어 능숙도에 따른 기본 주파수 변화)

  • Yoon, Somang;Mok, Sora;Youn, Jungseon;Han, Jiyun;Yim, Dongsun
    • Phonetics and Speech Sciences / v.11 no.1 / pp.9-18 / 2019
  • The purpose of this study is to determine whether the mean fundamental frequency (F0) of Korean-English bilinguals changes depending on language, task, or language proficiency. A total of forty-eight Korean-English speakers (28 balanced bilinguals and 20 Korean-dominant bilinguals) participated. Participants performed two types of read-aloud tasks in English and Korean. For statistical analysis, language × task two-way repeated-measures ANOVAs were first conducted within the balanced bilingual group, followed by group × language two-way mixed ANOVAs. The results showed that females in both bilingual groups changed their mean F0 depending on the language used and the task (p < .05), whereas no significant effects were found for males in either group under any condition. For females in both the balanced and the Korean-dominant groups, mean F0 in the Korean reading task was significantly higher than in the English reading task. Thus, changes in mean F0 depending on language and context may reflect gender-specific characteristics, and females seem to be more sensitive to the socio-cultural standards imposed on them.

Common Speech Database Collection (공통음성 DB 구축)

  • Kim Sanghum;Oh Seungshin;Jung Ho-Young;Jeong Hyung-Bae;Kim Jeong-Se
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.21-24 / 2002
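  • This paper describes the common speech database (DB) collection being carried out by the ETRI Speech Information Research Center. Speech DBs for various purposes, such as speech recognition, speech synthesis, and speaker recognition, will be collected over three years (November 2001 to October 2004), and 14 types of speech DB are planned for collection in 2002, the first year. The common speech DB was designed to cover various channels (microphone, headset, VoIP, and wired/wireless telephone networks), regions, genders, and recording environments (office, subway, road, etc.). The utterances consist of digits, words, and sentences, in various speaking styles including spontaneous, conversational, and read speech. This paper describes the composition, collection methods, and schedule of the 14 common speech DBs.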

Expressions of requests using give and receive verbs in the era of Meizi and Taisyo (메이지·다이쇼 작품의 てくれ·てください의 표현 양상)

  • Yang, JungSoon
    • Cross-Cultural Studies / v.29 / pp.391-411 / 2012
  • Request expressions can be defined as expressions that demand or ask the other person to perform certain actions. Direct request expressions ask the other person to act directly, while indirect request expressions do so by describing the speaker's situation. This study analyzed the gender and hierarchical relationships of speakers and listeners who used 'tekure' and 'tekudasai' in dialogue examples from the Meiji and Taisho periods, during which the modern Tokyo dialect was formed and established. The works analyzed were "Toseishoseikatagi" (Meiji 10s); "Ukigumo", "Natsukodachi", and "Tajotakon" (Meiji 20s); "Hakai" and "Botchan" (Meiji 30s); "Huton" and "Inakakyoshi" (Meiji 40s); and "Aruonna" (Taisho period). 'Kure' was used more by male speakers than by female speakers, and examples by female speakers appeared only in novels from the Meiji 30s onward. Male speakers often used it toward listeners of equal standing in "Toseishoseikatagi" (Meiji 10s) but toward younger listeners in "Hakai" (Meiji 30s). 'Okure' was used more by female speakers than by male speakers, with listeners ranging from older to younger. Female speakers used 'okure' more often in "Aruonna" (Taisho period) than in the other novels, while male speakers used 'okure' only in "Ukigumo", "Natsukodachi", and "Hakai". 'Okurenasai' was used conspicuously by female speakers in the form 'okun_'. For 'kudasai', female speakers used it more than male speakers in "Toseishoseikatagi" and "Aruonna", but male speakers used it more than female speakers in "Tajotakon" and "Hakai"; listeners ranged from older to younger. Among the analyzed novels, 'o~kudasai' did not appear through the Meiji 20s but appeared from the Meiji 30s onward. By gender, it was used slightly more often by female speakers than by male speakers; by hierarchy, listeners were usually older than speakers. 'O~nasatekudasai' was used more often by male speakers than by female speakers, and listeners were again usually older than speakers.

Prediction of speaking fundamental frequency using the voice and speech range profiles in normal adults (정상 성인에서 음성 및 말소리 범위 프로파일을 이용한 발화 기본주파수 예측)

  • Lee, Seung Jin;Kim, Jaeock
    • Phonetics and Speech Sciences / v.11 no.3 / pp.49-55 / 2019
  • This study investigated whether mean speaking fundamental frequency (SFF) can be predicted from the parameters of the voice range profile (VRP) and speech range profile (SRP) in normal Korean adults. It also examined whether gender differences exist in the absolute difference between the SFF and the estimated SFF (ESFF) predicted from the VRP and SRP. A total of 85 native Korean speakers with normal voices participated. Each participant performed the VRP task using the vowel /a/ and the SRP task using the first sentence of the Korean standard passage "Ga-eul". In addition, SFF was measured with electroglottography during a passage reading task. Predictive factors of the SFF were explored, and the absolute difference between the SFF and the ESFF (DSFF) was compared between gender groups. The predictive factors were age, gender, minimum pitch, and pitch range for the VRP (adjusted $R^2=.931$), and pitch range (in semitones) and maximum pitch for the SRP (adjusted $R^2=.963$). The SFF and the ESFF predicted from the VRP and SRP showed a strong positive correlation. The DSFF of the VRP and of the SRP, as well as their sum, did not differ by gender. In conclusion, the SFF during a passage reading task could be successfully predicted from VRP and SRP parameters. Future studies should explore the clinical implications in patients who may show deviations in SFF.
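
Two quantities in this abstract can be made concrete with a short Python sketch: converting a pitch range from Hz to semitones (12 semitones per doubling of frequency) and the standard adjusted R² formula, which penalizes R² for the number of predictors. The function names are hypothetical, and the example inputs are illustrative values, not the study's data.

```python
import math


def semitone_range(f_low_hz, f_high_hz):
    """Pitch range in semitones: 12 semitones per doubling of frequency."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    print(semitone_range(100.0, 200.0))  # one octave: 12.0
    # Illustrative only: R^2 = 0.90 from 50 observations and 3 predictors.
    print(round(adjusted_r2(0.90, 50, 3), 3))  # 0.893
```

Expressing range in semitones makes a given interval comparable across speakers with different baseline F0, since the same Hz span is a larger musical interval for a low voice than for a high one.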

Analysis of unfairness of artificial intelligence-based speaker identification technology (인공지능 기반 화자 식별 기술의 불공정성 분석)

  • Shin Na Yeon;Lee Jin Min;No Hyeon;Lee Il Gu
    • Convergence Security Journal / v.23 no.1 / pp.27-33 / 2023
  • Digitalization driven by COVID-19 has rapidly advanced artificial intelligence-based voice recognition technology. However, when datasets are biased against certain groups, this technology can cause social problems of unfairness, such as racial and gender discrimination, and it degrades the reliability and security of artificial intelligence services. In this work, we compare and analyze accuracy-based unfairness in biased data environments using VGGNet (Visual Geometry Group Network), ResNet (Residual Neural Network), and MobileNet, which are representative convolutional neural network (CNN) models. Experimental results show that ResNet34 achieved the highest Top-1 accuracy, 91% for women and 89.9% for men, while ResNet18 showed the smallest accuracy difference between genders, at 1.8%. Gender differences in accuracy across models lead to differences in service quality and unfair outcomes between men and women using the service.
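
Accuracy-based unfairness of this kind can be measured, in a minimal sketch, as Top-1 accuracy computed separately per gender plus the gap between the best and worst group. The function names and toy labels below are hypothetical, and the abstract does not specify exactly how the authors computed the gap.

```python
from collections import defaultdict


def top1_accuracy_by_group(y_true, y_pred, groups):
    """Top-1 accuracy per group; y_pred holds each sample's highest-scoring label."""
    correct, total = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(acc_by_group):
    """Largest minus smallest group accuracy (0 means parity)."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Toy speaker-ID predictions, not data from the paper.
    y_true = [0, 1, 2, 3, 0, 1, 2, 3]
    y_pred = [0, 1, 2, 0, 0, 1, 3, 2]
    groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
    acc = top1_accuracy_by_group(y_true, y_pred, groups)
    print(acc, accuracy_gap(acc))  # {'F': 0.75, 'M': 0.5} 0.25
```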

The fundamental frequency (f0) distribution of Korean speakers in a dialogue corpus using Praat and R (Praat과 R로 분석한 한국인 대화 음성 말뭉치의 fundamental frequency(f0)값 분포)

  • Byunggon Yang
    • Phonetics and Speech Sciences / v.15 no.3 / pp.17-25 / 2023
  • This study examines the fundamental frequency (f0) distribution of 2,740 Korean speakers in a dialogue speech corpus. Praat and R were used to collect and analyze acoustic f0 data after removing extreme values based on the interquartile f0 range of the intonational phrases produced by each speaker. The average f0 of all speakers was 185 Hz and the median was 187 Hz. The f0 data showed a slight positive skewness of 0.11 and a kurtosis of -0.09, close to a normal distribution. Pitch values in daily conversation varied over a range of 238 Hz. Separate examination of the male and female groups showed distinct median f0 values: 114 Hz for males and 199 Hz for females, and a t-test showed a significant difference between the two groups. The skewness, representing the shape of the distribution, was 1.24 for the male group and 0.58 for the female group, and the kurtosis was 5.21 and 3.88, respectively, indicating a leptokurtic distribution for the male group. A regression of median f0 on age yielded a slope of 0.15 for the male group and -0.586 for the female group, indicating opposite trends. In conclusion, normative f0 distributions for different Korean age and sex groups can be examined in a conversational speech corpus recorded by a large number of participants. However, more rigorous data may be required to define the relation between age and f0.
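
The outlier removal and distribution statistics described above can be sketched in Python, assuming Tukey-style fences at 1.5 × IQR (the abstract does not give the multiplier) and simple moment-based estimators of skewness and excess kurtosis. Praat and R packages may use different quantile rules or small-sample corrections, so results can differ slightly; the function names and f0 values are hypothetical toy data.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside Tukey-style fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def _central_moment(values, order):
    mean = statistics.fmean(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based (biased) sample skewness; > 0 means a longer right tail."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: 0 for a normal distribution, > 0 leptokurtic."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


if __name__ == "__main__":
    f0_hz = [180.0, 185.0, 187.0, 190.0, 195.0, 600.0]  # toy data; 600 Hz mimics a tracking error
    kept = iqr_filter(f0_hz)
    print(kept, statistics.median(kept))  # [180.0, 185.0, 187.0, 190.0, 195.0] 187.0
```

In the study the filtering was applied per speaker, so a sketch like this would run once on each speaker's phrase-level f0 values before pooling.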

The Distribution and Trend of Malocclusion Patients Visited at Department of Dentistry in Orthodontics (영남대학교 의과대학 부속병원 치과교정과에 내원한 부정교합 환자의 분포 및 변동추이)

  • Kim, Jong-Sup;Park, Jin-Ho;Yun, Hong-Sik;Yim, Nan-Hee;Chin, Byung-Rho;Lee, Hee-Kyung
    • Journal of Yeungnam Medical Science / v.11 no.2 / pp.323-331 / 1994
  • A total of 1,050 patients who visited the orthodontic department from 1983 to 1994 were surveyed for yearly trends in the distribution of orthodontic patients and in malocclusion types according to Angle's classification. The results were as follows: 1. The number of visiting patients increased each year, and the visit rate was higher in females than in males. 2. The 8-15 age group accounted for 61.4% of all visiting patients, the over-20 group for 18.5%, and the under-7 group for 8.1%. 3. Of all visiting patients, Class I malocclusion accounted for 42.2%, Class II division 1 for 22.5%, Class II division 2 for 3.9%, Class III for 29.1%, and cleft lip and palate for 2.0%. 4. By place of residence, patients from Namgu and Susunggu made up 43.7% of the total. 5. The number of patients receiving orthognathic surgery tended to increase.

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.35-41 / 2001
  • This paper describes linear discriminant analysis and common vector extraction for speech recognition. A speech signal carries the psychological and physiological properties of the speaker as well as dialect differences, acoustic environment effects, and phase differences. For these reasons, the same word spoken by different speakers can sound very different, which makes it very difficult to extract the common properties of a speech class (word or phoneme). Linear algebra methods such as the KLT (Karhunen-Loève Transform) are generally used to extract common properties from speech signals, but this paper uses the common vector extraction method proposed by M. Bilginer et al. That method extracts an optimized common vector from the training speech signals and achieves 100% recognition accuracy on the training data used for common vector extraction. Despite these characteristics, the method has drawbacks: a large number of speech signals cannot be used for training, and the discriminant information among common vectors is not defined. This paper proposes an improved method that reduces the error rate by maximizing the discriminant information among common vectors, and adds a new method for normalizing the size of the common vectors. The results show improved performance, with recognition accuracy 2% higher than that of the conventional method.
