• Title/Abstract/Keyword: English Speakers' database

Search results: 10 items

Foreign Accents Classification of English and Urdu Languages, Design of Related Voice Data Base and A Proposed MLP based Speaker Verification System

  • Muhammad Ismail;Shahzad Ahmed Memon;Lachhman Das Dhomeja;Shahid Munir Shah
    • International Journal of Computer Science & Network Security / Vol. 24, No. 10 / pp.43-52 / 2024
  • A medium-scale database of Urdu and English speakers with multiple accents and dialects has been developed for use in Urdu speaker verification systems, English speaker verification systems, and accent and dialect verification systems. Urdu is the national language of Pakistan and English is the official language. The majority of people are non-native speakers of both Urdu and English in all regions of Pakistan in general, and in the Gilgit-Baltistan region in particular. In order to design Urdu and English speaker verification systems for security applications in general and telephone banking in particular, two databases have been designed: one for foreign-accented Urdu and another for foreign-accented English. For the design of the databases, voice data was collected from 180 speakers from the Gilgit-Baltistan region of Pakistan who could speak Urdu as well as English. The speakers include both genders (males and females) in age groups ranging from 18 to 69 years. Finally, using a subset of the data, a multilayer perceptron (MLP) based speaker verification system has been designed. The designed system achieved an overall accuracy rate of 83.4091% on the English dataset and 80.0454% on the Urdu dataset. This is slightly lower (by 4.0% for English and 7.4% for Urdu) than the 87.5% recognition accuracy of a recently proposed MLP-based speaker identification system.
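
As a rough illustration of the MLP-based verification step described above, the sketch below trains a small scikit-learn MLP on per-utterance feature vectors. The feature type (one averaged MFCC-style vector per utterance), the network topology, and the toolkit are assumptions; the abstract does not specify any of them.

    # Minimal sketch of an MLP verifier: 1 = claimed speaker, 0 = impostor.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Stand-in features: one 39-dim vector (e.g. averaged MFCCs + deltas) per utterance.
    X = rng.normal(size=(400, 39))
    y = rng.integers(0, 2, size=400)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # One hidden layer; the paper's exact topology is not given in the abstract.
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    print("verification accuracy: %.2f%%" % (100 * clf.score(X_test, y_test)))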

한·영 동시조음 데이터베이스의 구축 (Speech Coarticulation Database of Korean and English)

  • Kim, Jong-Mi
    • The Journal of the Acoustical Society of Korea / Vol. 18, No. 3 / pp.17-26 / 1999
  • We present the first speech coarticulation database of Korean, English and Konglish, named "SORIDA", which is designed to cover the maximum number of representations of coarticulation in these languages [1]. SORIDA is a compact database, designed to contain the maximum number of triphones in a minimum number of prompts. It contains all consonantal triphones and vowel allophones in 682 Korean word-length prompts and 717 English prompt words, spoken five times by speakers of balanced genders, dialects and ages. The Korean prompts are synthesized lexical items that maximize coarticulation variation while disregarding stress phenomena, whereas the English prompts are natural words that fully reflect stress effects on coarticulation variation. The prompts are designed differently because English phonology has stress while Korean does not. An intermediate language, Konglish, has also been modeled by two Korean speakers reading the 717 English prompt words. Recording was done in a controlled laboratory environment with an AKG Model C-100 microphone and a Fostex D-5 digital-audio-tape (DAT) recorder; the total recording time was four hours. The SORIDA CD-ROM is available on one disc at a 22.05 kHz sampling rate with 16-bit samples. The SORIDA digital audio tapes are available on four 124-minute tapes at a 48 kHz sampling rate. SORIDA's list of phonetically rich words is also available in English and Korean.
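
The "maximum number of triphones in a minimum number of prompts" design can be pictured as a greedy covering problem. The toy sketch below operates on letter trigrams rather than real phone sequences and is not the authors' actual prompt-design procedure; it only shows the idea.

    # Greedy selection of prompts until no prompt adds a new "triphone".
    def triphones(word):
        """Return the set of letter triphones in a word (phone strings in practice)."""
        return {word[i:i + 3] for i in range(len(word) - 2)}

    def select_prompts(candidates):
        """Greedily pick prompts that each add the most uncovered triphones."""
        covered, selected = set(), []
        remaining = list(candidates)
        while remaining:
            best = max(remaining, key=lambda w: len(triphones(w) - covered))
            gain = triphones(best) - covered
            if not gain:
                break
            selected.append(best)
            covered |= gain
            remaining.remove(best)
        return selected, covered

    prompts, covered = select_prompts(["sorida", "database", "coarticulation", "korean"])
    print(prompts, len(covered), "triphone types covered")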

한국인 영어 학습자의 발음 정확성 자동 측정방법에 대한 연구 (A Study on Automatic Measurement of Pronunciation Accuracy of English Speech Produced by Korean Learners of English)

  • 윤원희;정현성;장태엽
    • The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the 2005 Fall Conference / pp.17-20 / 2005
  • The purpose of this project is to develop a device that can automatically measure the pronunciation of English speech produced by Korean learners of English. Pronunciation proficiency will be measured largely in two areas: suprasegmental and segmental. In the suprasegmental area, intonation and word stress will be traced and compared with those of native speakers by statistical methods using tilt parameters. Durations of phones are also examined to measure the naturalness of speakers' pronunciation; statistical duration modelling from a large speech database using CART will be considered for this. For segmental measurement of pronunciation, the acoustic probability of a phone, which is a byproduct of forced alignment, will be the basis for scoring the pronunciation accuracy of that phone. The final score will be fed back to the learners to help them improve their pronunciation.
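
The segmental scoring idea, based on the acoustic probability that forced alignment assigns to each phone, can be sketched roughly as below. The per-frame log-likelihoods and the duration-normalised averaging are illustrative assumptions, not the paper's exact formula.

    # Score one aligned phone as its average per-frame acoustic log-likelihood
    # (a common "goodness of pronunciation"-style measure).
    def phone_score(frame_loglikes):
        if not frame_loglikes:
            return float("-inf")
        return sum(frame_loglikes) / len(frame_loglikes)

    # Hypothetical forced-alignment output: (phone, per-frame log-likelihoods).
    alignment = [("IY", [-2.1, -1.9, -2.3]), ("TH", [-6.5, -7.0, -6.8, -7.2])]
    for phone, loglikes in alignment:
        print(phone, round(phone_score(loglikes), 2))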

Annotation of a Non-native English Speech Database by Korean Speakers

  • Kim, Jong-Mi
    • Speech Sciences / Vol. 9, No. 1 / pp.111-135 / 2002
  • An annotation model of a non-native speech database has been devised, wherein English is the target language and Korean is the native language. The proposed annotation model features overt transcription of the linguistic information in native speech that is predictable from the dictionary entry, together with several predefined types of error specification arising from native-language transfer. The proposed model is, in that sense, different from other annotation models previously explored in the literature, most of which are based on native speech. The validity of the newly proposed model is revealed in its consistent annotation of 1) salient linguistic features of English, 2) contrastive linguistic features of English and Korean, 3) actual errors reported in the literature, and 4) the data newly collected in this study. The annotation method in this model adopts the widely accepted conventions of the Speech Assessment Methods Phonetic Alphabet (SAMPA) and Tones and Break Indices (ToBI). In the proposed annotation model, SAMPA is employed exclusively for segmental transcription and ToBI for prosodic transcription. The annotation of non-native speech is used to assess the speaking ability of learners of English as a Foreign Language (EFL).
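
One way to picture a record in such an annotation scheme, combining a SAMPA segmental layer, a ToBI prosodic layer, and a transfer-error tag, is sketched below. The field names and example labels are hypothetical, not taken from the paper.

    # Hypothetical per-word annotation record for a non-native speech database.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class WordAnnotation:
        orthography: str
        sampa_target: str                                      # dictionary (predictable) pronunciation, SAMPA
        sampa_realised: str                                    # learner's realisation, SAMPA
        tobi_tones: List[str] = field(default_factory=list)    # e.g. ["H*", "L-L%"]
        transfer_error: Optional[str] = None                   # one of the predefined error types

    rec = WordAnnotation("thin", "TIn", "sIn", ["H*"], transfer_error="/T/ realised as /s/")
    print(rec)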

성대진동 및 성별이 미국영어 마찰음에 미치는 효과에 관한 코퍼스 기반 연구 (A corpus-based study on the effects of voicing and gender on American English Fricatives)

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / Vol. 10, No. 2 / pp.7-14 / 2018
  • The paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of voicing in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, and comprises more than five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender, voicing, and place of articulation as independent factors. The results of the acoustic analyses revealed that acoustic signals interact in a complex way to signal the gender, place, and voicing of fricatives. Classification experiments using a multiclass support vector machine (SVM) revealed that 78.7% of fricatives are correctly classified. The majority of errors stem from the misclassification of /θ/ as [f] and /ʒ/ as [z]. The average accuracy of gender classification is 78.7%. Most errors result from the classification of female speakers as male speakers. The paper contributes to the understanding of the effects of voicing and gender on fricatives in a large-scale speech corpus.
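
A minimal sketch of the multiclass SVM classification step is given below, assuming scikit-learn and stand-in spectral/temporal features in place of the measurements actually extracted from TIMIT.

    # Multiclass SVM over fricative tokens (stand-in features and labels).
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    # Stand-in features: [duration, spectral mean, variance, skewness] per token.
    X = rng.normal(size=(600, 4))
    labels = np.array(["f", "v", "s", "z", "S", "Z", "T", "D"])   # SAMPA fricative labels
    y = rng.choice(labels, size=600)

    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", decision_function_shape="ovr"))
    print("CV accuracy:", cross_val_score(svm, X, y, cv=5).mean())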

Constructing Japanese MeSH term dictionaries related to the COVID-19 literature

  • Yamaguchi, Atsuko;Takatsuki, Terue;Tateisi, Yuka;Soares, Felipe
    • Genomics & Informatics / Vol. 19, No. 3 / pp.25.1-25.5 / 2021
  • The coronavirus disease 2019 (COVID-19) pandemic has led to a flood of research papers and the information has been updated with considerable frequency. For society to derive benefits from this research, it is necessary to promote sharing up-to-date knowledge from these papers. However, because most research papers are written in English, it is difficult for people who are not familiar with English medical terms to obtain knowledge from them. To facilitate sharing knowledge from COVID-19 papers written in English for Japanese speakers, we tried to construct a dictionary with an open license by assigning Japanese terms to MeSH unique identifiers (UIDs) annotated to words in the texts of COVID-19 papers. Using this dictionary, 98.99% of all occurrences of MeSH terms in COVID-19 papers were covered. We also created a curated version of the dictionary and uploaded it to Pub-Dictionary for wider use in the PubAnnotation system.
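
The reported 98.99% coverage amounts to checking, for every MeSH UID occurrence in the annotated texts, whether the Japanese dictionary has an entry for it. A toy sketch of that computation follows; the UIDs and terms are placeholders, not actual dictionary content.

    # Share of MeSH-UID occurrences covered by a Japanese-term dictionary.
    ja_dictionary = {"D000086382": ["COVID-19"], "D014780": ["ウイルス"]}   # placeholder entries
    occurrences = ["D000086382", "D014780", "D000086382", "D012345"]       # UIDs found in the texts

    covered = sum(1 for uid in occurrences if uid in ja_dictionary)
    print("coverage: %.2f%%" % (100 * covered / len(occurrences)))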

Acoustic analysis of fricatives in dysarthric speakers with cerebral palsy

  • Hernandez, Abner;Lee, Ho-young;Chung, Minhwa
    • Phonetics and Speech Sciences / Vol. 11, No. 3 / pp.23-29 / 2019
  • This study acoustically examines the quality of fricatives produced by ten dysarthric speakers with cerebral palsy. Previous similar studies tend to focus only on sibilants, but to obtain a better understanding of how dysarthria affects fricatives we selected a range of samples with different places of articulation and voicing. The Universal Access (UA) Speech database was used to select thirteen words beginning with one of the English fricatives (/f/, /v/, /s/, /z/, /ʃ/, /ð/). The following four measurements were taken for both dysarthric and healthy speakers: phoneme duration, mean spectral peak, variance, and skewness. Results show that even speakers with mild dysarthria have significantly longer fricatives and a lower mean spectral peak than healthy speakers. Furthermore, mean spectral peak and variance showed significant group effects for both healthy and dysarthric speakers. Mean spectral peak and variance were also useful for discriminating several places of articulation in both groups. Lastly, the spectral measurements displayed important group differences when severity was taken into account. These findings show that, in general, there is a degradation in the production of fricatives by dysarthric speakers, but the differences depend on the severity of dysarthria along with the type of measurement taken.
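
The measurements named in the abstract (duration plus spectral-moment-style measures) can be computed from a fricative segment roughly as sketched below; numpy is assumed, and the windowing and pre-emphasis details used in practice are omitted.

    # Duration, spectral peak, variance and skewness of one fricative segment.
    import numpy as np

    def fricative_measures(segment, sr):
        duration = len(segment) / sr
        spectrum = np.abs(np.fft.rfft(segment)) ** 2
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
        p = spectrum / spectrum.sum()              # treat the power spectrum as a distribution
        peak = freqs[np.argmax(spectrum)]          # frequency of the spectral peak
        mean = np.sum(freqs * p)
        var = np.sum(((freqs - mean) ** 2) * p)
        skew = np.sum(((freqs - mean) ** 3) * p) / var ** 1.5
        return duration, peak, var, skew

    sr = 16000
    noise = np.random.default_rng(2).normal(size=sr // 10)    # 100 ms stand-in segment
    print(fricative_measures(noise, sr))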

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / Vol. 7, No. 1 / pp.117-124 / 2015
  • This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, comprising over five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender as an independent factor. The results of the acoustic analyses revealed that most acoustic properties of voiceless sibilants differ between male and female speakers, whereas those of voiceless non-sibilants do not. A classification experiment using linear discriminant analysis (LDA) revealed that 85.73% of voiceless fricatives are correctly classified. The sibilants are 88.61% correctly classified, whereas the non-sibilants are only 57.91% correctly classified. The majority of the errors come from the misclassification of /θ/ as [f]. The average accuracy of gender classification is 77.67%, with most errors arising from the classification of female speakers' non-sibilants. The results are accounted for by biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.
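
For the LDA step, a minimal sketch (scikit-learn assumed) applied to the gender-classification part of the study is shown below; the stand-in features replace the real spectral measurements taken from TIMIT.

    # LDA classification of talker gender from fricative measurements.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 5))                     # e.g. spectral moments + duration per token
    y = rng.choice(["female", "male"], size=500)      # talker gender labels

    lda = LinearDiscriminantAnalysis()
    print("gender classification accuracy:", cross_val_score(lda, X, y, cv=5).mean())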

발성 평가를 위한 영어 음성인식기의 개발 (Development of English Speech Recognizer for Pronunciation Evaluation)

  • 박전규;이준조;김영창;허용수;이석재;이종현
    • The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the October 2003 Conference / pp.37-40 / 2003
  • This paper presents preliminary results on automatic pronunciation scoring for non-native English speakers and describes the development of an English speech recognizer for educational and evaluation purposes. The proposed speech recognizer, featuring two refined acoustic model sets, implements noise-robust data compensation, phonetic alignment, highly reliable rejection, keyword and phrase detection, an easy-to-use language modeling toolkit, etc. The developed speech recognizer achieves an average correlation of 0.725 between the human raters' scores and the machine scores, based on the YOUTH speech database for training and K-SEC for testing.
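
The 0.725 figure is the correlation between human raters' scores and the recognizer's scores; a minimal sketch of that computation is shown below, with made-up placeholder scores and numpy assumed.

    # Pearson correlation between human pronunciation ratings and machine scores.
    import numpy as np

    human = np.array([3.0, 4.5, 2.0, 5.0, 3.5, 4.0])      # placeholder rater scores
    machine = np.array([2.8, 4.2, 2.5, 4.8, 3.9, 3.7])    # placeholder recognizer scores

    r = np.corrcoef(human, machine)[0, 1]
    print("human-machine correlation: %.3f" % r)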

Combination of Classifiers Decisions for Multilingual Speaker Identification

  • Nagaraja, B.G.;Jayanna, H.S.
    • Journal of Information Processing Systems / Vol. 13, No. 4 / pp.928-940 / 2017
  • State-of-the-art speaker recognition systems may work better for the English language; however, if the same system is used to recognize speakers of other languages, it may yield poor performance. In this work, the decisions of a Gaussian mixture model-universal background model (GMM-UBM) and a learning vector quantization (LVQ) classifier are combined to improve the recognition performance of a multilingual speaker identification system. The difference between these classifiers lies in their modeling techniques: the former is based on a probabilistic approach and the latter on the fine-tuning of neurons. Since the approaches are different, each modeling technique identifies a different set of speakers for the same database. Therefore, the decisions of the classifiers may be combined to improve performance. In this study, multitaper mel-frequency cepstral coefficients (MFCCs) are used as the features, and monolingual and cross-lingual speaker identification studies are conducted using NIST-2003 and our own database. The experimental results show that the combined system improves the performance by nearly 10% compared with the individual classifiers.
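
The decision-combination idea can be pictured as a simple score-level fusion of the two classifiers' per-speaker scores; the sum rule and z-normalisation below are assumptions, since the abstract does not spell out the fusion rule actually used.

    # Sum-rule fusion of two classifiers' per-speaker scores for identification.
    import numpy as np

    def combine(scores_a, scores_b):
        """Z-normalise each classifier's scores, sum them, and pick the best speaker."""
        za = (scores_a - scores_a.mean()) / scores_a.std()
        zb = (scores_b - scores_b.mean()) / scores_b.std()
        fused = za + zb
        return int(np.argmax(fused)), fused

    gmm_ubm_scores = np.array([-41.2, -39.8, -44.0])   # hypothetical log-likelihood scores
    lvq_scores = np.array([0.61, 0.55, 0.40])          # hypothetical LVQ similarity scores
    speaker, fused = combine(gmm_ubm_scores, lvq_scores)
    print("identified speaker index:", speaker)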