Search | Korea Science

Zero-shot voice conversion with HuBERT

Hyelee Chung; Hosung Nam
- Phonetics and Speech Sciences, v.15 no.3, pp.69-74, 2023
This study introduces a zero-shot voice conversion model built on HuBERT. Zero-shot voice conversion models transform the speech of one speaker to sound like another, even when the target speaker's voice was not seen during training. The model comprises five main components (HuBERT, a feature encoder, a flow, a speaker encoder, and a vocoder) and performs well across a range of scenarios, most notably on the challenging unseen-to-unseen task, in which neither the source nor the target speaker appears in the training data. The model was evaluated with mean opinion scores (MOS) and speaker similarity scores, which indicated high voice quality and high similarity to the target speakers. These results suggest that the model holds considerable promise for real-world applications that require high-quality voice conversion. This study sets a precedent for exploring HuBERT-based models for voice conversion and points to new directions for future research in this area. Despite the model's complexity, its robust performance demonstrates the viability of HuBERT for advancing voice conversion technology and makes a significant contribution to the field.
https://doi.org/10.13064/KSSS.2023.15.3.069

Anglicisms in the Field of Information Technology: Analysis of Linguistic Features

Plechko, Antonina; Chukhno, Tetiana; Nikolaieva, Tetiana; Apolonova, Liliia; Leleka, Tetiana
- International Journal of Computer Science & Network Security, v.22 no.4, pp.183-192, 2022
The role that English currently plays is undeniable. It has become the most common means of communication among speakers of different native languages around the world, and it has penetrated all areas of daily life. In the field of Information Technology (IT), English has taken a dominant position, as many of the terms used on a daily basis are English. The purpose of this article is to analyze the linguistic features of anglicisms in the field of Information Technology. Methods. The research is based on systematic and comparative analysis, the dialectical method, and methods of classification and generalization. Results. This study presents the results of compiling a multilingual glossary of anglicisms used in the GitHub and 3D Slicer fields. Despite the limited number of terms in the glossary, the article provides ample evidence of the influence of English on Information Technology, and specifically on the GitHub and 3D Slicer areas under consideration. The types of anglicisms used in the 3D Slicer area appear to be more diverse than those in the GitHub area. The study found that the five European languages examined use different language strategies to resolve communication problems. The multilingual glossary showed that in some cases an anglicism coexists with the native term; in other cases the English term is the only one used across the languages; and in still other cases only the native term is used. Conclusions. This study provides a useful tool for improving communication between engineers and technicians who speak different native languages. The ultimate goal of this research is a multilingual glossary, still under development, that is expected to cover other IT areas such as Python and VTK.
https://doi.org/10.22937/IJCSNS.2022.22.4.23

English Predicate Inversion: Towards Data-driven Learning

Kim, Jong-Bok; Kim, Jin-Young
- Journal of English Language & Literature, v.56 no.6, pp.1047-1065, 2010
English inversion constructions are hard for non-native speakers to learn and difficult to teach, mainly because of their intriguing grammatical and discourse properties. This paper addresses grammatical issues in learning and teaching the so-called 'predicate inversion (PI)' construction (e.g., Equally important in terms of forest depletion is the continuous logging of the forests). In particular, we chart the grammatical (distributional, syntactic, semantic, and pragmatic) properties of the PI construction and argue for data-driven teaching of English grammar. To depart from the armchair style of grammar teaching, which relies on simple sentences made up by the author, our method introduces data-driven teaching. A total of 25 university students in a grammar-related class jointly analyzed the British Component of the International Corpus of English (ICE-GB), which contains about one million words distributed across a variety of text categories. We identified a total of 290 PI sentences (206 from spoken and 87 from written texts). The preposed syntactic categories of the PI involve five main types: AdvP, PP, VP(ed/ing), NP, AP, and so, all of which function as the complement of the copula. In terms of discourse, our observations support Birner and Ward's (1998) observation that the preposed phrase represents more familiar information than the postposed subject. The corpus examples yielded three possible types. The preposed element can be discourse-old while the postposed one is discourse-new, as in Putting wire mesh over a few bricks is a good idea. Both the preposed and the postposed elements can be discourse-new, as in But a fly in the ointment is inflation. Both elements can also be discourse-old, as in Racing with him on the near-side is Rinus.
The predominance of the PI in the spoken texts also supports the view that balance (or scene-setting) in information structure is the main trigger for the use of the PI construction. After exposure to real data and to in-depth syntactic and information-structure analysis of the PI construction, the students in the class showed a far clearer understanding of the construction and came to realize that grammar does not operate in isolation but interacts closely with other important components such as information structure. The study points toward grammar teaching that is both data-driven and interactive.

Modality in Korean Learners' Spoken Interlanguage

Park, Hyeson
- English Language & Literature Teaching, v.18 no.1, pp.197-216, 2012
This study examines the spoken interlanguage of Korean learners of English, focusing on the distribution of modal verbs and devices of epistemic modality. (Semi-)spontaneous speech data were collected from four students participating in a self-organized study group over seven months, producing a corpus of about 55,000 words. The data analysis reveals the following: 1) The learners produced modal verbs less frequently than native speakers (1.99 vs. 2.32 tokens per 100 words), and the range of modal verbs they used was very limited, with over-reliance on can (43%). 2) The devices marking epistemic modality fell, in order, into the grammatical categories of adverbs, lexical verbs, and modal verbs, with a few items in each category occurring with high frequency. 3) Lexical items conveying certainty and modals of obligation were preferred over markers of weaker commitment, resulting in speech characterized by firmer assertions and a more authoritative tone, a potential cause of pragmatic failure. 4) A weak developmental change was observed in the frequency of modal verbs, but not in their functions, over the seven-month period of data collection. L1 influence, L2 proficiency, mode of communication, and instruction effects are discussed as possible variables underlying the observed distribution patterns.
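Normalizing counts to tokens per 100 words, as in the comparison above, lets corpora of different sizes be compared directly. A minimal Python sketch of that normalization follows; the function name `per_100_words` and the counts in the example are hypothetical and only illustrate the arithmetic, not the study's actual data.

```python
def per_100_words(token_count: int, corpus_words: int) -> float:
    """Return a raw token count as a rate per 100 words of running text."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    # Multiply before dividing so that round-number inputs give exact results.
    return 100 * token_count / corpus_words


# Hypothetical example: 1,100 modal tokens in a 55,000-word corpus.
print(per_100_words(1100, 55000))  # 2.0
```

A raw count alone cannot show whether learners use modals less often or simply produced less speech; the normalized rate separates the two.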

An acoustical analysis of synchronous English speech using automatic intonation contour extraction

Yi, So Pae
- Phonetics and Speech Sciences, v.7 no.1, pp.97-105, 2015
This research focuses mainly on the intonational characteristics of synchronous English speech. Intonation contours were extracted from 1,848 utterances produced in two speaking modes (solo vs. synchronous) by 28 native speakers of English (12 women and 16 men). Synchronous speech is found to be slower than solo speech, and women are found to speak more slowly than men. The effect of speaking mode on speech rate is larger than that of gender, but there is no interaction between the two factors (speaking mode and gender) in speech rate. Analysis of pitch-point features shows that synchronous speech has smaller Pt (pitch-point movement time), Pr (pitch-point pitch range), Ps (pitch-point slope), and Pd (pitch-point distance) than solo speech, again with no interaction between speaking mode and gender. Analysis of sentence-level features reveals that synchronous speech has smaller Sr (sentence-level pitch range), Ss (sentence slope), MaxNr (normalized maximum pitch), and MinNr (normalized minimum pitch) but greater Min (minimum pitch) and Sd (sentence duration) than solo speech. It is also shown that the higher the Mid (median pitch), MaxNr, and MinNr in the solo speaking mode, the more they are reduced in the synchronous speaking mode. Max (maximum pitch), Min, and Mid show greater speaker discriminability than the other features.
https://doi.org/10.13064/KSSS.2015.7.1.097

An acoustical analysis of speech of different speaking rates and genders using intonation curve stylization of English

Yi, So Pae
- Phonetics and Speech Sciences, v.6 no.4, pp.79-90, 2014
Intonation curve stylization was used for an acoustical analysis of English speech. Acoustic feature values were extracted from 1,848 utterances produced at normal and fast speech rates by 28 native speakers of English (12 women and 16 men). Men are found to speak faster than women at the normal speech rate, but no gender difference is found at the fast speech rate. Analysis of pitch-point features shows that fast speech has greater Pt (pitch-point movement time), Pr (pitch-point pitch range), and Pd (pitch-point distance) but smaller Ps (pitch-point slope) than normal speech. Men show greater Pt, Pr, and Pd than women. Analysis of sentence-level features reveals that fast speech has smaller Sr (sentence-level pitch range), Sd (sentence duration), and Max (maximum pitch) but greater Ss (sentence slope) than normal speech. Women show greater Sr, Ss, Sp (pitch difference between the first and last pitch points), Sd, MaxNr (normalized Max), and MinNr (normalized Min) than men. As speech rate increases, women speak with greater Ss and Sr than men.
https://doi.org/10.13064/KSSS.2014.6.4.079

An Experimental Study of Vowel Epenthesis among Korean Learners of English

Shin, Dong-Jin; Iverson, Paul
- Phonetics and Speech Sciences, v.6 no.2, pp.163-174, 2014
Korean L2 learners have many difficulties with the pronunciation of English words, one of which is vowel epenthesis. Vowel epenthesis is the insertion of vowels within or between words; Korean learners of English typically insert them between successive consonants, whether within clusters, across syllable or word boundaries, or after final coda consonants. The aim of this study was to investigate whether individual differences in vowel epenthesis are closely related to the perception and production of segments (vowels and consonants) and prosody, or whether they are relatively independent of these processes. Participants completed a battery of production and perception tasks: they read sentences, identified vowels and consonants, read target words likely to elicit epenthetic vowels (e.g., abduction), and completed stress-recognition and epenthetic-vowel perception tasks. The results revealed that Korean second-language (L2) learners have problems with vowel epenthesis in both production and perception, but their production and perception abilities were not correlated with each other. Vowel epenthesis was strongly related to vowel production and perception, suggesting that problems with segments may combine with L1 phonotactics to produce epenthesis.
https://doi.org/10.13064/KSSS.2014.6.2.163

Web-based Cyber Instruction for EFL Learning

Cha, Mi-Yang
- Journal of Digital Contents Society, v.6 no.4, pp.209-216, 2005
The aim of this study is to examine the effects of web-based cyber instruction on EFL learning from the viewpoint of learners' perceptions and needs. Data were collected through a questionnaire survey of 709 undergraduate students enrolled in three cyber English courses offered at N university during the second semester of 2004. The results indicated that the learners had a positive attitude toward web-based cyber instruction and considered it a proper educational method for the cyber age. However, the students felt that web-based cyber instruction was not very satisfactory for developing their communicative competence in English or improving the language skills they needed. Cyber instruction was also found to be still teacher-dominated and lacking in interaction, which made the students passive recipients of the information presented. Compared with offline instruction, cyber instruction was not notably better at enhancing the students' motivation, interest, or concentration in class. To be more effective, cyber instruction needs not only a wider variety of content and class activities but also more exposure to authentic language from native English speakers. The findings yield several implications for the design and development of web-based cyber EFL programs.

Pitch trajectories of English vowels produced by American men, women, and children

Yang, Byunggon
- Phonetics and Speech Sciences, v.10 no.4, pp.31-37, 2018
Pitch trajectories reflect continuous variation in vocal fold movement over time. This study examined the pitch trajectories of English vowels produced by 139 American English speakers and analyzed them statistically using Generalized Additive Mixed Models (GAMMs). First, Praat was used to read the sound data of Hillenbrand et al. (1995). A pitch analysis script was then prepared, and six pitch values at corresponding time points within each vowel segment were collected and checked. The results showed that the men produced the lowest pitch trajectories, followed by the women, the boys, and then the girls. The density curve showed a bimodal distribution. Across the six time points, the pitch values formed a single dip over the vowel segment, changing gradually from 204 to 193 to 196 Hz. Normality tests rejected the null hypothesis that the pitch data were normally distributed, so nonparametric tests were conducted to identify significant differences among the four groups. The GAMMs, applied to all the pitch data, produced significant results for the pitch values at the six time points but not between the boys and the girls; they also revealed that these two groups differed significantly only at the first and second time points. Accordingly, the methodology and findings of this study may be applicable to future studies comparing curvilinear data sets elicited under experimental conditions.
https://doi.org/10.13064/KSSS.2018.10.4.031
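The abstract does not state how the six time points were positioned within each vowel. Purely as an illustration, the Python sketch below assumes equally spaced points from the start to the end of a vowel segment and linearly interpolates a pitch track (times in seconds, F0 in Hz) at those points. The function `sample_pitch`, the equal spacing, and the linear interpolation are all assumptions; this is not the author's Praat script.

```python
from bisect import bisect_left


def sample_pitch(times, f0, start, end, n_points=6):
    """Linearly interpolate a pitch track at n_points equally spaced times.

    times: strictly increasing measurement times in seconds.
    f0:    pitch values in Hz, one per entry in times.
    start, end: boundaries of the vowel segment in seconds.
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two (time, f0) pairs of equal length")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (end - start) / (n_points - 1)
    samples = []
    for k in range(n_points):
        t = start + k * step  # k-th equally spaced point, start to end inclusive
        if t <= times[0]:  # before the first measurement: hold the first value
            samples.append(f0[0])
        elif t >= times[-1]:  # after the last measurement: hold the last value
            samples.append(f0[-1])
        else:
            i = bisect_left(times, t)  # first index with times[i] >= t
            t0, t1 = times[i - 1], times[i]
            y0, y1 = f0[i - 1], f0[i]
            samples.append(y0 + (y1 - y0) * (t - t0) / (t1 - t0))
    return samples


# Toy pitch track (invented values), sampled at five points for brevity.
track_t = [0.0, 0.1, 0.2]
track_f0 = [200.0, 190.0, 196.0]
print([round(v, 1) for v in sample_pitch(track_t, track_f0, 0.0, 0.2, n_points=5)])
# [200.0, 195.0, 190.0, 193.0, 196.0]
```

Holding the end values instead of extrapolating is one conservative choice when a segment boundary falls outside the measured frames; other choices are equally defensible.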

In My Opinion: Modality in Japanese EFL Learners' Argumentative Essays

Pemberton, Christine
- Asia Pacific Journal of Corpus Research, v.1 no.2, pp.57-72, 2020
This study seeks to add to the current understanding of learners' use of modality in argumentative writing. A learner corpus of argumentative essays on four topics was created and compared with native English speaker data from the International Corpus Network of Asian Learners of English (ICNALE). The relationship between learners' use of modal devices (MDs) and the devices' appearance in the school's curriculum was also examined. The results showed that the learners relied on a much narrower range of MDs than learners in previous studies. The frequency of MD use varied by topic and did not appear to be driven by cultural factors, as has previously been suggested. Learners used more hedges than boosters on all topics, contradicting most previous studies. Curriculum was found to correlate directly with MD use; other important factors may include learners' perception of the topic and overreliance on certain MDs over others (the One-to-One principle). This research implies that learners' perception of topic should be explored further as a variable affecting MD use. Curricula should be designed based on the frequency of MD use by native English speakers, and learners should receive instruction in the norms of MD use in academic writing. The methodology used in this study to determine correlations between MD use and the curriculum has a wide range of potential applications in the field of Contrastive Interlanguage Analysis.
https://doi.org/10.22925/apjcr.2020.1.2.57
