• Title/Summary/Keyword: speech features


High Speed Korean Dependency Analysis Using Cascaded Chunking (다단계 구단위화를 이용한 고속 한국어 의존구조 분석)

  • Oh, Jin-Young;Cha, Jeong-Won
    • Journal of the Korea Society for Simulation / v.19 no.1 / pp.103-111 / 2010
  • Syntactic analysis is an important step in natural language processing. However, existing Korean syntactic analyzers are hard to use in practice because of their low performance and lack of robustness. We propose a new robust, high-speed, and high-performance Korean syntactic analyzer based on conditional random fields (CRFs). We treat parsing as a labeling problem and apply cascaded chunking to Korean parsing: at each step, CRFs label each Eojeol with syntactic information, using part-of-speech tags and Eojeol syntactic tags as features. Experimental results with 10-fold cross-validation show significant improvements in robustness, speed, and performance on long Korean sentences.

Generative Interactive Psychotherapy Expert (GIPE) Bot

  • Ayesheh Ahrari Khalaf;Aisha Hassan Abdalla Hashim;Akeem Olowolayemo;Rashidah Funke Olanrewaju
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.15-24 / 2023
  • Ever since the development of computers, one of the aspirations of scientists and engineers has been to interact naturally with machines. Artificial intelligence (AI) capabilities such as natural language processing and natural language generation were developed for this purpose. Interactive conversational systems are thought to be the fastest-expanding field of AI. Numerous businesses have used these technologies to create Virtual Personal Assistants (VPAs), including Apple's Siri, Amazon's Alexa, and Google Assistant, among others. Even though many chatbots have been introduced over the years to diagnose or treat psychological disorders, a user-friendly chatbot is not yet available. In this work, a smart generative cognitive behavioral therapy system with spoken dialogue support was developed using a Persona Perception (P2) bot model with Generative Pre-trained Transformer-2 (GPT-2). The model was then implemented using technologies found in modern VPAs, such as voice recognition, Natural Language Understanding (NLU), and text-to-speech. The resulting system is a valuable aid for voice-based systems because it can hold therapeutic discussions with users through an interactive text and voice user experience.

A Study on the Syntagma & Paradigm by Repetition, Variation and Contrast in Ads

  • Choi, Seong-hoon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.9 / pp.1-12 / 2017
  • This study explores the potential meanings of print advertisements. Linguistic features such as repetition, variation, contrast, and phonological structure in the verbal texts of ads can give rise to shades of meaning or slight variations in advertising. The language of advertising is not only a language of words. It is also a language of images, colors, and pictures. Pictures and words combine to form the advertisement's visual text. While the words are very important in delivering the sales message, the visual text cannot be ignored in advertisements. Part of the visual text is the paralanguage of the ad. Paralanguage is the meaningful behaviour accompanying language, such as voice quality, gestures, facial expressions, and touch in speech, and the choice of typeface and letter sizes in writing. Foregrounding is the throwing into relief of the linguistic sign against the background of the norms of ordinary language. This paper discusses the advertisements within the framework of paradigmatic and syntagmatic relationships. The sources of the ads were confined to Marlboro, and the ads were reselected using purposive sampling.

Improved Character-Based Neural Network for POS Tagging on Morphologically Rich Languages

  • Samat Ali;Alim Murat
    • Journal of Information Processing Systems / v.19 no.3 / pp.355-369 / 2023
  • Since the widespread adoption of deep learning and related distributed representations, there have been substantial advancements in part-of-speech (POS) tagging for many languages. When training word representations, morphology and shape are typically ignored, as these representations rely primarily on capturing syntactic and semantic aspects of words. However, for tasks like POS tagging, notably in morphologically rich and resource-limited language environments, intra-word information is essential. In this study, we introduce a deep neural network (DNN) for POS tagging that learns character-level word representations and combines them with general word representations. Using the proposed approach and omitting hand-crafted features, we achieve 90.47%, 80.16%, and 79.32% accuracy on our own dataset for three morphologically rich languages: Uyghur, Uzbek, and Kyrgyz. The experimental results reveal that the presented character-based strategy greatly improves POS tagging performance for several morphologically rich languages (MRL) where character information is significant. Furthermore, compared with the previously reported state-of-the-art POS tagging results for Turkish on the METU Turkish Treebank dataset, the proposed approach improves slightly on the prior work. The experimental results thus indicate that character-based representations outperform word-level representations for MRL. Our technique is also robust to out-of-vocabulary issues and performs better on manually edited text.

Creation of a Voice Recognition-Based English Aided Learning Platform

  • Hui Xu
    • Journal of Information Processing Systems / v.20 no.4 / pp.491-500 / 2024
  • To address the poor quality of information input in online spoken-English teaching, this study builds an English teaching assistance model that relies on automated voice recognition technology and a recognition algorithm known as dynamic time warping (DTW). To improve the algorithm's efficiency, the study modifies the time-domain properties of the speech signal during the pre-processing stage, which reduces the algorithm's computational effort and storage requirements. A simulation experiment is then used to evaluate the model's efficacy. The findings show that the revised DTW model achieves recognition rates above 95% for all phonetic symbols, with the highest rates for cloudy consonants: 98.5%, 98.8%, and 98.7% in the three tests, respectively. The enhanced DTW voice recognition model is also more efficient and requires less time for training and testing. In the KS value analysis, the DTW model's KS value of 0.63 is the highest among the models analyzed. Among the models compared, it also has the lowest curve position for both test functions. This shows that the upgraded DTW model has superior voice recognition capabilities, which could significantly improve online English education and lead to better teaching outcomes.
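As a rough illustration of the DTW matching this abstract refers to, the sketch below implements the textbook dynamic time warping distance and a nearest-template classifier in Python. It is not the revised algorithm from the paper, whose pre-processing changes are not described here; the function names `dtw_distance` and `classify`, the one-dimensional feature sequences, and the absolute-difference cost are all assumptions made for the example.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the reference template closest to sample under DTW."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

A call such as `classify(features, {"a": ref_a, "b": ref_b})` would pick the reference template whose warped alignment with the input is cheapest. In practice the sequences would be frames of acoustic feature vectors rather than single numbers, with a vector distance in place of the absolute difference.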

A new feature specification for vowel height (모음 높이의 새로운 표기법에 대하여)

  • Park Cheon-Bae
    • MALSORI / no.27_28 / pp.27-56 / 1994
  • Processes involving changes in vowel height are natural enough to be found in many languages. A better feature specification for vowel height is essential to grasp these processes properly. Standard Phonology adopts the binary feature system, in which vowel height is represented by two features, [±high] and [±low]. This has its own merits, but it is defective because it is misleading when we count the number of features used in a rule to compare the naturalness of rules. This feature system also cannot represent more than three degrees of height. We will therefore discard the binary features for vowel height. We consider adopting the multivalued feature [n high] for the property of height. However, this feature cannot avoid the arbitrariness resulting from the numerical values denoting vowel height. It is not easy to tell whether the number in question is the largest or not. It is also impossible to decide whether a larger number denotes a higher vowel or a lower vowel. Furthermore, this feature specification requires an ad hoc condition such as n > 3 or n ≥ 2 whenever we want to refer to a natural class including more than one degree of height. The alternative might be Particle Phonology or Dependency Phonology. These might be apt for multivalued vowel height systems, as their supporters argue. However, the feature specification of Particle Phonology will be discarded because it does not strictly observe the assumption that the number of the particle a is decisive in representing the height. One a in a representation can denote variant degrees of height such as [e], [I], [a], [a ] and [e ]. This also means that we cannot represent natural classes in terms of the number of the particle a. Dependency Phonology also has problems in specifying a degree of vowel height by the dependency relations between the elements. There is no unique element to represent vowel height, since every property has to be defined in terms of the dependency relations between two or more elements. As a result, it is difficult to formulate a rule for vowel height change, especially when the phenomenon involves a chain of vowel shifts. Therefore, we suggest a new feature specification for vowel height (see Chapter 3). This specification resorts to a single feature H and a few >'s, which refer exclusively to the degree of tongue height when a vowel is pronounced. It can cope with more than three degrees of height because it is fundamentally a multivalued scalar feature. This feature also obviates the ad hoc condition for a natural class, from which the [n high] type of multivalued feature suffers. This feature specification also conforms to our expectation that the notation should become simpler as the generality of the class increases, in that the fewer angled brackets are used, the more vowels are included. Incidentally, it should also be noted that, by adopting a single feature for vowel height, it is possible to formulate simpler versions of rules involving changes of vowel height, especially when they involve the vowel shifts found in many languages.


Speech Recognition Using Noise Robust Features and Spectral Subtraction (잡음에 강한 특징 벡터 및 스펙트럼 차감법을 이용한 음성 인식)

  • Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee;Seo, Young-Joo
    • The Journal of the Acoustical Society of Korea / v.15 no.5 / pp.38-43 / 1996
  • This paper compares the recognition performance of feature vectors known to be robust to environmental noise. In addition, the spectral subtraction technique is combined with the noise-robust features to obtain further performance enhancement. Experiments are carried out using SMC (Short-time Modified Coherence) analysis, root cepstral analysis, LDA (Linear Discriminant Analysis), PLP (Perceptual Linear Prediction), and RASTA (RelAtive SpecTrAl) processing. An isolated word recognition system is built using semi-continuous HMMs. Noisy-environment experiments are carried out with two types of noise, exhibition hall and computer room, at SNRs of 0, 10, and 20 dB. The experimental results show that SMC and the root-based mel cepstrum (root_mel cepstrum) improve recognition by 9.86% and 12.68% at 10 dB compared with the LPCC (Linear Prediction Cepstral Coefficient). When combined with spectral subtraction, the mel cepstrum and root_mel cepstrum show improvements of 16.7% and 8.4%, reaching recognition rates of 94.91% and 94.28% at 10 dB.
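For readers unfamiliar with spectral subtraction, the following Python sketch shows the basic magnitude-domain form of the technique on a single frame. It is a generic illustration, not the configuration used in the paper: the function names, the over-subtraction factor `alpha`, and the spectral `floor` are assumptions, and the magnitude spectra are taken as already computed (for example by an FFT that is not shown).

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of frames assumed to contain noise only."""
    bins = len(noise_frames[0])
    count = len(noise_frames)
    return [sum(frame[k] for frame in noise_frames) / count for k in range(bins)]


def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from one frame's magnitude spectrum.

    alpha scales the noise estimate (assumed default: plain subtraction);
    floor keeps each bin at no less than a small fraction of the noisy magnitude.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for x, n in zip(noisy_mag, noise_mag):
        # Flooring prevents negative magnitudes when the noise estimate
        # exceeds the observed magnitude in a bin.
        cleaned.append(max(x - alpha * n, floor * x))
    return cleaned
```

The cleaned magnitudes would then feed the feature extraction stage (mel cepstrum, root_mel cepstrum, and so on), with the noise estimate typically taken from non-speech frames at the start of the recording.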


Characteristics of respiration and phonation depending on smoking or non smoking by practical musicology students and general male students (실용음악전공학생과 일반남학생의 흡연여부에 따른 호흡과 발성 특성 비교)

  • Kim, Eunhye;Choi, Hong-Shik;Lim, Seong-Eun;Choi, Yaelin
    • Phonetics and Speech Sciences / v.6 no.3 / pp.49-56 / 2014
  • This research compared the features of respiration and phonation between practical musicology students and general male students according to their smoking status. The participants were 15 male practical musicology students attending ○○ University and 16 general students attending ○○○ University. The participants, both non-smokers and smokers with a 5-year smoking history, had no history of voice disease and had normal cognitive functions. The results indicated, first, that there was no notable difference in respiratory function (FVC, FEV1, FEV1/FVC) regardless of major or smoking status. For maximum phonation time (MPT), there was no significant difference between majors, but with respect to smoking status the MPT of the smoker group was significantly shorter than that of the non-smoker group (p<.01). Second, the participants' major did not produce a significant difference in F0, jitter, shimmer, or NHR in the vowel prolongation task. However, the smoker group showed significantly higher jitter and shimmer than the non-smoker group (p<.05), while F0 and NHR showed no difference. For the voice range profile (VRP), the maximum frequency and frequency range of the practical group were significantly higher than those of the normal group (p<.001). Moreover, although the difference in minimum frequency was not statistically significant, the practical group tended toward a higher frequency than the normal group (p=.051). In conclusion, even though there is no difference in respiratory function between the smoker and non-smoker groups, the MPT of the smoker group is shorter than that of the non-smoker group. In addition, the smoker group showed higher jitter and shimmer than the non-smoker group. MPT is related to the valve action of the vocal folds at the glottis. Thus, it is interpreted that the smoker group has lower voice quality and poorer valve action of the vocal folds. Also, the practical group has a higher maximum frequency and a wider frequency range than the normal group. This research can serve as basic data on the vocal characteristics of students in voice-related majors.

A System of Audio Data Analysis and Masking Personal Information Using Audio Partitioning and Artificial Intelligence API (오디오 데이터 내 개인 신상 정보 검출과 마스킹을 위한 인공지능 API의 활용 및 음성 분할 방법의 연구)

  • Kim, TaeYoung;Hong, Ji Won;Kim, Do Hee;Kim, Hyung-Jong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.5 / pp.895-907 / 2020
  • With the growing influence of multimedia content beyond text-based content, services that help process the information in such content bring great convenience. Representative features of these services are searching for and masking sensitive data. It is not difficult to find solutions that provide searching and masking functions for text and images. However, even though the need for technology that searches and masks part of the audio data is recognized, such solutions are hard to find because the technology is difficult. In this study, we propose a web application that provides searching and masking functions for audio data using an audio partitioning method. In pursuing this goal, we evaluated several speech-to-text conversion APIs to choose one suited to our purpose and developed regular expressions for searching for sensitive information. Lastly, we evaluated the accuracy of the developed searching and masking feature. The contribution of this work is the design and implementation of searching for and masking sensitive information in audio data, validated through experiments on the various functions.
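The search-and-mask idea in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper's actual regular expressions, speech-to-text API, and partitioning method are not given here, so the two patterns, the `(start_sec, end_sec, text)` segment format, and the function names are all hypothetical.

```python
import re

# Hypothetical patterns for illustration; the paper's own expressions are not shown.
PATTERNS = {
    "phone": re.compile(r"\b01[0-9]-\d{3,4}-\d{4}\b"),
    "resident_id": re.compile(r"\b\d{6}-\d{7}\b"),
}


def find_sensitive_spans(segments):
    """Return (start_sec, end_sec, kind) for each transcript segment with a match.

    segments is assumed to be a list of (start_sec, end_sec, text) tuples
    produced by a speech-to-text step on partitioned audio.
    """
    spans = []
    for start, end, text in segments:
        for kind, pattern in PATTERNS.items():
            if pattern.search(text):
                spans.append((start, end, kind))
    return spans


def mask_samples(samples, sample_rate, spans):
    """Return a copy of samples with every sensitive span silenced (set to 0)."""
    masked = list(samples)
    for start, end, _kind in spans:
        lo = max(0, int(start * sample_rate))
        hi = min(len(masked), int(end * sample_rate))
        for i in range(lo, hi):
            masked[i] = 0
    return masked
```

Masking at the granularity of a whole segment is a simplification chosen for the sketch. Finer partitions, or word-level timestamps where the speech-to-text service provides them, would let less of the surrounding audio be silenced.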

Metrical Structure Change Phenomenon of K-Pop Songs : Focusing on Dance Music (K-Pop 노랫말의 운율구조 변화 현상 : 댄스음악을 중심으로)

  • Seo, Keun-Young
    • Journal of Korea Entertainment Industry Association / v.14 no.7 / pp.343-362 / 2020
  • English is a stress-timed language with a phonetic system in which speech is restructured by changes in stress. Korean, on the other hand, is a syllable-timed language in which each syllable is pronounced with almost the same length and intensity, so Korean and English have distinctly different metrical systems in general speech. However, because K-Pop lyrics mix Korean and English, the Korean lyrics in K-Pop music take on a stress-based metrical system like that of English. The writer's view is that this change in the metrical structure of Korean lyrics is inevitable in order to sustain the new Korean Wave. Therefore, in this study, dance music, a major genre of K-Pop that focuses on rhythmic expression, is divided into three periods (1998, 2003, and 2009) according to changes in the Korean Wave, and the metrical structure of each period is compared and analyzed. Based on this, the features of the current K-Pop metrical structure are derived, and a method for writing Korean K-Pop lyrics is proposed that departs from the existing limited method of allocating one syllable per note. The author hopes this research will be used as a methodology for writing Korean lyrics in K-Pop songs, as well as a way to encourage the use of Korean lyrics.