• Title/Summary/Keyword: speaker

Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology / v.25 no.6 / pp.557-561 / 2021
  • Metadata has become essential for recommending video content to users, yet video content providers still generate it manually. This paper studies a method for generating metadata automatically in place of the existing manual input method. In addition to the emotion-tag extraction of the previous study, it investigates automatically generating genre and country-of-production metadata from movie audio. The genre was extracted from the audio spectrogram using ResNet34, an artificial neural network used here as a transfer learning model, and the language of the speakers in the movie was detected through speech recognition. The results confirm the feasibility of generating metadata automatically with artificial intelligence.

Correlation between overt and covert characteristics of stuttering in adults who stutter (말더듬의 외현적 특성과 내면적 특성 간의 상관: 말더듬 성인을 중심으로)

  • HeeCheong Chon
    • Phonetics and Speech Sciences / v.15 no.4 / pp.35-43 / 2023
  • This study investigated the relationship between the overt and covert characteristics of stuttering in 10 adults who stutter. To analyze the overt characteristics, stuttering frequency, duration of stuttering moments, concomitant behaviors, and total score were measured with the Stuttering Severity Instrument-Fourth Edition (SSI-4). To determine the covert characteristics, the modified Erickson scale of communication attitudes (S-24) and the Overall Assessment of the Speaker's Experience of Stuttering for Adults (OASES-A; general information, reactions to stuttering, communication in daily situations, quality of life, and total score) were used. Correlation analyses showed no significant association between the overt and covert variables. However, there were significant correlations between the scores on the S-24 and the OASES-A. These findings support the perspective that the overt characteristics of stuttering do not predict the covert characteristics, and vice versa. Therefore, when evaluating and intervening with adults who stutter, it is important to consider these characteristics separately.

Speech/Music Discrimination Using Spectrum Analysis and Neural Network (스펙트럼 분석과 신경망을 이용한 음성/음악 분류)

  • Keum, Ji-Soo;Lim, Sung-Kil;Lee, Hyon-Soo
    • The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.207-213 / 2007
  • In this research, we propose an efficient speech/music discrimination method that uses spectrum analysis and a neural network. The proposed method extracts a duration feature parameter (MSDF) from spectral peak tracks obtained by analyzing the spectrum, and combines it with the MFSC as the feature set for the speech/music discriminator. A neural network served as the discriminator, and we performed various experiments to evaluate the proposed method with respect to training pattern selection, training pattern size, and neural network architecture. The results show improved performance and stability compared with the previous method, depending on the training pattern selection and model composition. Using the MSDF and MFSC as feature parameters with more than 50 seconds of training patterns, the discrimination rate was 94.97% for speech and 92.38% for music. This is an improvement of 1.25% for speech and 1.69% for music over using the MFSC alone.

Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

  • Lim, Jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.290-297 / 2009
  • This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient text-to-speech (TTS) database compression. The proposed algorithm removes the unnecessary common bit-stream (CBS) and combines parameter delta coding with speaker-dependent Huffman coding to reduce the required bit-rate without any quality degradation. We also propose lossy coding schemes that maximize the bit-rate reduction with negligible quality degradation. Compared with the 12.65 kbps AMR-WB mode, the proposed lossless algorithm, including CBS removal, reduces the bit-rate by 12.40% without quality degradation. The proposed lossy algorithm reduces the bit-rate by 20.00% with a PESQ degradation of 0.12.
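
    The two lossless ideas named in this abstract, delta coding followed by Huffman coding, can be illustrated with a minimal sketch. It assumes integer-valued parameters; the sample values, the function names, and the choice to build one code table from a single speaker's data are illustrative assumptions, not the AMR-WB bit-stream format or the authors' implementation.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(values):
    """Replace each value by its difference from the previous one."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(values):
    """Delta-code the values, then Huffman-code the deltas."""
    deltas = delta_encode(values)
    table = huffman_code(deltas)
    return "".join(table[d] for d in deltas), table


def decode(bits, table):
    """Recover the original values from the bit string and code table."""
    inverse = {b: s for s, b in table.items()}
    deltas, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            deltas.append(inverse[buf])
            buf = ""
    return delta_decode(deltas)


# Hypothetical slowly varying parameter track for one speaker.
track = [100, 101, 101, 102, 104, 104, 105]
bits, table = encode(track)
print(len(bits), "bits for", len(track), "values")
```

    Delta coding helps here because slowly varying parameters produce many small, repeated differences, which a Huffman table can then represent with short codes. The sketch leaves out the cost of storing the code table itself.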

High Quality Multi-Channel Audio System for Karaoke Using DSP (DSP를 이용한 가라오케용 고음질 멀티채널 오디오 시스템)

  • Kim, Tae-Hoon;Park, Yang-Su;Shin, Kyung-Chul;Park, Jong-In;Moon, Tae-Jung
    • The Journal of the Acoustical Society of Korea / v.28 no.1 / pp.1-9 / 2009
  • This paper deals with the realization of multi-channel live karaoke. In this study, 6-channel MP3 decoding and tempo/key scaling were performed in real time on the TMS320C6713, a 32-bit floating-point DSP made by TI. The 6 channels consist of front L/R instrument, rear L/R instrument, melody, and woofer. In the 4-channel case, the rear L/R instrument channels can be replaced with drum L/R channels. The final output is adjusted to a 5.1-channel speaker setup. The SOLA algorithm was applied for tempo scaling, and key scaling was done with interpolation and decimation in the time domain. Instruments were separated into drums and non-drums so that the drum channels were excluded from key scaling, and high-quality tempo scaling was made possible by varying the SOLA frame size, which was optimized for real-time processing. The use of 6 channels allows various channel compositions, and the multi-channel audio system of this study can be applied effectively wherever live music is needed.
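
    Key scaling by time-domain interpolation and decimation can be sketched as plain resampling. The sketch below uses linear interpolation and omits the anti-aliasing filter a real decimator would need; the function names are hypothetical, and this is not the DSP code described in the paper.

```python
def key_shift_ratio(semitones):
    """Read-speed ratio that shifts pitch by the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def resample_linear(samples, ratio):
    """Read `samples` at `ratio` times the original speed.

    ratio > 1 decimates (higher pitch, shorter signal);
    ratio < 1 interpolates (lower pitch, longer signal).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    n_out = int((len(samples) - 1) / ratio) + 1
    out = []
    for i in range(n_out):
        pos = i * ratio              # fractional read position in the input
        k = int(pos)                 # sample just before the read position
        frac = pos - k
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append((1.0 - frac) * samples[k] + frac * nxt)
    return out


# Raise a short ramp by two semitones (hypothetical input).
ramp = [float(n) for n in range(16)]
shifted = resample_linear(ramp, key_shift_ratio(2))
print(len(ramp), "->", len(shifted), "samples")
```

    Resampling alone changes the duration along with the pitch, which is why a key change is normally paired with a tempo-scaling step such as the SOLA algorithm the abstract mentions.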

Summarization of Korean Dialogues through Dialogue Restructuring (대화문 재구조화를 통한 한국어 대화문 요약)

  • Eun Hee Kim;Myung Jin Lim;Ju Hyun Shin
    • Smart Media Journal / v.12 no.11 / pp.77-85 / 2023
  • Since COVID-19, communication through online platforms has increased, leading to an accumulation of massive amounts of conversational text data. As summarizing this text data to extract meaningful information has grown in importance, deep learning-based abstractive summarization has been actively researched. However, compared with structured texts such as news articles, conversational data often contains missing or transformed information and, because of these characteristics, must be considered from multiple perspectives. In particular, vocabulary omissions and unrelated expressions in a conversation can hinder effective summarization. In this study, we therefore restructured conversations in light of the characteristics of Korean conversational data, fine-tuned a pre-trained KoBART-based text summarization model, and improved conversation summarization performance through a refining operation that removes redundant elements from the summary. For the restructuring, we combined two methods: reordering sentences based on the order of utterances, and extracting a central speaker and restructuring the conversation around that speaker. As a result, the Rouge-1 score improved by about 4 points. This study demonstrates that our conversation restructuring approach, which considers the characteristics of dialogue, is effective in enhancing Korean conversation summarization performance.
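
    The Rouge-1 score reported here measures unigram overlap between a generated summary and a reference summary. A minimal sketch of the computation follows; whitespace tokenization is an assumption made for brevity, and the tokenizer used in the paper is not stated in this abstract.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Return (precision, recall, F1) of unigram overlap."""
    ref = Counter(reference.split())
    cand = Counter(candidate.split())
    # Clipped overlap: each token counts at most as often as in the reference.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference and system summaries.
p, r, f = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(round(p, 3), round(r, 3), round(f, 3))
```

    For Korean text, tokenizing by morphemes rather than by whitespace would usually change the score, so scores are only comparable when the same tokenization is used.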

Considerations for Helping Korean Students Write Better Technical Papers in English (한국 대학생들의 영어 기술 논문 작성 능력 향상을 위한 고찰)

  • Kim, Yee-Jin;Pak, Bo-Young;Lee, Chang-Ha;Kim, Moon-Kyum
    • Journal of Engineering Education Research / v.10 no.3 / pp.64-78 / 2007
  • For Korean researchers, English is essential. In fact, this is the case for any researcher who is a non-native English speaker, as recognition and success are predicated on being published, and the publications that reach the broadest audiences are in English. Unfortunately, university science and engineering programs in Korea often do not provide formal coursework to help students attain greater competence in English composition. Aggravating this situation is the general lack of literature covering this specific pedagogical issue. While there is plenty of information to help native speakers with technical writing and much covering general English composition for EFL learners, there is very little information available to help EFL learners become better technical writers. Thus, the purpose of this report is twofold. First, as most Korean educators in science and engineering are not well acquainted with pedagogical issues of EFL writing, this report provides a general introduction to some relevant issues. It reviews the importance of contrastive rhetoric as well as some considerations for choosing the appropriate teaching approach, class arrangement, and use of computer-assisted learning tools. Second, a course proposal is discussed. Based on a review of student writing samples as well as student responses to a self-assessment questionnaire, the proposed course is intended to balance the needs of Korean EFL learners to develop the grammar, process, and genre skills involved in technical writing. Although the scope of this report is very modest, by sharing the considerations made in developing an EFL technical writing course, it seeks to provide a small example to a field that is perhaps lacking examples.

The Characteristics and Significance of 'Nim' Texts in the Late Choson Period: Focused on Saseol-sijo and Chap-ga (조선후기 '님' 담론의 특성과 그 의미 : 사설시조와 잡가를 중심으로)

  • Shin Eun-Kyung
    • Sijohaknonchong / v.20 / pp.113-139 / 2004
  • This article intends to illuminate how the men who were the leading agents of Saseol-sijo in the Late Choson period (musical performers, writers of lyrics, patrons, composers, compilers of Sijo anthologies, audiences, etc.) viewed or recognized women, and how their understanding of women was reflected in the texts. Working with texts on the theme of 'Love,' this article starts by categorizing two types of love: the first type, 'lovelorn heart,' focusing on unilateral pining for a single lover who is now absent, and the second type, 'physical love,' concentrating on bilateral sexual intercourse. In addition to the types of love, the gender of the poetic speakers, as distinct from the real poets, is vital to characterizing the discourse of love. According to these two factors, the texts in question fall into four groups: texts in which a female speaker displays her lovelorn heart ('Type 1'), those in which she speaks about her sexual experiences ('Type 2'), those in which a male speaker sings of his lovelorn heart ('Type 3'), and those in which he describes his sexual experiences ('Type 4'). Of these, 'Type 2' and 'Type 3' are key to understanding the men's view of women. With respect to the configuration of the theme of 'Love,' it should be noted that in Korean literary history the nim, or 'sweetheart,' had signified the totality of value or a perfect entity which makes one's life meaningful, and that 'Type 1,' the pattern in which a female subject expresses her love toward a male nim, had constituted the traditional way to convey the theme of 'Love.' In terms of this connotation of nim, the remarkable increase of 'Type 3,' implying an increase of male speakers, reveals the extent to which women, the male speakers' nim, accomplished their entry into a 'sacred area,' the position of nim, which only men had occupied; females are focused on and centralized. This article considers this phenomenon an exhibition of the increase in women's significance and weight in Late Choson society and an index of 'modernity.'
Meanwhile, given that most of the Saseol-sijo poets are men, the emergence of the 'Type 2' texts, in which male poets have female speakers disclose their sexual experiences, is a representative example of women being degraded to a means of men's pleasure, for this situation gives men more pleasure than when male speakers reveal their sexual experiences. Not only 'Type 2' but also the group of texts that basically belongs to 'Type 1' and conveys the theme of 'Loyalty' through the female voice, by substituting the ruler-subject relation for the man-woman relation, falls under the same case, for men employ the female voice as a poetic device in order to stress the theme of 'Loyalty.' This article regards this phenomenon as an index of 'pre-modernity,' in the sense that in a pre-modern society, specifically in Early Choson, a male-oriented value system dominates, thereby alienating women. As is well known, the Late Choson is marked as a transitional period from a pre-modern society to a modern society. Therefore the ambivalence of the pre-modern and the modern can be found mixed in every segment of the society. The dual aspects of the masculine view of women in Saseol-sijo constitute one example. The significance of Saseol-sijo in Korean literary history can be found in this phenomenon.

A Study on stylistic features between the manuscript edition and the woodblock edition of 『Cheonuisogameonhae』 (『천의소감언해(闡義昭鑑諺解)』 목판본과 필사본 간의 문체론적 특징 고찰)

  • Jeong, Yun Ja;Kim, Gil Dong
    • (The)Study of the Eastern Classic / no.71 / pp.231-258 / 2018
  • This paper examines the stylistic differences between two versions of "Cheonuisogameonhae" and investigates the factors behind those differences. The interpretations in the woodblock edition and the manuscript edition may differ depending on the assumed range of readership, and the stylistic differences between the two editions may depend on the possibility of extending the reading population. Thus, this paper examines how stylistic effects are reflected in the relationship between the translator as speaker and the readers as listeners, according to the speaker's intentions. In Chapter 2, the stylistic differences between the two editions are examined in terms of the writer's expression of respect, emotions, and formal consciousness toward readers. The writer's respect is expressed more clearly in the manuscript edition than in the woodblock edition. The honorific expression of a subject, '-gyeo?dsyeo', and the honorific expression of a writer, '-s?p-', are used more frequently in the manuscript edition than in the woodblock edition. To express positive emotions, exclamatory endings are used in the manuscript edition, which shows the writer's strong emotional sympathy with readers' words and behaviors. On the other hand, in the woodblock edition, '-이' is attached to names in order to treat rebellious subjects and people involved in conspiracy contemptuously through informal forms. In addition, affirmative sentences are used in the manuscript edition and double negative sentences in the woodblock edition, with the intention of strongly emphasizing the king's will and its appropriateness. The writer's formal consciousness toward readers is found in the way the names of people and places are written in Korean. Chinese characters are generally used to show formal consciousness; thus, names of people and places are written in Chinese characters in the woodblock edition.
In Chapter 3, the factors that produced the stylistic differences between the two editions are examined in terms of the purpose of the interpretation, the class and range of the reading population, the writer's attitude toward readers, and the face-to-face situation of the writer and readers.

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple sources on the web, in order to expand a knowledge base. The proposed methodology is divided into the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence. 3) Based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. The proposed system showed higher performance than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. Previous research has the limitation that performance is poor when extracting information from a document type that differs from the training data, whereas the proposed methodology extracted information effectively from various types of unstructured documents compared with the baseline model.
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study prevents unnecessary extraction attempts on documents that do not include the answer information. It is meaningful that we provide a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so it cannot be guaranteed that a document includes the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies have the limitation of low precision because they frequently attempt to extract an answer even from a document that contains no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first is a problem related to data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and the information extraction result can be wrong when the morphological analysis is not performed properly. To enhance the performance of information extraction, it is necessary to develop an advanced morpheme analyzer. The second is a problem of entity ambiguity. The information extraction system of this study cannot distinguish identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the person intended by the query.
In future research, it is necessary to take measures to distinguish people with the same name. The third is a problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system, and we built an evaluation data set of 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each one includes a correct answer. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, although this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents, so that results can be evaluated more objectively.
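
    The suitability gating described in this abstract (filter documents, then sentences, and only then attempt extraction) can be sketched as follows. The scoring functions, the thresholds, and the use of a product as the overall confidence are all assumptions for illustration; the abstract does not state how the authors combine the scores, and the toy scorers stand in for trained models.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    answer: str
    confidence: float


def extract_with_gating(documents, doc_score, sent_score, extract,
                        doc_threshold=0.5, sent_threshold=0.5):
    """Attempt extraction only on documents and sentences judged suitable.

    `documents` is a list of documents, each a list of sentence strings.
    `doc_score`, `sent_score`, and `extract` are caller-supplied callables.
    """
    results = []
    for doc in documents:
        d = doc_score(doc)
        if d < doc_threshold:
            continue  # skip documents unlikely to contain the answer
        for sentence in doc:
            s = sent_score(sentence)
            if s < sent_threshold:
                continue  # skip sentences unsuitable for extraction
            answer = extract(sentence)
            if answer is not None:
                # Assumed combination rule: product of the two scores.
                results.append(Extraction(answer, d * s))
    return sorted(results, key=lambda r: r.confidence, reverse=True)


# Toy stand-ins for the trained classifiers and the sequence tagger.
docs = [
    ["Seoul is the capital of Korea.", "It is large."],
    ["The weather is nice today."],
]
found = extract_with_gating(
    docs,
    doc_score=lambda d: 0.9 if any("capital" in s for s in d) else 0.1,
    sent_score=lambda s: 0.8 if "capital" in s else 0.2,
    extract=lambda s: s.split()[0],
)
print(found)
```

    Skipping low-scoring documents before extraction is what keeps the system from returning an answer where none exists, which is the precision benefit the abstract describes.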