• Title/Summary/Keyword: Speech Texts

A Quantitative Linguistic Study on the Functional Load of Phonemes in Standard Korean (한국어 음소의 기능부담량 - 계량 언어학적 연구)

  • Jin Nam-Taek
    • MALSORI / no.25_26 / pp.65-92 / 1993
  • Not all linguistic units are of equal importance in the functioning of language. The present study aims to examine the functional load of phonemes in standard Korean. To achieve this goal, we analysed continuous texts selected from elementary-school textbooks on a personal computer. The total number of syllables studied in this thesis is 101,637. The characteristics of Korean syllable structure are as follows. 1) In the syllable head, /n/ occurs most frequently. 2) Syllables with an onset are much more frequent than those without (85% : 15%). 3) In the syllable head, obstruents are preferred because their consonantal strength is great (57%). 4) In the syllable nucleus, /a/ occurs most frequently. 5) Monophthongs occur at a rate of 90.2% and diphthongs at 9.8%; in particular, the three basic vowels (/i, a, u/) occur at a rate of 46.6%. 6) In the syllable coda, /n/ occurs most frequently. 7) Open syllables are favored (open syllables 68.7%, closed syllables 31.3%). A syllable-decomposition sketch illustrating such counts follows below.

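The onset, nucleus, coda, and open-versus-closed counts described above can be reproduced with standard Unicode Hangul decomposition. The sketch below is a minimal, self-contained Python illustration; the sample sentence is illustrative and not the study's textbook corpus, and its treatment of the silent placeholder ㅇ differs from the study, as noted in the comments.

```python
from collections import Counter

# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and encode
# (onset, nucleus, coda) as: index = onset * 588 + nucleus * 28 + coda.
ONSETS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
NUCLEI = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
CODAS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(ch):
    """Return (onset, nucleus, coda) for a Hangul syllable, else None."""
    idx = ord(ch) - 0xAC00
    if not 0 <= idx < 11172:
        return None  # not a precomposed Hangul syllable
    onset, rest = divmod(idx, 588)
    nucleus, coda = divmod(rest, 28)
    return ONSETS[onset], NUCLEI[nucleus], CODAS[coda]

def syllable_stats(text):
    onsets, nuclei, codas = Counter(), Counter(), Counter()
    open_syl = closed_syl = 0
    for ch in text:
        parts = decompose(ch)
        if parts is None:
            continue
        o, n, c = parts
        # Note: the silent placeholder ㅇ is counted as an onset here;
        # the study distinguishes onsetless syllables separately.
        onsets[o] += 1
        nuclei[n] += 1
        if c:
            codas[c] += 1
            closed_syl += 1
        else:
            open_syl += 1
    return onsets, nuclei, codas, open_syl, closed_syl

# Illustrative usage on a toy sentence, not the study's corpus.
o, n, c, open_syl, closed_syl = syllable_stats("나는 학교에 간다")
print("top onset:", o.most_common(1), "top nucleus:", n.most_common(1))
print("open vs. closed syllables:", open_syl, ":", closed_syl)
```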

A Study on the Syntagma & Paradigm by Repetition, Variation and Contrast in Ads

  • Choi, Seong-hoon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.9 / pp.1-12 / 2017
  • This study is an academic work exploring the potential meanings of print advertisements. Linguistic features such as repetition, variation, contrast, and phonological structure in the verbal texts of ads can give rise to shades of meaning or slight variations in advertising. The language of advertising is not only a language of words; it is also a language of images, colors, and pictures. Pictures and words combine to form the advertisement's visual text. While the words are very important in delivering the sales message, the visual text cannot be ignored in advertisements. Forming part of the visual text is the paralanguage of the ad. Paralanguage is the meaningful behaviour accompanying language, such as voice quality, gestures, facial expressions and touch in speech, and the choice of typeface and letter sizes in writing. Foregrounding is the throwing into relief of the linguistic sign against the background of the norms of ordinary language. This paper focuses its discussion on advertisements within the framework of the paradigmatic and syntagmatic relationship. The sources of ads have been confined to Marlboro. The ads were selected based on purposive sampling methods.

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently because they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme such as a vowel or a consonant is the smallest unit that composes Korean texts. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras with a Theano backend. After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed an input vector of 20 consecutive characters with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in a 70:15:15 ratio. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which were clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest training time for both the 3- and 4-layer LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity were not significantly improved and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM-layer model. Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely used for the processing of Korean in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
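
The abstract above describes a character(phoneme)-level LSTM language model trained on windows of 20 symbols with the 21st symbol as the target. The sketch below is a minimal reconstruction of that setup under stated assumptions: it uses tf.keras rather than the paper's Keras/Theano stack, a hypothetical pre-decomposed corpus file `corpus_phonemes.txt`, an embedding layer, and a layer width of 256, none of which are the authors' exact configuration.

```python
import numpy as np
from tensorflow.keras import layers, models

SEQ_LEN = 20  # 20 consecutive phonemes/characters as input, per the abstract

# Hypothetical corpus: a text file already decomposed into Korean phonemes (jamo);
# the paper used Old Testament texts with 74 unique symbols after pre-processing.
text = open("corpus_phonemes.txt", encoding="utf-8").read()
vocab = sorted(set(text))
char2idx = {c: i for i, c in enumerate(vocab)}

# Sliding windows: 20 input symbols, the 21st symbol as the prediction target.
X = np.array([[char2idx[c] for c in text[i:i + SEQ_LEN]]
              for i in range(len(text) - SEQ_LEN)])
y = np.array([char2idx[text[i + SEQ_LEN]] for i in range(len(text) - SEQ_LEN)])

# Three stacked LSTM layers (the paper also evaluates four); 256 units is an assumption.
model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(len(vocab), 64),
    layers.LSTM(256, return_sequences=True),
    layers.LSTM(256, return_sequences=True),
    layers.LSTM(256),
    layers.Dense(len(vocab), activation="softmax"),
])
# The paper compares SGD, Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam;
# only Adam is shown here, and the paper's 70:15:15 split is reduced to a validation split.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, batch_size=128, epochs=1, validation_split=0.15)
```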

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning sentiment analysis of English texts, the natural language sentences in training and test datasets are usually converted into sequences of word vectors before being entered into the models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These have been widely used in studies of sentiment analysis of reviews from fields such as restaurants, movies, laptops, and cameras. Unlike in English, the morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes; for example, for the word '예쁘고', the morphemes are '예쁘' (adjective) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. Several questions then arise. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model? Is it appropriate to apply a typical word vector model, which relies primarily on word forms, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarize them in three central research questions. First, which is more effective as the initial input to a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with respect to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can a satisfactory level of classification accuracy be achieved when applying deep learning to Korean sentiment analysis? To address these questions, we generate various types of morpheme vectors and compare the classification accuracy through a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive the morpheme vectors, we use data both from the same domain as the target and from another domain: about 2 million Naver Shopping cosmetics product reviews and 520,000 Naver News articles, roughly corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ along three criteria. First, they come from two data sources: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of preprocessing, namely sentence splitting only versus additional spelling and spacing correction after sentence separation. Third, they vary in the form of input fed into the word vector model: the morphemes themselves, or the morphemes with their POS tags attached. The morpheme vectors further vary in the range of POS tags considered, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived with a CBOW (Continuous Bag-Of-Words) model using a context window of 5 and a vector dimension of 300. Utilizing text from the same domain even with lower grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. The POS tag attachment, devised for the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not seem to have any definite influence on the classification accuracy.
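
As a rough illustration of the morpheme-vector step described above, the sketch below derives 300-dimensional CBOW vectors with a context window of 5 (the settings reported in the abstract) from morpheme-segmented review sentences using gensim. The KoNLPy Okt tagger, the file name `reviews.txt`, and the `min_count` value are illustrative assumptions rather than the authors' actual analyzer or data.

```python
from gensim.models import Word2Vec
from konlpy.tag import Okt  # illustrative morphological analyzer; the paper's may differ

okt = Okt()

# Hypothetical input: one product review per line, ideally after spelling/spacing correction.
with open("reviews.txt", encoding="utf-8") as f:
    sentences = [
        # Morpheme with its POS tag attached, e.g. "예쁘고/Adjective" (one of the paper's variants).
        [f"{morph}/{tag}" for morph, tag in okt.pos(line.strip())]
        for line in f if line.strip()
    ]

# CBOW (sg=0), 300 dimensions, context window 5, as reported in the abstract;
# the paper treats the minimum frequency as an experimental variable.
w2v = Word2Vec(sentences, vector_size=300, window=5, sg=0, min_count=1, workers=4)
w2v.save("morpheme_vectors.model")

# Quick sanity check: nearest neighbours of the first morpheme of the first review.
print(w2v.wv.most_similar(sentences[0][0], topn=5))
```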

A Convergence Study for Development of Psychological Language Analysis Program: Comparison of Existing Programs and Trend Analysis of Related Literature (심리학적 언어분석 프로그램 개발을 위한 융합연구: 기존 프로그램의 비교와 관련 문헌의 동향 분석)

  • Kim, Youngjun;Choi, Wonil;Kim, Tae Hoon
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.1-18 / 2021
  • While content word-based frequency analysis has obvious limitations in handling intentional deception or irony, KLIWC has evolved toward functional word analysis and KrKwic toward visualizing co-occurrence frequencies. However, after more than 10 years of development, several issues still need improvement. Therefore, we sought to develop a new psychological language analysis program by analyzing KLIWC and KrKwic. First, the two programs were analyzed. In particular, the morpheme classification of KLIWC and that of Korean morpheme analyzers were compared to enhance functional word analysis, and the psychological dictionaries were analyzed to strengthen the psychological analysis. The analysis showed that the Hannanum part-of-speech analyzer was the most finely subdivided overall, but KLIWC was more subdivided for personal pronouns and KKMA for endings, suggesting the integrated use of multiple part-of-speech analyzers to strengthen functional word analysis. Second, the research trends of studies that analyzed texts with these programs were examined. The two programs have been used in various academic fields, including interdisciplinary studies. In particular, KrKwic was widely used for the analysis of papers and reports, and KLIWC for comparative studies of writers' thoughts, emotions, and personality. Based on these results, the necessity and direction of development of a new psychological language analysis program were suggested.
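
The integrated use of several Korean part-of-speech analyzers suggested above can be prototyped with KoNLPy, which bundles wrappers for both Hannanum and Kkma. The sketch below simply runs one illustrative sentence through both analyzers so their tag granularity can be compared side by side; the sample sentence is an assumption, not material from the study.

```python
from konlpy.tag import Hannanum, Kkma  # both analyzers ship with KoNLPy

taggers = {"Hannanum": Hannanum(), "Kkma": Kkma()}
sentence = "나는 오늘 기분이 매우 좋다."  # illustrative sample sentence

for name, tagger in taggers.items():
    # .pos() returns (morpheme, POS-tag) pairs; the tag sets and their granularity
    # differ between analyzers, which is the difference the comparison above exploits.
    print(name, tagger.pos(sentence))
```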

A Study on a Landscape Color Analysis according to Regional Environment - Centering on Damyang County, Jeollanamdo - (지역 환경에 따른 경관 색채분석에 관한 연구 - 전라남도 담양군을 중심으로 -)

  • Choi, Seong-Kyung;Moon, Jung-Min
    • Korean Institute of Interior Design Journal / v.21 no.4 / pp.146-154 / 2012
  • As Damyang has preserved both its beautiful natural environment and its traditions very well, it needs colors that can coexist with and preserve what is already there rather than flashy, refined colors. At present, however, chaotic use of color degrades the quality of Damyang's beautiful natural scenery. Therefore, colors that carry symbolism grounded in the present colors of Damyang should be used so that everyone can be pleased with them. Finally, the selected basic colors were classified into main, supplementary, and highlight colors in consideration of the characteristics of each scene and were arranged accordingly. If such colors and color schemes are properly applied according to the characteristics of each scene, the ecological, historical, cultural, and traditional scenery of Damyang can be preserved consistently.


Prediction of Prosodic Break Using Syntactic Relations and Prosodic Features (구문 관계와 운율 특성을 이용한 한국어 운율구 경계 예측)

  • Jung, Young-Im;Cho, Sun-Ho;Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science / v.19 no.1 / pp.89-105 / 2008
  • In this paper, we suggest a rule-based system for predicting natural prosodic phrase breaks from Korean texts. For the implementation of the rule-based system, (1) sentence constituents are sub-categorized according to their syntactic functions, (2) syntactic phrases are recognized using the dependency relations among sub-categorized constituents, and (3) rules for predicting prosodic phrase breaks are created. In addition, (4) the length of syntactic phrases and sentences, the position of syntactic phrases within a sentence, and sense information of contextual words are considered to determine variable prosodic phrase breaks. Based on these rules and features, we obtained an accuracy of over 90% in predicting the positions of major breaks and no breaks, which correlate highly with the syntactic structure of the sentence. As for the overall accuracy in predicting all prosodic phrase breaks, the suggested system shows a Break_Correct of 87.18% and a Juncture_Correct of 89.27%, which are higher than those of other models.
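
A heavily simplified sketch of a rule-based break predictor in the spirit described above: each word (eojeol) receives a break label based on its sub-categorized syntactic function and the accumulated phrase length. The function labels, thresholds, and rules below are illustrative placeholders, not the authors' actual rule set, which also uses dependency relations and sense information.

```python
def predict_breaks(words, functions, max_phrase_len=3):
    """Assign a prosodic break label ("major", "minor", or "none") after each word.

    words: eojeol strings (unused in this toy; real rules also consult word forms
    and sense information). functions: parallel list of sub-categorized syntactic
    functions, e.g. "subject", "adverbial", "adnominal", "object", "predicate".
    """
    breaks, phrase_len = [], 0
    for i, func in enumerate(functions):
        phrase_len += 1
        if i == len(functions) - 1:
            label = "major"   # sentence-final position
        elif func == "predicate":
            label = "major"   # clause boundary after a predicate
        elif func == "adnominal":
            label = "none"    # adnominal modifiers bind to their head noun
        elif phrase_len >= max_phrase_len:
            label = "minor"   # length-driven break inside a long stretch
        else:
            label = "none"
        breaks.append(label)
        if label != "none":
            phrase_len = 0
    return breaks

# Illustrative usage ("the child / yesterday / interesting / book / read").
print(predict_breaks(
    ["아이가", "어제", "재미있는", "책을", "읽었다"],
    ["subject", "adverbial", "adnominal", "object", "predicate"]))
```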


BEHIND CHICKEN RATINGS: An Exploratory Analysis of Yogiyo Reviews Through Text Mining (치킨 리뷰의 이면: 텍스트 마이닝을 통한 리뷰의 탐색적 분석을 중심으로)

  • Kim, Jungyeom;Choi, Eunsol;Yoon, Soohyun;Lee, Youbeen;Kim, Dongwhan
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.30-40 / 2021
  • Ratings and reviews, despite their growing influence on restaurants' sales and reputation, entail limitations due to the sheer volume of reviews and inaccuracies in rating systems. This study explores the texts in the reviews and ratings of a delivery application and identifies ways to improve review credibility and usefulness. Through text mining, we concluded that the delivery application 'Yogiyo' has (1) a five-star-oriented rating distribution, (2) strong positive correlations among rating factors (taste, quantity, and delivery), and (3) distinct part-of-speech and morpheme proportions depending on review polarity. After extracting a total of 367 negative words, we created a chicken-specific negative word dictionary organized under four main topics and 20 sub-topic classifications. Centered on fried chicken reviews, we provide insights on how research on delivery app reviews should progress.
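
The rating-factor correlations and polarity-dependent part-of-speech proportions mentioned above can be checked with a few lines of pandas and KoNLPy. The sketch below assumes a hypothetical CSV export named yogiyo_chicken_reviews.csv with taste, quantity, delivery, and text columns; the column names, polarity cut-offs, and Okt tagger are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

import pandas as pd
from konlpy.tag import Okt

# Hypothetical export: one row per review with sub-ratings and the review text.
df = pd.read_csv("yogiyo_chicken_reviews.csv")  # assumed columns: taste, quantity, delivery, text

# (1) Rating dispersion and (2) correlations between rating factors.
print(df["taste"].value_counts().sort_index())                 # a five-star skew shows up here
print(df[["taste", "quantity", "delivery"]].corr(method="pearson"))

# (3) POS-tag proportions by review polarity (crude cut-off: taste >= 4 positive, <= 2 negative).
okt = Okt()

def pos_proportions(texts, top_n=5):
    counts = Counter(tag for t in texts for _, tag in okt.pos(str(t)))
    total = sum(counts.values())
    return {tag: round(n / total, 3) for tag, n in counts.most_common(top_n)}

print("positive:", pos_proportions(df.loc[df["taste"] >= 4, "text"]))
print("negative:", pos_proportions(df.loc[df["taste"] <= 2, "text"]))
```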

The Comparative Effects of Yugmijihwangtang in Donguibogam and Experiment Research Results -Focusing on the Korean Medicine and Traditional Chinese Medicine- (육미지황탕 효능의 동의보감과 실험연구결과의 비교고찰 -한의학과 중의학을 중심으로-)

  • Han, Yoochang;Kim, Myung Dong;Lee, Sundong
    • Herbal Formula Science / v.25 no.2 / pp.223-251 / 2017
  • Objectives : Many experimental results on Yugmijihwangtang (YM) are reported in various journals, and many of them describe new effects that are not recorded in the traditional medical texts. It is therefore necessary to consider whether these newly reported effects could help clinical practice, because comparing Donguibogam with scientific experimental results provides a basis for evidence-based medicine. Methods : We compared the effects of YM recorded in Donguibogam with the experimental results of YM. Results : The effects of YM in Donguibogam are to replenish essence and marrow and to treat red wen, fatigue, hypouresis, urinary sediment, urinary urgency, hematuria, hydrocephalus, speech and movement retardation, yin-deficiency, diabetes mellitus, nonalcoholic fatty liver, melanoma, impaired near and far vision, tinnitus, hearing loss, alopecia, angiogenesis, cough, cough at night, trachyphonia, and infantile convulsion. The experimental results of YM since 2000 in both Korea and China show, on the one hand, that it inhibits atopic dermatitis, renal interstitial fibrosis, oxidation, emphysema, stress, glomerulosclerosis, diabetic nephropathy, chronic glomerulonephritis, hemorrhage, plantar sweating, dermal aging, kidney aging, bone loss, breast cancer, pathological myocardial cells, primary liver cancer, thrombosis, osteoporosis, intrauterine growth retardation, chronic renal failure, IgA nephropathy, slow cerebral development, and hippocampal tissue lesions, and, on the other hand, that it helps bone formation, the renin-angiotensin-aldosterone system, cerebral recovery, cognitive function and expression, osteoblast proliferation and differentiation, learning and memory, cold tolerance, oxygen-deficit tolerance and anti-fatigue, endometrial formation, humoral and cell-mediated immunity, immune regulation, the hypothalamus-pituitary-ovary axis, and spermatogenesis. Conclusion : When we compared the effects of YM recorded in Donguibogam with the experimental results, there was a considerable gap between them. Going forward, considerable effort and consideration will be needed to close these gaps from both an academic and a clinical point of view.

Structural Study on Dance Story-Telling (무용의 스토리텔링 구조연구)

  • Kim, Ki-Hwa;Baek, Hyun-Soon
    • The Journal of the Korea Contents Association / v.12 no.1 / pp.265-274 / 2012
  • This study examines, from the perspective of cultural-content storytelling, how the physical language of dance can adopt the discourse method of storytelling as a narrative discourse system. Through the relationships established between dancers and stage art, dance can form a discourse system with various literary devices, including figures of speech, metaphors, and symbols. The manifestation of dance's narrative components within the concept of storytelling can be described as follows: the background, as an object, can provide temporal and spatial settings through stage art and the dancers' performance elements; as for character, the dancer himself can serve as a first-person narrator and can present flat characterizations. As for the elements of events in dance, the stage art components present the background and primary motif of an incident, and the dancers' diverse relationships form conflicts through the interplay of solo, duet, and group dance. The plot, as the process by which events develop, is led by actants such as the dancer's mime, gestures, and facial expressions. The element through which dance reveals narration is the dance art itself, and the developing structure of the narration is the dance language's own grammar. Choreographers should compose persuasive dance texts that convey stories efficiently, as a narrative discourse of dance, through decisions about characters and their actions, stage art elements that display temporal and spatial settings, and the development of the plot.