• Title/Summary/Keyword: word similarity analysis (단어 유사도 분석)

Search Results: 232

Analysis of Sea Trial's Title for Naval Ships Based on Big Data (빅데이터 기반 함정 시운전 종목명 분석)

  • Lee, Hyeong-Sin;Seo, Hyeong-Pil;Beak, Yong-Kawn;Lee, Sang-Il
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.11 / pp.420-426 / 2020
  • The purposes and main points of ROK and US Navy sea trials were analyzed from various angles using Word Cloud, a big data technology, to make sea trials more efficient. First, a comparison of words extracted through keyword cleansing of the ROK and US Navy sea trials showed that the ROK Navy tested single pieces of equipment, whereas the US Navy conducted integrated trials focusing on the system. Second, an analysis of the ROK-US Navy sea trials showed that approximately 66.6% were similar items; 112 items, approximately 44% of the ROK Navy's 252 sea trial items, overlapped in two or more items, and 89 items (35% of the total) could be eliminated by integrating them in the manner of the US Navy sea trials. A ship is a complex system in which multiple pieces of equipment operate simultaneously. A focus on checking the functions and performance of individual equipment, as in the ROK Navy's sea trials, lengthens the sea trial period because of the excessive number of trial targets; the required budget also inevitably increases with the longer schedule and higher evaluation costs. In the future, further research will be needed to achieve more efficient and accurate sea trials through integrated system evaluations, such as those of the U.S. Navy.
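
The overlap arithmetic in this abstract (shared trial items as a share of the ROK Navy's list) can be sketched in a few lines of Python. The item keywords below are hypothetical, since the paper's actual cleansed keyword lists are not reproduced here.

```python
from collections import Counter

def overlap_report(rok_items, us_items):
    """Compare two lists of sea-trial item keywords.

    Returns the shared keywords and the share of ROK items whose
    keyword also appears on the US side.
    """
    rok = Counter(rok_items)
    us = Counter(us_items)
    shared = sorted(set(rok) & set(us))
    overlapping = sum(rok[w] for w in shared)
    return shared, overlapping / sum(rok.values())

# Hypothetical cleansed keywords, one per trial item.
rok = ["radar", "radar", "sonar", "engine", "gun"]
us = ["radar", "sonar", "navigation"]
shared, ratio = overlap_report(rok, us)
```

On this toy input, two of the five ROK items share a keyword with the US list, giving an overlap ratio of 0.6; the paper performs the analogous computation over the full 252-item list.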

Text-Mining Analysis on the Interaction between the American Consumers Aged over 60 and Companion Pets Robots: Focused on Amazon Reviews for Joy For All Companion Pets (텍스트 마이닝을 활용한 미국 노년 소비자와 애완용 로봇 간 상호작용에 대한 분석: Joy For All Companion Pets에 대한 아마존 리뷰를 중심으로)

  • Chung, Yea-Eun;Lee, Yu Lim;Chung, Jae-Eun
    • Journal of Digital Convergence / v.19 no.10 / pp.469-489 / 2021
  • This study explores consumers' responses to socially assistive robots by applying text mining to Hasbro's Joy For All Companion Pets, which provide emotional support. We conducted text frequency analysis and LDA analysis using R. The key findings are: 1) the most frequently used words concerned the mimicry of living pets and the appearance of the companion pets; 2) five topics were derived from the LDA analysis, and the keywords in each topic split between positive and negative; 3) user, product, and environment factors affect the interaction between consumers and companion pets; and 4) consumers with cognitive or physical difficulties use companion pets to replace living pets. This study provides an understanding of consumer responses to companion pets and offers practical implications that may improve the efficacy of their use and deepen understanding of companion robots, which provide emotional support during COVID-19.

Notes on Descriptions of the Prosodic System in French Grammars in the Age of Enlightenment & the Departure of the International Phonetic Alphabet (계몽주의 시대 프랑스 문법서에서 기술한 운율 현상과 국제음성기호의 출발에 대한 고찰)

  • Park, Moon-Kyou
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.658-667 / 2021
  • Our study aimed to analyze and reinterpret, through an acoustic approach, 18th-century descriptions of prosody, and to introduce the figurative pronunciation system, the pioneer of the International Phonetic Alphabet. Our methodology compares and analyzes grammars and documents on the transcription system and reconstructs the prosodic structure. It is certain that 18th-century grammarians widely accepted the prosody theories of Arnauld & Lancelot from the seventeenth century. In particular, grammar scholars accepted the dichotomous classification of accent structures into prosodic and oratorical accents. The prosodic accent is related to intonation, while the oratorical accent has intonation and intensity as its key elements. Regarding the temporal structure, the lengthening of the final syllable was observed systematically by 18th-century grammarians. This temporal structure is similar to that of today. Therefore, we can conclude that final elongation, an essential characteristic of the modern French accent, was already well established in 18th-century prosody. Despite this, 18th-century grammarians did not assign it the status of accent, because of the stereotype that equated accent with intonation.

Research Trends in Korean Healing Facilities and Healing Programs Using LDA Topic Modeling (LDA 토픽모델링을 활용한 국내 치유시설과 치유프로그램 연구 동향)

  • Lee, Ju-Hong;Lee, Kyung-Jin;Sung, Jung-Han
    • Journal of the Korean Institute of Landscape Architecture / v.51 no.3 / pp.95-106 / 2023
  • Korean healing research has developed over the past 20 years along with growing social interest in healing. The field of healing research is diverse and includes legislated nature-based healing. In this study, the abstracts of 2,202 academic journal articles and master's and doctoral dissertations published in KCI and RISS were collected and analyzed. As for the research method, LDA topic modeling was used to classify research topics, and time-series publication trends were examined. As a result, we identified that the topics of Korean healing research were connected with five types and four mediators. The five types were "Healing Tourism," "Mind and Art Healing," "Forest Therapy," "Healing Space," and "Youth Restoration and Healing," and the four mediators were "Forest," "Nature," "Culture," and "Education." In addition, only the legalized healing studies were extracted from Korean healing research and their topics analyzed. As a result, legalized healing research was classified into four types: "Healing Spatial Environment Plan," "Healing Therapy Experiment," "Agricultural Education Experiential Healing," and "Healing Tourism Factor." Forest Therapy, which has the largest body of research among the legalized healing fields; Agro Healing and Garden Healing, which operate similar plant-based programs; and Marine Healing, which uses marine resources, were also analyzed individually. As a result, topics showing the unique characteristics of individual healing studies, as well as topics common to all healing studies, were derived. This study is significant in that it identified the overall trend of research on Korean healing facilities and programs by utilizing LDA topic modeling.
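
As a rough illustration of the LDA technique this study applies, here is a minimal collapsed Gibbs sampler for LDA in pure Python. It is a toy sketch, not the authors' pipeline; the document lists and all tokens are invented.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns the top words per topic and the per-document topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    # z: topic per token; ndk: doc-topic counts;
    # nkw: topic-word counts; nk: tokens per topic.
    z = []
    ndk = [[0] * n_topics for _ in docs]
    nkw = [defaultdict(int) for _ in range(n_topics)]
    nk = [0] * n_topics
    for di, doc in enumerate(docs):           # random initialization
        zs = []
        for w in doc:
            t = rng.randrange(n_topics)
            zs.append(t)
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(n_iter):                   # Gibbs sweeps
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                t = z[di][wi]
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + vocab_size * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    top_words = [sorted(nkw[k], key=nkw[k].get, reverse=True)[:3]
                 for k in range(n_topics)]
    return top_words, ndk

# Invented toy corpus with two obvious themes (forest healing vs. tourism).
docs = [["forest", "therapy", "forest", "healing"],
        ["forest", "trail", "therapy"],
        ["tourism", "travel", "hotel"],
        ["tourism", "travel", "tourism"]]
top_words, doc_topics = lda_gibbs(docs, n_topics=2, n_iter=100)
```

Production studies like this one would use a tuned library implementation over thousands of abstracts; the sketch only shows the sampling mechanics behind the topic assignments.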

Language performance analysis based on multi-dimensional verbal short-term memories in patients with conduction aphasia (다차원 구어 단기기억에 따른 전도 실어증 환자의 언어수행력 분석)

  • Ha, Ji-Wan;Hwang, Yu Mi;Pyun, Sung-Bom
    • Korean Journal of Cognitive Science / v.23 no.4 / pp.425-455 / 2012
  • Multi-dimensional verbal short-term memory mechanisms are largely divided into the phonological channel and the lexical-semantic channel. The former is called phonological short-term memory and the latter semantic short-term memory. Phonological short-term memory is further divided into the phonological input buffer and the phonological output buffer. In this study, the language performance of three patients with similar levels of conduction aphasia was analyzed in terms of multi-dimensional verbal short-term memory. To this end, the three patients were asked to perform four language tasks, namely spontaneous speaking, repetition, spontaneous writing, and dictation, at both the word and sentence levels. Moreover, the patients' phonological and semantic short-term memories were evaluated using digit span tests and verbal learning tests. As a result, the three subjects exhibited different performances and error responses on the four language tasks, and the short-term memory tests likewise did not produce identical results. The language performance of the three patients can be explained according to whether the deficits occurred in semantic short-term memory, the phonological input buffer, and/or the phonological output buffer. Based on these language and short-term memory test results, the relations between language and multi-dimensional verbal short-term memory are discussed.


Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between the objects that enter the model sequentially (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large and model complexity increases. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to cause errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that composes Korean text. We construct language models using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm as well as more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras based on Theano.
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, with the following 21st character as output. In total, 1,023,411 input-output pairs were included in the dataset, divided into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity were not significantly improved, and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM-layer model. Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for processing the Korean language in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
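
The sliding-context, next-character prediction setup described above can be illustrated without a deep learning framework. The sketch below substitutes a count-based character n-gram model for the paper's LSTM, purely to show the prediction task; the class name and training text are invented.

```python
from collections import defaultdict, Counter

class CharNgramLM:
    """Count-based character model: predicts the next character from the
    previous `order` characters, backing off to shorter contexts.
    A simplified stand-in for the paper's LSTM language model."""

    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(Counter)  # context string -> next-char counts

    def fit(self, text):
        text = "~" * self.order + text      # pad so every position has context
        for i in range(self.order, len(text)):
            for k in range(1, self.order + 1):
                self.counts[text[i - k:i]][text[i]] += 1

    def predict(self, context):
        """Most likely next character, or None for unseen contexts."""
        context = context[-self.order:]
        while context:
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
            context = context[1:]            # back off to a shorter context
        return None

lm = CharNgramLM(order=3)
lm.fit("the cat sat on the mat")
```

The paper's LSTM plays the same role as `predict` here, but learns a distributed representation of the 20-character context instead of counting exact context strings.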

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions: Part B / v.11B no.6 / pp.749-758 / 2004
  • In this paper, we propose a new method utilizing only a raw corpus, without additional human effort, for disambiguating target word selection in English-Korean machine translation. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). These techniques can represent the complex semantic structures of given contexts, such as text passages. We construct linguistic semantic knowledge using the two techniques and apply that knowledge to target word selection in English-Korean machine translation, utilizing grammatical relationships stored in a dictionary. We use the k-nearest neighbor learning algorithm to resolve the data sparseness problem in target word selection, estimating the distance between instances based on these models. In experiments, we use TREC data of AP news to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation analysis, we showed the relationship between accuracy and two important factors: the dimensionality of the latent space and the k value of k-NN learning.
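
The nearest-neighbor selection step can be sketched in plain Python. The sketch below skips the LSA/PLSA dimensionality reduction and works directly on raw co-occurrence vectors with cosine similarity, so it is a simplified stand-in for the paper's method; the corpus and candidate tokens are invented.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def context_vector(word, corpus, window=2):
    """Bag-of-words co-occurrence vector for `word` over a token list."""
    vec = Counter()
    for i, tok in enumerate(corpus):
        if tok == word:
            lo, hi = max(0, i - window), i + window + 1
            vec.update(t for t in corpus[lo:hi] if t != word)
    return vec

def select_translation(source_context, candidates, corpus):
    """Pick the candidate target word whose corpus contexts are most
    similar (1-NN under cosine) to the source word's context."""
    return max(candidates,
               key=lambda c: cosine(source_context, context_vector(c, corpus)))

# Invented tokens: "bank1"/"bank2" stand for two candidate translations.
corpus = ["money", "deposit", "bank1", "account", "loan",
          "water", "shore", "bank2", "river", "slope"]
source_ctx = Counter({"money": 1, "loan": 1})
choice = select_translation(source_ctx, ["bank1", "bank2"], corpus)
```

In the paper, the vectors live in a reduced latent space (LSA/PLSA) rather than raw counts, which is what mitigates data sparseness; the selection logic is otherwise the same.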

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo;Yoon, Byungho;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.127-146 / 2022
  • Recently, as word embedding has shown excellent performance in various tasks of deep learning-based natural language processing, research on the advancement and application of word, sentence, and document embedding is being actively conducted. Among these directions, cross-language transfer, which enables semantic exchange between different languages, is growing along with the development of embedding models. Academic interest in vector alignment is growing with the expectation that it can be applied to various embedding-based analyses, in particular to mapping between specialized and generalized domains. In other words, it is expected to become possible to map the vocabulary of specialized fields such as R&D, medicine, and law into the space of a pre-trained language model learned from a huge volume of general-purpose documents, or to provide a clue for mapping vocabulary between mutually different specialized fields. However, since the linear vector alignment that has mainly been studied in academia assumes statistical linearity, it tends to simplify the vector space. This essentially assumes that the two vector spaces are geometrically similar, a limitation that causes inevitable distortion in the alignment process. To overcome this limitation, we propose a deep learning-based vector alignment methodology that effectively learns the nonlinearity of the data. The proposed methodology sequentially trains a skip-connected autoencoder and a regression model to align specialized word embeddings, expressed in their own space, to the general embedding space. Finally, through inference with the two trained models, the specialized vocabulary can be aligned in the general space.
To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the field of 'health care' among national R&D tasks performed from 2011 to 2020. As a result, it was confirmed that the proposed methodology showed superior performance in terms of cosine similarity compared to the existing linear vector alignment.
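
To make the linear baseline concrete: the least-squares alignment the paper criticizes can be written in closed form for a 2-D toy space. The sketch below fits a linear map between two spaces related by a pure rotation, a geometrically friendly case where linearity suffices; the paper's point is that real specialized-to-general mappings are not this well-behaved. All vectors are invented.

```python
import math

def fit_linear_map(X, Y):
    """Least-squares 2x2 map W with x @ W ~ y (normal equations,
    row-vector convention). Assumes X^T X is non-singular."""
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    xty = [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(2)]
           for i in range(2)]
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]   # (X^T X)^-1
    return [[sum(inv[i][k] * xty[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply_map(W, x):
    """Row vector times 2x2 matrix."""
    return [x[0] * W[0][j] + x[1] * W[1][j] for j in range(2)]

def cos_sim(u, v):
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))

# Hypothetical "specialized" vectors; the "general" space is a 30-degree rotation.
theta = math.pi / 6
R = [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
Y = [apply_map(R, x) for x in X]

W = fit_linear_map(X, Y)
aligned = apply_map(W, [3.0, 2.0])     # held-out specialized vector
target = apply_map(R, [3.0, 2.0])      # where it should land
```

Because the toy spaces really are related linearly, the fitted map recovers the rotation and the held-out vector aligns almost exactly; when the true relation is nonlinear, this baseline distorts the space, which is the gap the skip-connected autoencoder approach targets.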

Empirical Evaluation on the Size of E-Book Devices in User Comprehensive View (사용자의 이해력 관점에서 전자책 장치의 크기에 관한 실험적 평가)

  • Son, Yong-Bum;Kim, Young-Hak
    • The Journal of the Korea Contents Association / v.12 no.8 / pp.167-177 / 2012
  • Recently, with the rapid development of information technology, the e-book market has been growing rapidly. The choice of an e-book device is a very important element in improving users' comprehension. The effectiveness of e-books versus paper books has been studied previously, but research on the size of e-book devices from the perspective of user comprehension has not been actively pursued. Considering these aspects, in this paper we selected currently available e-book devices, namely a PDA, a netbook, and a notebook, and carried out an experiment on which device size yields the highest user comprehension. Understanding and memory of the content on the display were set as the main factors for evaluating comprehension. We prepared in advance multiple e-book and English-word examples of similar difficulty, and evaluated comprehension through questions answered for each example after the experiment. Ninety undergraduate students, the group that uses e-books most widely, participated in the experiment, and the results were analyzed using the SPSS statistical package. The results showed that user comprehension was higher with the middle-sized device than with the large-sized one.

A Study on Project Information Integrated Management Measures Using Life Cycle Information in Road Construction Projects (도로건설사업의 생애주기별 정보를 이용한 건설사업정보 통합관리방안 연구)

  • Kim, Seong-Jin;Kim, Bum-Soo;Kim, Tae-Hak;Kim, Nam-Gon
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.11 / pp.208-216 / 2019
  • Construction projects generate a massive amount of diverse information. A road construction project takes from at least five years to more than ten years to complete, so it is important to manage information on a project's history, including processes and costs. Furthermore, it is necessary to determine whether construction projects have been carried out according to the planned goals, and to turn the construction information management system (CALS) into a virtuous cycle. Integrated information management is easy to ensure in private construction projects, because the constructor can oversee the whole process from planning to completion, whereas it is difficult in public construction projects, because various agencies are involved. A CALS manages the project information of public road construction, but that information is managed separately by CALS subsystem, resulting in disconnected information among the subsystems and making integrated monitoring impossible. Thus, this study proposes integrated information management measures to ensure comprehensive management of the information generated over the construction life cycle. To that end, the CALS is improved by standardizing and integrating the system database, integrating the individually managed user information, and connecting the system with the Dbrain tool, which collectively builds artificial intelligence, to ensure information management based on the project budget.