• Title/Summary/Keyword: Derived words

Search Results: 365

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Part-of-Speech Tagged Corpus (품사 부착 말뭉치를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

  • Lim, Min-Kyu;Kim, Kwang-Ho;Kim, Ji-Hwan
    • MALSORI / no.67 / pp.181-193 / 2008
  • In this paper, we propose a method for improving the vocabulary coverage of embedded continuous speech recognition (CSR) using a part-of-speech (POS) tagged corpus. We investigate the 152 POS tags defined in the Lancaster-Oslo-Bergen (LOB) corpus and the corresponding word-POS tag pairs, and derive a new vocabulary through word addition. Words paired with some POS tags must be included in a vocabulary of any size, whereas the inclusion of words paired with other POS tags depends on the target vocabulary size. The 152 POS tags are therefore categorized according to whether word addition depends on the vocabulary size. Using expert knowledge, we first classify the POS tags and then apply a different word-addition strategy depending on the POS tag paired with each word. The proposed method is evaluated in terms of coverage and compared with vocabularies of the same size (5,000 words) derived from frequency lists. On the test short message service (SMS) text corpus, the proposed vocabulary achieves 95.18% coverage, whereas the conventional vocabularies cover only 93.19% and 91.82% of the words appearing in the same corpus.
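The two-tier word addition and the coverage measure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tag set `ALWAYS_INCLUDE_TAGS`, the function names, and the toy corpus are hypothetical, and coverage is assumed to mean the fraction of running test tokens found in the vocabulary.

```python
from collections import Counter

# Hypothetical choice of LOB tags whose words are added at any vocabulary size
# (article, preposition, coordinating conjunction, "I"); not the paper's actual list.
ALWAYS_INCLUDE_TAGS = {"AT", "IN", "CC", "PP1A"}


def build_vocabulary(tagged_corpus, target_size):
    """Build a vocabulary from (word, tag) pairs.

    Words with an always-include tag are added first (even if that exceeds
    target_size); remaining slots are filled with the most frequent other words.
    """
    vocab, seen = [], set()
    for word, tag in tagged_corpus:
        if tag in ALWAYS_INCLUDE_TAGS and word not in seen:
            vocab.append(word)
            seen.add(word)
    freq = Counter(word for word, _tag in tagged_corpus if word not in seen)
    for word, _count in freq.most_common():
        if len(vocab) >= target_size:
            break
        vocab.append(word)
    return set(vocab)


def coverage(vocabulary, test_tokens):
    """Fraction of running test tokens that appear in the vocabulary."""
    if not test_tokens:
        return 0.0
    hits = sum(1 for token in test_tokens if token in vocabulary)
    return hits / len(test_tokens)


if __name__ == "__main__":
    tagged = [("the", "AT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
              ("the", "AT"), ("mat", "NN"), ("cat", "NN")]
    vocab = build_vocabulary(tagged, target_size=3)
    print(sorted(vocab))  # ['cat', 'on', 'the']
    print(round(coverage(vocab, ["the", "cat", "is", "on", "the", "mat"]), 3))  # 0.667
```

Keeping the always-include words outside the frequency ranking is what makes their inclusion independent of the target size.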

Analysis of Meta Fashion Meaning Structure using Big Data: Focusing on the keywords 'Metaverse' + 'Fashion design' (빅데이터를 활용한 메타패션 의미구조 분석에 관한 연구: '메타버스' + '패션디자인' 키워드를 중심으로)

  • Ji-Yeon Kim;Shin-Young Lee
    • Fashion & Textile Research Journal / v.25 no.5 / pp.549-559 / 2023
  • With the transition to the fourth industrial revolution, the potential for metaverse-based innovation in fashion has been confirmed, and various applications are being explored. This study therefore analyzes the meaning structure of meta fashion using big data and discusses its prospects. Data containing the keywords "metaverse + fashion design" were collected from portal sites (Naver, Daum, and Google) for 2020 to 2022, and keyword frequency, N-gram, and TF-IDF analyses were performed using text mining. In addition, network visualization and CONCOR analysis were performed with Ucinet 6 to understand how the keywords are interconnected and what they essentially mean. The results were as follows. The most frequent keywords, in order, were fashion, metaverse, design, 3D, platform, apparel, and virtual. In the N-gram analysis, the density between the words fashion and metaverse was high, and the TF-IDF analysis confirmed the importance of content- and technology-related words such as 3D, apparel, platform, NFT, education, AI, avatar, MCM, and meta-fashion. Network visualization and CONCOR analysis of the most frequent words yielded three clusters: "metaverse fashion design and industry," "metaverse fashion design and education," and "metaverse fashion design platform." CONCOR analysis was also used to obtain separate results for middle- and lower-frequency words. The results of this study provide useful information for strengthening competitiveness in metaverse fashion design.
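Counting adjacent word pairs (bigrams) is one simple form of the N-gram analysis mentioned above. The sketch assumes the documents are already tokenized; the study's actual tooling and preprocessing are not described in the abstract, and the sample tokens are invented.

```python
from collections import Counter


def bigram_counts(documents):
    """Count ordered adjacent word pairs (bigrams) across tokenized documents."""
    counts = Counter()
    for tokens in documents:
        # zip pairs each token with the one that follows it
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    docs = [
        ["metaverse", "fashion", "design"],
        ["fashion", "metaverse", "platform"],
        ["metaverse", "fashion", "show"],
    ]
    print(bigram_counts(docs).most_common(1))  # [(('metaverse', 'fashion'), 2)]
```

Because the pairs are ordered, ("fashion", "metaverse") is counted separately; summing both orders would give an undirected co-occurrence measure instead.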

On the Pronunciation and the Meaningful Rendering of the Oriental Medical Chinese Terminology into Korean (한의학용어(韓醫學用語)의 발음(發音)과 독음(讀音)에 대(對)하여 -두음법칙(頭音法則)과 경음화(硬音化)를 중심으로-)

  • Park, YungHwan;Kang, YeonSeok;Maeng, WoongJae
    • The Journal of Korean Medical History / v.23 no.2 / pp.23-36 / 2010
  • In this paper, the author examines the initial sound law (두음법칙) and fortification (경음화), two of the most important phonetic changes in Sino-Korean words, together with the pronunciation and spelling rules for Oriental Medical terminology and several problems in rendering Oriental Medical Chinese terms into Korean. The following conclusions were drawn. 1. The initial sound law applies only to Sino-Korean words of more than one syllable and does not apply to loanwords from foreign languages. Compound words such as Jang-ssi-yu-gyeong(張氏類經) and Im-sin-yuk-hyeol(姙娠衄血) consist of existing words, namely Jang-ssi(張氏), Yu-gyeong(類經), Im-sin(姙娠), and Yuk-hyeol(衄血), so the initial sound law applies to each component; they are therefore written and pronounced 'Jang-ssi-yu-gyeong' and 'Im-sin-yuk-hyeol'. 2. Fortification of Sino-Korean words varies with the structure and meaning of the word. Characters such as '科', '格', '氣', '法', '病', '症', and '證' are often fortified, are used frequently in Oriental Medicine, and form the basis of many derived words. However, there is no scholarly consensus in the Oriental Medical community on the circumstances in which these characters are fortified, so the pronunciation of Oriental Medical terminology needs to be standardized. 3. The rendering of Oriental Medical Chinese terms into Korean also requires scholarly investigation. In particular, 兪 should be rendered and pronounced 'su', like 輸 and 腧, but is wrongly pronounced 'yu'. Likewise, 井滎兪經合, 秦艽, 膻中, 共振丹, 成無已, and 麗澤通氣湯 should be pronounced 'jeong-hyeong-su-gyeong-hap', 'jin-gyo', 'dan-jung', 'gong-sin-dan', 'Seong-mu-yi', and 'Yi-taek-tong-gi-tang', respectively. Moreover, the character 梴 in 李梴 currently has four different readings, which should also be standardized. The author proposes that dictionaries of Oriental Medical terminology record the correct Korean rendering and pronunciation of each Chinese term.
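The per-component application of the initial sound law in conclusion 1 can be illustrated on Hangul syllables. This is a simplified sketch, not the paper's procedure: it assumes the input is already written in Hangul, handles only the standard ㄴ/ㄹ alternations (ㄴ, ㄹ before ㅑ ㅒ ㅕ ㅖ ㅛ ㅠ ㅣ become ㅇ; ㄹ before other vowels becomes ㄴ), ignores lexical exceptions, and cannot detect loanwords. The function names are hypothetical.

```python
HANGUL_BASE = 0xAC00          # first precomposed syllable, 가
NUM_VOWELS, NUM_FINALS = 21, 28
NIEUN, RIEUL, IEUNG = 2, 5, 11  # initial-consonant indices of ㄴ, ㄹ, ㅇ
Y_VOWELS = {2, 3, 6, 7, 12, 17, 20}  # vowel indices of ㅑ ㅒ ㅕ ㅖ ㅛ ㅠ ㅣ


def _decompose(syllable):
    """Split a precomposed Hangul syllable into (initial, vowel, final) indices."""
    initial, rest = divmod(ord(syllable) - HANGUL_BASE, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    return initial, vowel, final


def _compose(initial, vowel, final):
    return chr(HANGUL_BASE + (initial * NUM_VOWELS + vowel) * NUM_FINALS + final)


def apply_initial_law(word):
    """Apply a simplified initial sound law to the first syllable of one word.

    Following the abstract, single-syllable words are left unchanged.
    """
    if len(word) < 2 or not ("\uac00" <= word[0] <= "\ud7a3"):
        return word
    initial, vowel, final = _decompose(word[0])
    if initial in (NIEUN, RIEUL) and vowel in Y_VOWELS:
        initial = IEUNG   # e.g. 류 -> 유, 뉵 -> 육
    elif initial == RIEUL:
        initial = NIEUN   # e.g. 래 -> 내
    else:
        return word
    return _compose(initial, vowel, final) + word[1:]


def apply_to_compound(components):
    """Apply the law to each existing word inside a compound, then join them."""
    return "".join(apply_initial_law(part) for part in components)


if __name__ == "__main__":
    print(apply_to_compound(["임신", "뉵혈"]))  # 임신육혈
```

Splitting the compound into its component words before applying the rule is what yields 'Im-sin-yuk-hyeol' rather than leaving the second component unchanged.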

A Study on the Perception of Metaverse Fashion Using Big Data Analysis

  • Hosun Lim
    • Fashion & Textile Research Journal / v.25 no.1 / pp.72-81 / 2023
  • As social and economic paradigm shifts accelerate and non-contact interaction has become the new normal due to the COVID-19 pandemic, metaverse services, which create social spaces through online activities and virtual reality, are spreading rapidly. This study analyzes the perception of and trends in metaverse fashion using big data. TEXTOM was used to extract metaverse- and fashion-related words from Naver and Google and to analyze their frequency and importance. Structural equivalence analysis was then conducted on the derived main words to identify perceptions of and trends in metaverse fashion. The results were as follows. First, term frequency (TF) analysis showed that the most frequent words were "metaverse," "fashion," "virtual," "brand," "platform," "digital," "world," "Zepeto," "company," and "game." In the TF-inverse document frequency (TF-IDF) analysis, "virtual" was the most important word, followed by "brand," "platform," "Zepeto," "digital," "world," "industry," "game," "fashion show," and "industry." "Metaverse" and "fashion" had a high TF but a low TF-IDF, whereas words such as "virtual," "brand," "platform," "Zepeto," and "digital" ranked higher by TF-IDF than by TF, indicating high importance in the text. Second, convergence of iterated correlations (CONCOR) analysis using UCINET revealed four clusters: "virtual world," "metaverse distribution platform," "fashion contents technology investment," and "metaverse fashion week." Fashion brands are hosting virtual fashion shows and stores on metaverse platforms where the virtual and real worlds coexist, and investment in developing metaverse-related technologies is under way.
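The finding that "metaverse" and "fashion" have a high TF but a low TF-IDF follows from the IDF factor: a word that occurs in nearly every collected document is down-weighted. A minimal sketch using one common TF-IDF variant (raw counts times log(N / df)); TEXTOM's exact formula is not stated in the abstract, and the toy documents are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (total TF, summed TF-IDF) per term for tokenized documents.

    TF is the raw count; IDF is log(N / df), so a term that appears in every
    document gets an IDF, and therefore a TF-IDF, of zero.
    """
    n_docs = len(documents)
    tf, df = Counter(), Counter()
    for tokens in documents:
        tf.update(tokens)
        df.update(set(tokens))
    scores = {}
    for tokens in documents:
        for term, count in Counter(tokens).items():
            scores[term] = scores.get(term, 0.0) + count * math.log(n_docs / df[term])
    return tf, scores


if __name__ == "__main__":
    docs = [
        ["metaverse", "fashion", "virtual", "brand"],
        ["metaverse", "fashion", "platform"],
        ["metaverse", "fashion", "virtual"],
    ]
    tf, scores = tf_idf(docs)
    print(tf["metaverse"], scores["metaverse"])  # 3 0.0
    print(round(scores["brand"], 3))             # 1.099
```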

Perception of High School Students in Chonnam Province on the Meteorology Terms in Geography Textbooks of North Korean Secondary School (북한 중등과정 지리 교과서 기상학분야 용어에 대한 전남지역 고등학생들의 이해)

  • Hong, Jeong-Min;Jeong, Young-Kun
    • Journal of the Korean earth science society / v.27 no.1 / pp.15-19 / 2006
  • In this study, the meteorology terms in North Korean geography textbooks, which contain all of the meteorology content in the North Korean secondary school curriculum, are compared with those in South Korean earth science textbooks. Forty terms that have the same meaning but are composed of different words were selected, and 89 high school students in Chonnam Province were tested on how correctly they understood each term. On average, students' comprehension was 30% higher for the South Korean terms than for the North Korean terms. For 9 terms, however, comprehension of the North Korean term was higher. Twenty-six (83.9%) of the terms that students found difficult to understand correctly are North Korean terms recently coined from native Korean words. Most meteorology terms in South Korean textbooks, which are derived from Sino-Korean (Chinese-character) vocabulary or borrowed from foreign languages, were easier for students to understand than the North Korean terms built from native Korean words.
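The abstract does not state the denominator of the 83.9% figure. Assuming it is taken over the terms for which students understood the South Korean version better (the 40 terms minus the 9 exceptions), the percentage is consistent with:

```latex
\[
40 - 9 = 31, \qquad \frac{26}{31} \approx 0.839 = 83.9\%
\]
```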

Analysis of drama viewership related words through unstructured data collection (비정형데이터 수집을 통한 드라마 시청률 연관어 분석)

  • Kang, Sun-Kyoung;Lee, Hyun-Chang;Shin, Seong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.8 / pp.1567-1574 / 2017
  • In this paper, we analyzed structured and unstructured data to examine drama viewership ratings. For the structured data, 19 items were collected from each broadcasting company in four areas: drama information, information on the people involved, broadcasting information, and audience rating information. The unstructured data were collected with a crawler from the bulletin boards and the pre-broadcast and post-broadcast blogs operated by each broadcasting company. Comparing the structured data across the four areas for each broadcaster showed similar results. In addition, we derived seven ratings-related words by analyzing the correlation of word occurrence frequencies in the unstructured data collected from each company's bulletin boards and blogs. The derived associations were examined through reliability analysis.
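The abstract does not say which correlation measure was used. As one plausible reading, the sketch below ranks candidate words by the Pearson correlation between each word's per-episode mention count and the per-episode ratings. The data layout, the function names (`pearson`, `rank_related_words`), and all numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_related_words(word_freqs, ratings, top_k):
    """Rank words by how strongly their per-episode frequency tracks the ratings."""
    scored = [(word, pearson(freqs, ratings)) for word, freqs in word_freqs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    ratings = [10.0, 12.0, 15.0, 14.0]  # hypothetical per-episode ratings
    word_freqs = {"actor": [3, 4, 6, 5], "ost": [5, 5, 5, 5], "plot": [6, 5, 2, 3]}
    print(rank_related_words(word_freqs, ratings, top_k=1)[0][0])  # actor
```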

A Trend Analysis of Radiological Research in Korea using Topic Modeling (토픽모델링을 이용한 국내 방사선 학술연구 트렌드 분석)

  • Hong, Dong-Hee
    • Journal of the Korean Society of Radiology / v.16 no.3 / pp.343-349 / 2022
  • This study uses topic modeling to analyze radiation-themed papers published from 1989 to 2022 and to examine the relevance and weight of the topics. To help revitalize research in the field of radiation, we analyzed topics derived from 717 papers published in Korea through 2022. Text mining was used to analyze overall research trends in the distribution of subjects, and five topics were derived through topic modeling. First, the keywords of the 717 papers were preprocessed, and a frequency analysis was performed on a total of 1,675 words. Second, analysis of the five topics based on associations among their constituent words showed that studies in radiation, imaging, and clinical CT focused on minimizing dose within a range that does not degrade image quality. In addition, many studies were conducted on MRI, and ultrasound was actively studied for disease analysis in various areas.
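Before topic modeling, the paper keywords were preprocessed and their frequencies counted. A minimal sketch of that step, assuming each paper's keywords arrive as a list of strings and that preprocessing means lowercasing, whitespace normalization, and removal of a stopword list; these rules and the `STOPWORDS` set are assumptions, and the topic model itself (for example LDA) is not shown.

```python
import re
from collections import Counter

# Hypothetical stopword list; the study's actual preprocessing rules are not given.
STOPWORDS = {"study", "analysis", "of", "the", "and"}


def keyword_frequencies(papers_keywords):
    """Normalize keywords and count how often each occurs across papers."""
    counts = Counter()
    for keywords in papers_keywords:
        for raw in keywords:
            # lowercase and collapse internal whitespace so "Dose " == "dose"
            term = re.sub(r"\s+", " ", raw.strip().lower())
            if term and term not in STOPWORDS:
                counts[term] += 1
    return counts


if __name__ == "__main__":
    papers = [["CT", "Dose ", "image  quality"], ["ct", "MRI"], ["Ultrasound", "dose", "Analysis"]]
    counts = keyword_frequencies(papers)
    print(counts["ct"], counts["dose"], len(counts))  # 2 2 5
```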

Analysis of Inauguration Address of Previous Korean Presidents Based on Network (네트워크 기반 대한민국 역대 대통령 취임사 분석)

  • Kim, Hak Yong
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.11-19 / 2021
  • The presidential inaugural address is a very useful means of presenting the national vision and conveying the president's political philosophy and policy direction to the people. Analyzing the address therefore helps in understanding both the president and his or her era. Inaugural addresses can be analyzed in various academic fields; in this study, the address was treated purely as content and analyzed as a network. Word cloud analysis based on the frequency of words in the address is widely used, but a network-based analysis is more useful because it can capture the context within sentences. A network of all past inaugural addresses of presidents of the Republic of Korea was constructed, and its structural properties were presented. The presidents' political directions were derived by comparing the key words obtained from the network with those from the word cloud. A network was also constructed for each president's inaugural address, and the characteristics of each address were presented by comparing its key words with their closeness centrality, a structural measure of the network. This network-based analysis of past inaugural addresses is expected to provide data for understanding and evaluating presidents.
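Closeness centrality measures how near a word is, on average, to every other word it can reach in the network. The sketch assumes an unweighted co-occurrence network in which two words are linked when they share a sentence; the paper's exact network construction is not described in the abstract, and the sample sentences are invented. For a connected network the formula below matches the standard definition (n - 1) / (sum of distances).

```python
from collections import defaultdict, deque
from itertools import combinations


def cooccurrence_graph(sentences):
    """Link every pair of distinct words that appear in the same sentence."""
    graph = defaultdict(set)
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def closeness_centrality(graph, node):
    """(reachable nodes) / (sum of shortest-path distances), via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


if __name__ == "__main__":
    sentences = [["nation", "people"], ["people", "economy"], ["economy", "future"]]
    graph = cooccurrence_graph(sentences)
    print(closeness_centrality(graph, "people"))  # 0.75
    print(closeness_centrality(graph, "nation"))  # 0.5
```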

Children's Play Facilities according to the Classification of Amusement Features (놀이속성 분류에 따른 적정 어린이 놀이시설물 연구)

  • Jeong, Kil-Taek;Shin, Min-Ji;Shin, Ji-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.46 no.1 / pp.29-37 / 2018
  • This study derives play attribute words that describe the nature of play by analyzing the correlation between play facilities and play attribute words. Investigating the play attributes of play facilities and supplementing weak areas can provide a balanced play environment. Play attribute words were compiled through a literature review, and experts rated the importance of each word. Keywords describing play that were derived from news articles and references are defined as play attribute words; they were classified into six broad categories and twenty-six sub-categories. The importance weights of the major play attribute words were: Communication (0.268) > Imagination (0.201) > Amusement (0.190) > Development (0.167) > Learning (0.108) > Intelligence (0.067). Experts thus regarded communication and imagination as the most important elements. The play attributes of each amusement facility were then identified for the facilities installed in 114 children's parks in Seoul. Among the play attributes, 'development' appeared with high frequency in the amusement facilities of Seoul's children's parks, whereas the importance of the major attributes 'Communication' and 'Imagination' was not fully reflected in cognitive play facilities. It was therefore judged that these attributes need to be actively introduced. By identifying the weaknesses of amusement facilities in children's parks and analyzing the features and functions of play, this study suggests future improvements.
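The six expert values reported above sum to about 1.0, which is consistent with normalized importance weights (the abstract does not name the weighting method). The sketch encodes those values, reproduces the reported ranking, and adds a hypothetical gap check that flags attributes whose share among installed facilities falls below their expert weight; the `share` figures and the function names are invented for illustration.

```python
# Expert importance values for the six major play attributes, as reported in the abstract.
WEIGHTS = {
    "Communication": 0.268,
    "Imagination": 0.201,
    "Amusement": 0.190,
    "Development": 0.167,
    "Learning": 0.108,
    "Intelligence": 0.067,
}


def rank_attributes(weights):
    """Return attribute names ordered from most to least important."""
    return sorted(weights, key=weights.get, reverse=True)


def underrepresented(weights, facility_share):
    """Attributes whose share among installed facilities is below their expert weight."""
    return [name for name in rank_attributes(weights)
            if facility_share.get(name, 0.0) < weights[name]]


if __name__ == "__main__":
    print(round(sum(WEIGHTS.values()), 3))  # 1.001
    # Hypothetical shares of each attribute among surveyed facilities.
    share = {"Communication": 0.10, "Imagination": 0.05, "Amusement": 0.25,
             "Development": 0.40, "Learning": 0.12, "Intelligence": 0.08}
    print(underrepresented(WEIGHTS, share))  # ['Communication', 'Imagination']
```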

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Knowledgebase (지식베이스를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

  • Kim, Kwang-Ho;Lim, Min-Kyu;Kim, Ji-Hwan
    • MALSORI / v.68 / pp.115-126 / 2008
  • In this paper, we propose a method for improving the vocabulary coverage of embedded continuous speech recognition (CSR) using a knowledgebase. A CSR vocabulary is normally derived from a word frequency list, so its coverage depends on the corpus used. In previous research, we presented an improved method of vocabulary generation using a part-of-speech (POS) tagged corpus: we analyzed all words paired with 101 of the 152 POS tags and determined a set of words that must be included in vocabularies of any size. However, for the remaining 51 POS tags (e.g., nouns and verbs), the inclusion of words paired with these tags was still based on word frequencies counted in a corpus. In this paper, we propose a corpus-independent word inclusion method for noun-, verb-, and named entity (NE)-related POS tags using a knowledgebase. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. Verbs are categorized by lemma, and the relative importance of each lemma is taken from pre-analyzed statistics for verbs. The inclusion order of NEs is determined through Google search. The proposed method achieves better coverage on the test short message service (SMS) text corpus.
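The corpus-independent ordering can be sketched as ranking word groups (synonym groups for nouns, lemmas for verbs) by an externally obtained importance score and adding their members in that order until the target size is reached. This is an illustration under stated assumptions, not the authors' code: the function names, group contents, and scores are invented placeholders; in the paper the scores come from Google search and pre-analyzed verb statistics, which are not reproduced here.

```python
def inclusion_order(groups, scores):
    """Order words by the importance score of the group they belong to.

    groups maps a group name (a synonym set or a verb lemma) to its member words;
    scores maps the same group name to a relative importance score.
    """
    ordered = []
    for name in sorted(groups, key=lambda g: scores.get(g, 0), reverse=True):
        for word in groups[name]:
            if word not in ordered:
                ordered.append(word)
    return ordered


def fill_vocabulary(base_vocab, ordered_words, target_size):
    """Append words in the given order until the vocabulary reaches target_size."""
    vocab = list(base_vocab)
    for word in ordered_words:
        if len(vocab) >= target_size:
            break
        if word not in vocab:
            vocab.append(word)
    return vocab


if __name__ == "__main__":
    groups = {"GO": ["go", "goes", "went"], "BUY": ["buy", "bought"],
              "car_syn": ["car", "automobile"]}
    scores = {"GO": 900, "BUY": 400, "car_syn": 650}  # hypothetical importance scores
    order = inclusion_order(groups, scores)
    print(fill_vocabulary(["the", "a"], order, 6))  # ['the', 'a', 'go', 'goes', 'went', 'car']
```

Ranking whole groups rather than individual words keeps inflected forms and synonyms together, so the order no longer depends on how often each surface form happened to occur in a training corpus.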
