• Title/Summary/Keyword: 단어 유사도 분석 (word similarity analysis)

Search Result 231, Processing Time 0.029 seconds

Case Study on Public Document Classification System That Utilizes Text-Mining Technique in BigData Environment (빅데이터 환경에서 텍스트마이닝 기법을 활용한 공공문서 분류체계의 적용사례 연구)

  • Shim, Jang-sup;Lee, Kang-wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.1085-1089 / 2015
  • In the past, text-mining techniques were difficult to implement as analysis algorithms because of the complexity of text and the degrees of freedom of the variables it contains. The algorithms demanded considerable effort to produce meaningful results, and mechanical text analysis took longer than analysis by humans. With advances in hardware and analysis algorithms, however, big data technology has emerged. It has resolved the problems mentioned above, and analysis through text mining is now recognized as valuable. Nevertheless, applying text mining to Korean text is still at an early stage because of the linguistic characteristics of the Korean language. If text mining can support not only data search but also analysis, the savings in human and material resources required for text analysis will lead to efficient resource utilization in many areas of public work. In this paper, we therefore compare and evaluate manual classification of public documents against classification in a big data environment that uses text-mining-based word frequency (TF-IDF) and the cosine similarity between documents.
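
As a rough illustration of the two measures named in this abstract, the sketch below computes TF-IDF weights and the cosine similarity between documents in plain Python. The weighting variant (relative term frequency times log(N/df)), the whitespace tokenization, and the sample documents are assumptions made for the example; the paper's actual pipeline, including any Korean morphological analysis, is not described here.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, already-tokenized documents (not from the paper).
docs = [
    "budget report for road construction".split(),
    "road construction budget plan".split(),
    "public health inspection notice".split(),
]
vecs = tf_idf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
```

A classifier in this style could assign each new document to the category whose reference documents it is most similar to; that assignment rule is likewise an assumption rather than something stated in the abstract.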

Content Analysis on the News Report Cases of Vibrio (내용분석을 통한 언론의 비브리오 보도사례 분석)

  • Woo, Ha-Joong;Kim, Young-Kyu
    • Journal of the Korean Society of Food Culture / v.22 no.4 / pp.492-497 / 2007
  • The objectives of this study are to determine the full extent of negative media reports and to broaden public awareness through content analysis. The sample consists of news reports on vibrio from three major broadcasting companies (MBC, KBS, and SBS) and three major national newspapers (Chosun Daily, Joongang Daily, and Donga Daily) in Korea over the five years from January 1, 2000 to December 31, 2004. A total of 628 cases were retrieved from the websites of the aforementioned broadcasters and newspapers. The media are strongly advised to adhere to proven facts as far as possible and to research outcomes fully and thoroughly before their reports reach the public.

Implementation of User Recommendation System based on Video Contents Story Analysis and Viewing Pattern Analysis (영상 스토리 분석과 시청 패턴 분석 기반의 추천 시스템 구현)

  • Lee, Hyoun-Sup;Kim, Minyoung;Lee, Ji-Hoon;Kim, Jin-Deog
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.12 / pp.1567-1573 / 2020
  • The development of Internet technology has brought about the era of one-person media. Individuals produce their own content and upload it to online services, and many users watch that content on Internet-connected devices. Currently, most users find and watch the content they want through the search functions provided by existing online services. These functions rely on information entered by the user who uploaded the content. When content must be retrieved from such limited word data, search results include information the user does not want. To solve this problem, this paper presents a system that actively analyzes the videos in an online service and extracts and reflects their characteristics. Morphemes are extracted from the story content derived from a video's voice data and analyzed with big data technology.

A Study of Psychometric Function Curve for Korean Standard Monosyllabic Word Lists for Preschoolers (KS-MWL-P) (한국표준 학령전기용 단음절어표 (Korean Standard Monosyllabic Word Lists for Preschoolers, KS-MWL-P)의 심리음향기능곡선 연구)

  • Shin, Hyun-Wook;Kim, Jin-Sook
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.534-541 / 2009
  • A word recognition test (WRT) for children can be useful for diagnosing the degree of communication disability, prescribing hearing instruments, planning aural rehabilitation and speech therapy, and determining the site of lesion. The Korean standard monosyllabic word lists for preschoolers (KS-MWL-P) were developed according to criteria given in the literature. However, because only 8 children participated in the development process, the authors of the KS-MWL-P suggested that more children be included to verify the homogeneity of the lists using psychometric function curves. The purpose of this study was to examine the homogeneity of the KS-MWL-P through psychometric analysis in order to address this limitation. The 100 monosyllabic KS-MWL-P words were presented with pictures to 23 preschoolers with normal hearing. Psychometric function curves, with linear slopes between the 20% and 80% correct rates, were obtained from the recognition scores of each monosyllabic word at intensities from -10 to 40 dB HL and analyzed. The resulting psychometric function curves were S-shaped, with the correct rate increasing with intensity, and showed no statistically significant differences among words or lists. The congruent graph shapes among the lists also indicated good homogeneity, and the average slopes of lists 1, 2, 3, and 4 were 4.48, 3.86, 4.65, and 4.50, respectively. Homogeneity was judged suitable because the analysis of variance showed no statistically significant difference among the lists (p>0.05). However, the slopes for item ranges 1-10, 1-20, and 1-25 showed no difference, with p-values of 0.93, 0.59, 0.91, and 0.70 for lists 1, 2, 3, and 4, respectively. Although the KS-MWL-P assumed that lower-numbered items were easier and suited to testing younger children, the results of this study did not support that assumption. The items should therefore be reordered according to the slope analysis suggested by this study so that younger children can be tested with easier items. Apart from this, the KS-MWL-P proved to be a useful clinical and rehabilitative tool for evaluating and training preschoolers.
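
One plausible reading of the "linear slope between the 20% and 80% correct rates" is the rise in percent correct per dB between the two intensities at which those rates are reached. The sketch below computes it by linear interpolation; the function names, the interpolation method, and the sample scores are assumptions for illustration and are not taken from the study.

```python
def level_at(target, levels, scores):
    """Linearly interpolate the intensity (dB HL) at which `target` % correct is reached.

    Assumes `scores` does not decrease as `levels` increases.
    """
    points = list(zip(levels, scores))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target score is outside the measured range")


def slope_20_80(levels, scores):
    """Slope of the curve, in % correct per dB, between the 20% and 80% points."""
    low = level_at(20.0, levels, scores)
    high = level_at(80.0, levels, scores)
    return (80.0 - 20.0) / (high - low)


# Made-up recognition scores for one word list (not data from the study).
levels = [-10, 0, 10, 20, 30, 40]   # presentation level in dB HL
scores = [0, 10, 35, 70, 95, 100]   # percent of words recognized
slope = slope_20_80(levels, scores)
```

With these made-up scores, the 20% point falls at 4 dB HL and the 80% point at 24 dB HL, giving a slope of 3.0% per dB. Comparing such slopes across lists is the kind of homogeneity check the abstract describes.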

The effect of semantic categorization of episodic memory on encoding of subordinate details: An fMRI study (일화 기억의 의미적 범주화가 세부 기억의 부호화에 미치는 영향에 대한 자기공명영상 분석 연구)

  • Yi, Darren Sehjung;Han, Sanghoon
    • Korean Journal of Cognitive Science / v.28 no.4 / pp.193-221 / 2017
  • Grouping episodes into semantically related categories is necessary for a better mnemonic structure. However, the effect of grouping on memory for subordinate details has not been clearly understood. In an fMRI study, we tested whether attending to the superordinate category during semantic association disrupts or enhances memory for subordinate episodic details. In each cycle of the experiment, five cue words were presented sequentially, each with two related detail words placed underneath. Participants were asked whether they could imagine a category that included the cue words shown so far in the cycle, and they rated their confidence in retrieval. After the session, participants performed cued recall tests on the presented detail words. Behavioral data showed that reaction times for the categorization task decreased and confidence levels increased in the third trial of each cycle; this trial was therefore treated as the point of insight at which a semantic category was believed to be successfully established. Critically, recall accuracy for detail words presented immediately before the third trial was lower than for those in subsequent trials, indicating that subordinate details were disrupted during categorization. General linear model analysis of the trial immediately before the completion of categorization, namely the second trial, revealed significant activation in the temporal gyrus and inferior frontal gyrus, areas of the semantic memory network. Representational similarity analysis revealed that the activation patterns of the third trials were more consistent than those of the second trials in the temporal gyrus, inferior frontal gyrus, and hippocampus. Our research demonstrates that semantic grouping can cause memories of subordinate details to fade, suggesting that semantic retrieval during categorization affects the quality of related episodic memory.

Topic based Web Document Clustering using Named Entities (개체명을 이용한 주제기반 웹 문서 클러스터링)

  • Sung, Ki-Youn;Yun, Bo-Hyun
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.29-36 / 2010
  • Previous clustering research has focused on keyword extraction for grouping by word similarity. However, having too many candidates to compare and compute leads to high complexity, low speed, and low accuracy. To overcome these weaknesses, this paper proposes a topic-based web document clustering model that uses not only keywords but also named entities such as person names, organizations, and locations. Through several experiments, we demonstrate the effectiveness of our model compared with a traditional keyword-only model and analyze how the effects vary with the characteristics of the document collection.

Learning User Interest using Hierarchical Concept indexing based on Ontology (온톨로지 기반의 계층적 개념 인덱싱을 이용한 사용자 관심사 학습)

  • Park Ji-Hyun;Kim Heung-Nam;Jo Geun-Sik
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.646-648 / 2005
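  • With the rapid growth of the Internet, users can obtain a great deal of information online and access the latest news in real time. Accordingly, various methods have been studied for effectively retrieving, from this vast body of information, the information that matches a user's interests. However, many previous studies learn user interests with a keyword vector model based on word frequency. Such a keyword vector model cannot describe user preferences clearly, and feature vectors built from keywords make it difficult to find relationships among concepts. To address this, this paper proposes a method for building a personalized, ontology-style user profile using hierarchical concept indexing. The generated user profile takes into account the similarity between concepts and the user's degree of interest in each concept, so that articles better matched to individual preferences are provided. In the experiments, the proposed method was compared with the WebMate system, an existing keyword vector model learning method, to evaluate its performance. The results showed that the proposed method outperformed the learning method based on keyword vectors.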

Hybrid Food Recommendation System Using Auto-generated User Profiles (자동 생성된 사용자 프로파일을 이용한 하이브리드 음식 추천 시스템)

  • Jeong, Ju-Seok;Kang, Sin-Jae
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.5 / pp.609-617 / 2011
  • This paper proposes a personalized food recommendation system using user profiles auto-generated from Twitter. The user profiles are generated by extracting nouns from Twitter, and calculating emotional scores according to whether each noun is collocated with emotion words. Representative noun information for each food is constructed by analyzing web pages relevant to foods. Appropriate foods for users can be recommended by calculating similarities among the extracted resources. The proposed system has an advantage in that it can always recommend foods even if a user is a newcomer.

Analysis of Music Mood Class using Folksonomy Tags (폭소노미 분위기 태그를 이용한 음악의 분위기 유형 분석)

  • Moon, Chang Bae;Kim, HyunSoo;Kim, Byeong Man
    • Science of Emotion and Sensibility / v.16 no.3 / pp.363-372 / 2013
  • When retrieving music with folksonomy tags, the internal use of numeric tags (AV tags: tags consisting of arousal and valence values) instead of word tags can partially solve the problem posed by synonyms. However, two prerequisite tasks must be done correctly: the first is to map word tags to their numeric tags; the second is to obtain numeric tags for the music pieces to be retrieved. The first task was verified in our prior study, so this paper examines the second. To this end, we propose a music mapping table that defines the relation between AV values and music, and we perform ANOVA tests for analysis. The results show that the arousal and valence values of music have different distributions across 12 mood tags, with or without synonymy, and that their type I error values are P<0.001. Consequently, the distribution of AV values is confirmed to differ according to music mood.

SOM(State of Mind) Model for Emotional Judgement Asymmetry between Positive and Negative Information (감성판단에 대한 SOM(States of Mind) 모형 : 긍정적/부정적 정보의 비대칭성)

  • 임승락;김지혜
    • Science of Emotion and Sensibility / v.4 no.2 / pp.69-77 / 2001
  • Positively biased asymmetry between positive and negative cognition is the basic assumption of heuristic human functioning. This article describes the SOM (states of mind) model for emotional judgement, a psycho-mathematical model built on affective-cognitive assessment research on the balance of positive and negative thoughts and feelings. The SOM model suggests that subjects on average choose the positive over the negative pole with probability 0.62, and that the precise value of this constant coincides with the algebraic “golden section”, .618:.382. Statistical analyses of 32 normal subjects show that the mean SOM ratios for the self-referent judgement and incidental recall tasks with positive/negative emotional words are .62 (SD=.08) and .58 (SD=.4), respectively. The SOM ratios are also significantly correlated with self-referent judgement for positive/negative emotional words. Implications for cognitive balance and future research directions for emotional science are discussed.
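
The abstract does not spell out how the SOM ratio is computed. The sketch below assumes the usual states-of-mind definition, positive / (positive + negative), and compares the result with the golden section. The function name and the sample counts are hypothetical.

```python
import math

# Golden section (sqrt(5) - 1) / 2, approximately 0.618.
GOLDEN_SECTION = (math.sqrt(5) - 1) / 2


def som_ratio(positive, negative):
    """Assumed SOM ratio: share of positive items among all valenced items."""
    total = positive + negative
    if total == 0:
        raise ValueError("at least one positive or negative item is required")
    return positive / total


# Hypothetical subject who endorsed 31 positive and 19 negative emotional words.
ratio = som_ratio(31, 19)
deviation = ratio - GOLDEN_SECTION
```

A ratio close to .618 for a given subject would match the positive bias that the model predicts.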
