• Title/Summary/Keyword search: 단어 ("word")

Opinion Mining of Product Reviews using Sentiment Phrase Patterns considered the Endings of Declinable Words (어미변화를 고려한 감성 구문 패턴을 이용한 상품평 의견 분류)

  • Kim, Jung-Ho;Cha, Myung-Hoon;Kim, Myung-Kyu;Chae, Soo-Hoan
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.285-290 / 2010
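  • As the Internet has become widespread, anyone can easily express opinions online. As a result, the volume of opinion data expressing thoughts and feelings has grown rapidly, and with the emergence of applications that use such data, efficient retrieval and automatic classification techniques are in demand. In line with this trend, there have been many studies on opinion data classification. These studies use as classification features not only single words but also N-grams of two or more words, lexical phrase patterns, and syntactic phrase patterns. Patterns in particular have been a major research topic because they are more flexible than single words or N-grams and can express linguistically richer information. Nevertheless, these studies have mainly dealt with English, and studies that apply patterns to Korean to classify subjective sentences or polarity remain scarce. A characteristic of Korean is that the conjugation of declinable words is highly developed, so endings vary widely and meaning shifts subtly with those variations. Existing Korean opinion classification studies, however, remove the ending and keep only the stem in order to capture the core meaning of a word, and so cannot account for the meaning changes carried by endings; as a result, their classification accuracy falls short of results reported for English. This study therefore surveys existing pattern-based methods applied to English and applies one of them, polarity-bearing sentence-constituent patterns, to Korean. It then extracts patterns of ending variation and analyzes how this variation affects opinion classification performance.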

A Text Network Analysis of North Korean Library Journal, 『Reference Materials for Librarian』 (북한 도서관잡지 『도서관일군 참고자료』의 텍스트 네트워크 분석)

  • Lee, Seongsin;Kim, Hyunsook;Baek, Sumin;Yoon, Subin;Choi, Jae-Hwang
    • Journal of Korean Library and Information Science Society / v.53 no.3 / pp.169-191 / 2022
  • This study applies text network analysis to two years (2016-2017) of 『Reference Materials for Librarian』, published by the Library Operation Methodology Research Institute in North Korea. Going beyond simple word frequency analysis, text network analysis measures how important a particular word is by capturing the connectivity and relationships between words, which also makes it possible to interpret specific social phenomena and derive implications. Frequency, degree centrality, betweenness centrality, and community structure of the collected words were calculated using NetMiner. As a result, the terms 'users', 'information services', 'information needs', 'information technology', 'social learning', 'computers', 'databases', 'information acquisition', 'information retrieval', and 'librarian' emerged as important for understanding North Korean libraries.
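
The frequency and degree centrality measures named in this abstract can be illustrated with a minimal sketch. It is plain Python rather than NetMiner, the documents are made-up toy data, and linking two words whenever they share a document is an assumption about how the network might be built, not the study's procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Build an undirected graph: words are nodes, and two words are
    linked if they appear together in at least one document."""
    graph = {}
    for tokens in docs:
        unique = sorted(set(tokens))
        for word in unique:
            graph.setdefault(word, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {word: 0.0 for word in graph}
    return {word: len(nbrs) / (n - 1) for word, nbrs in graph.items()}


# Toy, hypothetical documents (not data from the study).
docs = [
    ["library", "user", "information"],
    ["user", "service"],
    ["information", "retrieval", "user"],
]

frequency = Counter(word for tokens in docs for word in tokens)
centrality = degree_centrality(cooccurrence_graph(docs))

for word, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} freq={frequency[word]} degree_centrality={score:.2f}")
```

In this toy input, a word that co-occurs with every other word gets a centrality of 1.0 even if its raw frequency is modest, which is the kind of distinction between frequency and connectivity the abstract describes.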

Analysis of the Severity of Self-Esteem Reduction Using Text Mining (텍스트 마이닝을 이용한 자존감 저하의 심각성 분석)

  • Kim, Beom-su;Hwang, Yeong-bin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.47-51 / 2021
  • This study uses text mining to examine the consequences of reduced and lost self-esteem. Physical health is important, but mental health is increasingly regarded as even more important, and a healthy mind depends first on self-esteem and self-confidence. When self-esteem declines and is eventually lost, it leads directly to depression, and severe depression can lead, in the worst case, to self-harm and suicide. More and more people, ordinary people and entertainers alike, are dying by suicide because they cannot overcome depression. For this reason, the seriousness of depression and the loss of self-esteem have become an important issue. We therefore collect data over a fixed period through Naver, Instagram, and Twitter searches and extract the words in that data to predict and analyze the causes of lost self-esteem, how serious depression has recently become, and what the consequences of lost self-esteem are.

A Study on the Content Utilization of KISTI Science and Technology Information Service (KISTI 과학기술정보서비스의 콘텐츠 활용 분석)

  • Kang, Nam-Gyu;Hwang, Mi-Nyeong
    • Journal of Internet Computing and Services / v.21 no.4 / pp.87-95 / 2020
  • The Science and Technology Information Service provided by the Korea Institute of Science and Technology Information (KISTI) is designed to let users easily and conveniently search and view content, and it is built in a similar way to general information services. NDSL is KISTI's core science and technology information service; it provides about 138 million content items and recorded about 93 million page views in 2019. In this paper, we derive insights by analyzing how the science and technology information provided by NDSL, such as academic papers, reports, and patents, is searched and used through the web service (https://www.ndsl.kr), and by analyzing search query words. We analyzed the usability of each content type, such as academic papers and patents, together with general statistics: the status of content construction, utilization status and methods by content type, monthly, weekly, and time-of-day content usage, the content view rate per search by content type, a year-by-year comparison of academic paper usage, and the relationship between the use of domestic academic papers and the KCI index. For query words, we analyzed their language form, the number of words per query, and the relationship between query words and timeliness by content type. Based on these results, we propose ways to improve NDSL: dynamically reflecting content utilization behavior in search result rankings, expanding queries, and building profile information through identification of non-logged-in users for targeted services.

SWOSpark : Spatial Web Object Retrieval System based on Distributed Processing (SWOSpark : 분산 처리 기반 공간 웹 객체 검색 시스템)

  • Yang, Pyoung Woo;Nam, Kwang Woo
    • Journal of KIISE / v.45 no.1 / pp.53-60 / 2018
  • This study describes a spatial web object retrieval system built on Spark, an in-memory distributed processing system. The growth of social networks has created massive amounts of spatial web objects, and retrieving and analyzing this data is difficult with existing spatial web object retrieval systems. Recent distributed processing systems make it possible to analyze and retrieve large amounts of data quickly. We therefore propose a method for searching large-scale spatial web objects using a distributed processing system. Data is processed in block units, and one block is converted to one RDD and processed in Spark. In the proposed system, the entire spatial region is divided into non-overlapping regions, one region is allocated to each RDD, and each RDD holds a spatial web object index for the data it contains. Dividing the space in this way makes the search of each divided region more efficient and lets the system use distributed processing effectively. In addition, by comparing the QP-tree with the R-tree, we confirm that the proposed system is better suited to searching spatial web objects: the QP-tree builds its index from both spatial and word information, while the R-tree builds its index from spatial information only.
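
The idea of dividing the whole space into non-overlapping regions and assigning each object to exactly one region can be sketched as follows. This is plain Python, not Spark code, and the uniform grid, the `cell_id` and `partition` helpers, and the toy objects are all assumptions for illustration; the abstract does not say how SWOSpark actually divides the space.

```python
from collections import defaultdict


def cell_id(x, y, bounds, cells_per_side):
    """Map a point to the id of the grid cell that contains it.
    Assumes the point lies inside bounds = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bounds
    # Clamp to the last cell so points on the upper boundary stay in range.
    col = min(int((x - min_x) / (max_x - min_x) * cells_per_side), cells_per_side - 1)
    row = min(int((y - min_y) / (max_y - min_y) * cells_per_side), cells_per_side - 1)
    return row * cells_per_side + col


def partition(objects, bounds, cells_per_side):
    """Group objects by grid cell; each group stands in for one partition."""
    parts = defaultdict(list)
    for obj in objects:
        parts[cell_id(obj["x"], obj["y"], bounds, cells_per_side)].append(obj)
    return dict(parts)


# Toy, hypothetical spatial web objects: a location plus descriptive words.
objects = [
    {"x": 1.0, "y": 1.0, "words": ["cafe", "coffee"]},
    {"x": 9.0, "y": 1.0, "words": ["museum"]},
    {"x": 2.0, "y": 8.0, "words": ["park", "coffee"]},
    {"x": 10.0, "y": 10.0, "words": ["hotel"]},
]

parts = partition(objects, bounds=(0.0, 0.0, 10.0, 10.0), cells_per_side=2)
for cid in sorted(parts):
    print(cid, [obj["words"] for obj in parts[cid]])
```

Because the cells do not overlap, a query restricted to one area only needs to consult the groups whose cells intersect that area, which is the efficiency argument the abstract makes for dividing the space.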

Improving Bidirectional LSTM-CRF model Of Sequence Tagging by using Ontology knowledge based feature (온톨로지 지식 기반 특성치를 활용한 Bidirectional LSTM-CRF 모델의 시퀀스 태깅 성능 향상에 관한 연구)

  • Jin, Seunghee;Jang, Heewon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.253-266 / 2018
  • This paper proposes a sequence tagging approach to improve the performance of NER (Named Entity Recognition) in QA systems. To retrieve the correct answers stored in a database, the user's query must be converted into a database language such as SQL (Structured Query Language) so that the computer can interpret it. This requires identifying the class or data names in the database that the query refers to. The existing method, which looks up the words of the query in the database to recognize objects, cannot distinguish homophones or multi-word phrases because it ignores the context of the user's query. When there are multiple matches, all of them are returned, so the query admits many interpretations and the time complexity of the computation grows. To overcome these problems, this study reflects the contextual meaning of the query using a Bidirectional LSTM-CRF. We also address a weakness of neural network models, their inability to identify untrained words, by using features based on ontology knowledge. Experiments were conducted on an ontology knowledge base for the music domain, and performance was evaluated. To evaluate the proposed L-Bidirectional LSTM-CRF accurately, we converted words in the learned queries into untrained words and tested whether words that were in the database but absent from training were correctly identified. The results show that the model recognizes objects in context and recognizes untrained words without re-training the L-Bidirectional LSTM-CRF model, and that overall object recognition performance improves.

A Study on the Activation Plan of Cave Promotion on the Internet (동굴의 인터넷 홍보 활성화 방안)

  • You, Young-Jun
    • Journal of the Speleological Society of Korea / no.90 / pp.17-26 / 2009
  • Many researchers contend that tourism in modern society has recently shifted from 'seeing tourism' to 'experience tourism'. Accordingly, tourism caves have begun adding edutainment elements to their original tour programs, as at Hwaam-cave in Jungsun-gun, which has set up various pavilions suited to a tourism cave. The purpose of this study is to present activation programs for tourism caves that draw on their characteristics as edutainment tourism resources. To this end, we first examined the characteristics and value of caves as tourism resources, then reviewed the roles and examples of edutainment tourism caves, and finally offered suggestions.

Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.83-96 / 2018
  • Extracting keywords that represent documents is very important because they can be used for automated services such as document search, classification, and recommendation, as well as for conveying document information quickly. However, keyword extraction based on the frequency of words in web site documents, or on graph algorithms over word co-occurrence, has two problems: the web page structure may contain many words unrelated to the topic, and the limited performance of Korean tokenizers makes it difficult to extract semantic keywords. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Final semantic keywords are then extracted through a filtering process that removes inconsistent keywords. Experiments on real web pages of small businesses show that the proposed method improves performance by 34.52% over a keyword selection technique based on statistical similarity. This confirms that considering semantic similarity between words and removing inconsistent keywords improves keyword extraction from documents.
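
One way to picture filtering candidate keywords by semantic similarity is the sketch below. It is not the paper's algorithm: the three-dimensional vectors are hand-made stand-ins for Word2Vec embeddings, and the mean-similarity rule, the `filter_keywords` helper, and the 0.5 threshold are assumptions chosen only to show the idea of dropping a candidate that is inconsistent with the rest.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_keywords(vectors, threshold):
    """Keep candidates whose mean cosine similarity to the other
    candidates is at least `threshold` (a hypothetical filtering rule)."""
    kept = []
    for word, vec in vectors.items():
        sims = [cosine(vec, other) for w, other in vectors.items() if w != word]
        if sims and sum(sims) / len(sims) >= threshold:
            kept.append(word)
    return kept


# Toy, hand-made vectors standing in for trained word embeddings.
toy_vectors = {
    "coffee": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.0],
    "latte": [0.85, 0.15, 0.05],
    "login": [0.0, 0.1, 0.9],  # page-structure word unrelated to the topic
}

selected = filter_keywords(toy_vectors, threshold=0.5)
print(selected)
```

With real embeddings the candidate vectors would come from a trained model, and the threshold would need tuning on held-out pages.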

Multiple Cause Model-based Topic Extraction and Semantic Kernel Construction from Text Documents (다중요인모델에 기반한 텍스트 문서에서의 토픽 추출 및 의미 커널 구축)

  • 장정호;장병탁
    • Journal of KIISE:Software and Applications / v.31 no.5 / pp.595-604 / 2004
  • Automatic analysis of concepts or semantic relations from text documents enables not only efficient acquisition of relevant information but also comparison of documents at the concept level. We present a multiple cause model-based approach to text analysis, in which latent topics are automatically extracted from document sets and similarity between documents is measured by semantic kernels constructed from the extracted topics. In our approach, a document is assumed to be generated by various combinations of underlying topics. A topic is defined by a set of words that are related to the same topic or co-occur frequently within a document. In a network representing a multiple-cause model, each topic is identified by a group of words having high connection weights from a latent node. To facilitate learning and inference in multiple-cause models, some approximation methods are required, and we use an approximation by Helmholtz machines. In an experiment on the TDT-2 data set, we extract sets of meaningful words where each set contains some theme-specific terms. Using semantic kernels constructed from latent topics extracted by multiple cause models, we also achieve significant improvements over the basic vector space model in terms of retrieval effectiveness.

A Study on the Changes in Perspectives on Unwed Mothers in S.Korea and the Direction of Government Policies: 1995~2020 Social Media Big Data Analysis (한국미혼모에 대한 관점 변화와 정부정책의 방향: 1995년~2020년 소셜미디어 빅데이터 분석)

  • Seo, Donghee;Jun, Boksun
    • Journal of the Korea Convergence Society / v.12 no.12 / pp.305-313 / 2021
  • This study collected and analyzed big data from 1995 to 2020, focusing on the keywords "unwed mother", "single mother", and "single mom", in order to suggest appropriate directions for government support policy in light of changing perspectives on unwed mothers. The big data collection platform Textom was used to collect data from the portal search sites Naver and Daum and to refine it. The refined data were analyzed with the word frequency, TF-IDF, and N-gram analyses provided by Textom. In addition, network analysis and CONCOR analysis were conducted with the UCINET6 program. Similar words appeared in the word frequency and TF-IDF analyses, but they differed by year. In the N-gram analysis, the words that appeared were similar, but the frequency and form of consecutive word sequences differed considerably. CONCOR analysis showed that different clusters formed in different years. This study confirms the change in perspectives on unwed mothers through big data analysis and suggests the need for policies that offer unwed mothers a range of options as independent women, and for policies that embrace pregnancy, childbirth, and parenting without discrimination within new family forms.
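
The word frequency, TF-IDF, and N-gram analyses mentioned in this abstract can be sketched in a few lines. This is plain Python, not Textom; the weighting tf = count / document length and idf = log(N / df) is one common TF-IDF variant assumed here, and the tokenized documents are made-up toy data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document, using
    tf = count / len(doc) and idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(word for tokens in docs for word in set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def bigrams(tokens):
    """Consecutive word pairs (N-grams with N = 2)."""
    return list(zip(tokens, tokens[1:]))


# Toy, hypothetical tokenized documents (not data from the study).
docs = [
    ["unwed", "mother", "support"],
    ["single", "mother", "policy"],
    ["policy", "support", "policy"],
]

frequency = Counter(word for tokens in docs for word in tokens)
scores = tf_idf(docs)
pairs = bigrams(docs[0])

print(frequency.most_common(3))
print(scores[0])
print(pairs)
```

A word that is frequent overall but spread across many documents gets a lower TF-IDF score than a word concentrated in one document, which is one reason frequency and TF-IDF rankings can overlap without being identical.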