• Title/Summary/Keyword: 지식기반 데이터 마이닝 (knowledge-based data mining)


Building Concept Networks using a Wikipedia-based 3-dimensional Text Representation Model (위키피디아 기반의 3차원 텍스트 표현모델을 이용한 개념망 구축 기법)

  • Hong, Ki-Joo;Kim, Han-Joon;Lee, Seung-Yeon
    • KIISE Transactions on Computing Practices / v.21 no.9 / pp.596-603 / 2015
  • A concept network is an essential knowledge base for semantic search engines, personalized search systems, recommendation systems, and text mining. Recently, studies that extend concept representation using external ontologies have been conducted frequently. We therefore propose a new way of building concept networks based on a 3-dimensional text representation model that uses the world-knowledge-level Wikipedia ontology. It is desirable that 'concepts' derived from text documents be defined according to the theoretical framework of formal concept analysis, since relationships among concepts generally change over time. In this paper, concept networks hidden in a given document collection are extracted more reasonably by representing a concept as a term-by-document matrix (see the sketch below).
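
A loose, minimal sketch of the idea of modeling a concept as a term-by-document matrix; the corpus and the concept's term set below are invented for illustration and are not from the paper:

```python
# A minimal sketch: representing a 'concept' as a term-by-document matrix.
# The corpus and the concept's term set are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "semantic search engines rely on concept networks",
    "recommendation systems exploit concept relationships",
    "text mining extracts concepts from document collections",
]

vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(docs)      # documents x terms counts

# A concept is modeled as the sub-matrix restricted to the terms expressing it.
concept_terms = ["concept", "networks", "relationships"]
vocab = vectorizer.vocabulary_
rows = [vocab[t] for t in concept_terms if t in vocab]
concept_matrix = term_doc.T.tocsr()[rows]      # terms x documents for this concept

print(concept_matrix.toarray())
```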

A Study on the Scope of Inter-firm Information Sharing in the CALS Environment (CALS환경에서 기업간 정보공유의 범위에 관한 연구)

  • 고일상
    • Proceedings of the CALSEC Conference / 1999.07a / pp.41-50 / 1999
  • The recent acceleration of CALS system construction and the growing momentum of electronic commerce are shifting the focus of the strategic use of information technology from individual firms toward building inter-firm cooperation and, on that basis, electronic integration between firms. By sharing information technology, firms can build new inter-firm relationships, and by exploiting these relationships strategically they can create new business opportunities as well as competitive advantage in existing businesses. Information sharing through data standardization among related firms is the foundation of CALS system construction; the scope of information sharing inevitably varies with industry and product characteristics and is also affected by the bargaining power of the firms involved. In the CALS environment, the degree of information sharing among vertically cooperating firms and among horizontally cooperating firms is determined by each firm's information strategy. The nature of the work conducted between firms and the structure of the supply-manufacturing-distribution chain are also found to strongly influence the scope of information sharing. This study analyzes the scope of information sharing that can be considered in CALS system construction from the perspectives of shared business processes, shared systems, shared parts and equipment, and shared facilities, and thereby re-examines the system coverage of the pilot projects under way in each industry. The findings on the scope of information sharing are expected to make an important contribution to the development of application services and information strategies by firms intending to build CALS systems.


A Study on Social Contents-Recommendation method using Data Mining and Collective Intelligence (데이터 마이닝과 집단 지성 기법을 활용한 소셜 콘텐츠 추천 방법에 대한 연구)

  • Kang, Daehyun;Park, Hansaem;Lee, Jeungmin;Kwon, Kyunglag;Chung, In-Jeong
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.1050-1053 / 2014
  • With the growth of web-based services and the spread of smart devices, users can access a wide range of web services, and services such as the social web that build relationships among people have made it easy to create and share one's own content. As the number of social web users and the volume of knowledge grow, research has attempted to extract only the necessary information efficiently from this vast body of knowledge. Existing methods, however, fall short of producing results that reflect the common interests of many service users. This paper therefore proposes a method that analyzes trends and recommends content based on users' tags and URL information, using a collective intelligence algorithm and a decision tree, and shows that this can provide social web users with diverse information reflecting the preferences of many users. A sketch of the decision-tree step follows below.
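
A minimal sketch of the decision-tree step the abstract describes, classifying whether a shared URL's tags match a trend; the tag data and labels are invented toy examples:

```python
# A minimal sketch of the decision-tree step: classifying whether a tagged
# URL matches a trending topic. Tags, rows, and labels are invented toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each row: the tags attached to one shared URL; label: 1 = trending, 0 = not.
tagged_urls = [
    "kpop music video", "election news politics", "cat photo funny",
    "kpop concert seoul", "recipe cooking food", "election debate news",
]
labels = [1, 1, 0, 1, 0, 1]

vec = CountVectorizer()
X = vec.fit_transform(tagged_urls)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, labels)

# Recommend: predict whether a newly shared URL's tags follow the trend.
new_tags = vec.transform(["kpop news seoul"])
print(tree.predict(new_tags))   # 1 -> candidate for recommendation
```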

Text Mining and Social Network Analysis-based Patent Analysis Method for Improving Collaboration and Technology Transfer between University and Industry (산학협력 및 기술이전 촉진을 위한 텍스트마이닝과 사회 네트워크 분석 기반의 특허 분석 방법)

  • Lee, Ji Hyoung;Kim, Jong Woo
    • The Journal of Society for e-Business Studies / v.22 no.3 / pp.1-28 / 2017
  • Today, as industry-university cooperation has grown in importance in the knowledge-based economy, support for such cooperation and the number of studies involved in it have also steadily increased. However, profits from the patents resulting from such cooperation, such as technology transfer income and royalties, are lower than they should be, because patents are applied for excessively even though some have little commercial potential. This research therefore suggests a way to analyze and identify patents that enable efficient industry-university cooperation and technology transfer. For the analysis, data on 1,061 US patents owned by the industry-university cooperation foundations of 4 universities were collected and arranged in a quality-strategy matrix, and text mining and social network analysis were carried out, focusing on the patents in the advanced-quality technology section of the matrix. Core keywords and IPC codes were then obtained, and key patents were analyzed for each university. The analysis identified 4 key patents and 2 key IPC codes for University H, 4 key patents and 2 key IPC codes for University K, 6 key patents and 1 key IPC code for University Y, and 14 key patents and 2 key IPC codes for University S. This research is expected to contribute significantly to invigorating industry-university cooperation, based on its analysis of the patents and IPC codes that enable efficient cooperation and technology transfer. A sketch of the network-analysis step follows below.
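
A minimal sketch of the social-network-analysis step, building a keyword co-occurrence network from patent keywords and ranking keywords by centrality; the patent keyword sets are invented placeholders:

```python
# A minimal sketch of the social network analysis step: build a keyword
# co-occurrence network from patent keywords and rank keywords by degree
# centrality. The keyword sets are invented; the paper uses real US patents.
from itertools import combinations
import networkx as nx

patent_keywords = [
    ["battery", "electrode", "lithium"],
    ["battery", "thermal", "management"],
    ["electrode", "coating", "lithium"],
    ["sensor", "wireless", "network"],
]

G = nx.Graph()
for keywords in patent_keywords:
    for a, b in combinations(sorted(set(keywords)), 2):
        # Accumulate how often two keywords co-occur in the same patent.
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# 'Core' keywords are those most central in the co-occurrence network.
centrality = nx.degree_centrality(G)
for kw, c in sorted(centrality.items(), key=lambda x: -x[1])[:3]:
    print(kw, round(c, 2))
```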

Analysis of Major COVID-19 Issues Using Unstructured Big Data (비정형 빅데이터를 이용한 COVID-19 주요 이슈 분석)

  • Kim, Jinsol;Shin, Donghoon;Kim, Heewoong
    • Knowledge Management Research / v.22 no.2 / pp.145-165 / 2021
  • In late December 2019, the COVID-19 pandemic began to spread, putting the entire world in a panic. To overcome the crisis and minimize subsequent damage, the government and its affiliated institutions must maximize the effects of pre-existing policy support and introduce a holistic response plan that reflects the changing situation, which is why it is crucial to analyze social topics and people's interests. This study investigates people's major thoughts, attitudes, and topics surrounding the COVID-19 pandemic through social media big data. To collect public opinion, the study segmented the period according to government countermeasures; all data were collected from NAVER blogs between 31 December 2019 and 12 December 2020. The research applied TF-IDF keyword extraction and LDA topic modeling as text-mining techniques (see the sketch below). As a result, eight major issues related to COVID-19 were derived, and policy strategies are presented based on these keywords. The significance of this study is that it provides baseline data for Korean government authorities in devising countermeasures that satisfy people's needs in the midst of the COVID-19 pandemic.
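
A minimal sketch of the two text-mining techniques the abstract names, TF-IDF keyword extraction and LDA topic modeling, using scikit-learn; the blog-post snippets are invented (the study used Korean NAVER blog posts):

```python
# A minimal sketch of the pipeline: TF-IDF keyword extraction followed by
# LDA topic modeling. The posts below are invented English placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "mask shortage at pharmacies during the outbreak",
    "social distancing rules and school closures",
    "vaccine development news and clinical trials",
    "mask prices and distancing fatigue among citizens",
]

# Step 1: TF-IDF to surface characteristic keywords for the period.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(posts)
scores = tfidf_matrix.sum(axis=0).A1            # aggregate weight per term
top = sorted(zip(scores, tfidf.get_feature_names_out()), reverse=True)[:5]
print("tf-idf keywords:", [t for _, t in top])

# Step 2: LDA on raw term counts (LDA expects counts, not TF-IDF weights).
counts = CountVectorizer(stop_words="english")
X = counts.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"topic {i}:", [terms[j] for j in topic.argsort()[-4:][::-1]])
```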

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.47-73 / 2020
  • KTX rolling stock is a system consisting of several machines, electrical devices, and components, and its maintenance requires considerable expertise and experience. In the event of a failure, the maintainer's knowledge and experience determine how quickly and how well the problem is solved, and hence the availability of the vehicle. Although problem solving is generally based on fault manuals, experienced and skilled professionals can diagnose and act quickly by applying personal know-how. Since this knowledge exists in tacit form, it is difficult to pass on completely to a successor, and earlier studies have developed case-based rolling stock expert systems to turn it into data-driven knowledge. Nonetheless, research on the KTX rolling stock most commonly used on the main line, and on systems that extract textual meaning to search for similar cases, is still lacking. This study therefore proposes an intelligent support system that provides an action guide for newly emerging failures by using the know-how of rolling stock maintenance experts as problem-solving examples. A case base was constructed from the rolling stock failure data generated from 2015 to 2017, and a dedicated dictionary covering essential terminology and failure codes was built from the case base to reflect the specialized vocabulary of the railway rolling stock sector. Given a new failure, the system retrieves the three most similar past cases from the case base and proposes their recorded actions as a diagnostic guide. To compensate for the limitation of keyword-matching case retrieval in earlier case-based expert system studies of rolling stock failures, various dimensionality reduction techniques were applied so that similarity reflects the meaningful relationships among failure details, and their usefulness was verified through experiments. Three algorithms were applied to extract failure features and measure the cosine distance between the resulting vectors: Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec. Precision, recall, and F-measure were used to assess the quality of the proposed actions. Analysis of variance confirmed that the performance differences among the five algorithms, including a baseline that randomly selects failure cases with identical failure codes and a baseline that applies cosine similarity directly to words, were statistically significant. Optimal settings for practical application were also derived by examining how performance varies with the number of reduced dimensions. The analysis showed that direct cosine similarity performed better than dimensionality reduction with NMF and LSA, and that the algorithm using Doc2Vec performed best. Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions grew, up to an appropriate level. This study confirms the usefulness of effective feature extraction and conversion of unstructured data when applying case-based reasoning in the specialized KTX rolling stock domain, where most attributes are text. Text mining is being studied for use in many areas, but studies on text data are still lacking in environments with many specialized terms and limited access to data, such as the one addressed here. In this regard, it is significant that this study first presented an intelligent diagnostic system that suggests actions by retrieving cases with text mining techniques that extract failure characteristics, complementing keyword-based case search. It is expected to provide implications as a basic study for developing diagnostic systems that can be used immediately in the field. A sketch of the retrieval step follows below.
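
A minimal sketch of the retrieval step under one of the three reduction techniques (LSA via truncated SVD over TF-IDF), returning the top-3 most similar past cases by cosine similarity; the failure logs are invented placeholders:

```python
# A minimal sketch of the retrieval step: reduce failure descriptions with
# LSA (TruncatedSVD over TF-IDF) and return the top-3 most similar past
# cases by cosine similarity. The failure logs are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

past_cases = [
    "traction motor overheating alarm during acceleration",
    "door sensor fault, door fails to close at platform",
    "pantograph arcing detected under high-speed operation",
    "brake pressure drop warning on trailer car",
]

vec = TfidfVectorizer()
X = vec.fit_transform(past_cases)

svd = TruncatedSVD(n_components=3, random_state=0)  # LSA: low-dim concept space
case_vecs = svd.fit_transform(X)

new_failure = ["overheating warning from traction motor"]
q = svd.transform(vec.transform(new_failure))

sims = cosine_similarity(q, case_vecs)[0]
for i in sims.argsort()[::-1][:3]:
    print(f"{sims[i]:.2f}", past_cases[i])  # actions of these cases guide diagnosis
```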

Frequency and Social Network Analysis of the Bible Data using Big Data Analytics Tools R (빅데이터 분석도구 R을 이용한 성경 데이터의 빈도와 소셜 네트워크 분석)

  • Ban, ChaeHoon;Ha, JongSoo;Kim, Dong Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.2 / pp.166-171 / 2020
  • Big data processing technology, which can store and analyze data to obtain new knowledge, has come to be regarded as important in many fields of society. Big data is emerging as a key topic in information and communication technology, and interest in the technology continues to grow. R, a tool that can analyze big data, is a language and environment for statistics-based information analysis. In this paper we use it to analyze Bible data, specifically the four Gospels of the New Testament. We collect the Bible data and filter it for analysis. R is used to investigate the frequency with which words occur and to analyze the Bible through social network analysis, in which the words of each sentence are paired so that the relations between words can be analyzed accurately. An analogous sketch in Python follows below.
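
The paper works in R; the following analogous Python sketch shows the two steps described, word-frequency counting and pairing the words of each sentence for network analysis, over invented placeholder verses:

```python
# An analogous Python sketch of the paper's two steps (the paper uses R):
# word frequency counting, and pairing words within a sentence as input
# for social network analysis. The verses are invented placeholders.
from collections import Counter
from itertools import combinations

verses = [
    "love your neighbor as yourself",
    "love one another as I have loved you",
    "faith hope and love abide",
]

# Frequency analysis over all words.
freq = Counter(word for v in verses for word in v.split())
print(freq.most_common(3))

# Network input: co-occurring word pairs within each sentence.
pair_counts = Counter()
for v in verses:
    for a, b in combinations(sorted(set(v.split())), 2):
        pair_counts[(a, b)] += 1
print(pair_counts.most_common(3))
```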

Personalized Recommendation based on Item Dependency Map (Item Dependency Map을 기반으로 한 개인화된 추천기법)

  • Youm, Sun-Hee;Cho, Dong-Sub
    • Proceedings of the KIEE Conference / 2001.07d / pp.2789-2791 / 2001
  • Through data mining we seek to draw hidden knowledge, unexpected trends, and new rules out of large volumes of data. In this paper we introduce an algorithm that discovers users' purchase patterns and recommends products by predicting in advance what a user will want. The proposed item dependency map formalizes the relationships between purchased items and expresses them as a matrix: each value in the map is the probability that a user buys product B after buying product A. The item dependency map is then applied as a pattern for association in a Hopfield network. A Hopfield network is a neural network that stores the patterns to be remembered in the connection weights between nodes and, given an input, settles the whole network into an equilibrium state. One of its characteristic features is the ability to recover complete information from partial information, which allows users' general purchase patterns to be predicted from only partial information. The item dependency map is used to learn group-level purchase patterns in the Hopfield network, so the result depends on how much information about user purchase patterns the map captures. This paper focuses mainly on an algorithm for computing an accurate item dependency map. A sketch of the map construction follows below.
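
A minimal sketch of constructing the item dependency map, estimating the probability of buying item B after item A from ordered purchase histories; the histories are invented toy data, and the Hopfield-network step is omitted:

```python
# A minimal sketch of the item dependency map: estimate P(next = B | last = A)
# from ordered purchase sequences and store it as a matrix. The purchase
# histories are invented toy data; the Hopfield-network step is omitted.
import numpy as np

items = ["A", "B", "C"]
idx = {it: i for i, it in enumerate(items)}

# Each list is one user's purchases in chronological order.
histories = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "C", "B"]]

counts = np.zeros((len(items), len(items)))
for h in histories:
    for cur, nxt in zip(h, h[1:]):
        counts[idx[cur], idx[nxt]] += 1

# Row-normalize: entry (i, j) approximates P(next purchase = j | last = i).
row_sums = counts.sum(axis=1, keepdims=True)
dependency_map = np.divide(counts, row_sums,
                           out=np.zeros_like(counts), where=row_sums > 0)
print(dependency_map)
```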


Improving Embedding Model for Triple Knowledge Graph Using Neighborliness Vector (인접성 벡터를 이용한 트리플 지식 그래프의 임베딩 모델 개선)

  • Cho, Sae-rom;Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.26 no.3 / pp.67-80 / 2021
  • The node embedding technique for learning graph representations plays an important role in obtaining good results in graph mining. Until now, representative node embedding techniques have been studied for homogeneous graphs, so it is difficult to learn knowledge graphs, in which each edge carries its own meaning. To resolve this problem, the conventional Triple2Vec technique builds an embedding model by learning a triple graph in which a node pair and an edge of the knowledge graph together form one node (see the sketch below). However, the Triple2Vec embedding model is limited in the performance it can achieve because it computes the relationship between triple nodes with a simple measure. This paper therefore proposes a feature extraction technique based on a graph convolutional neural network to improve the Triple2Vec embedding model. The proposed method extracts a neighborliness vector for the triple graph and learns the relationship between neighboring nodes for each node in the triple graph. We show that the embedding model applying the proposed method is superior to the existing Triple2Vec model through category classification experiments using the DBLP, DBpedia, and IMDB datasets.
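
A minimal sketch of the triple-graph construction underlying Triple2Vec, where each knowledge graph triple becomes a node and triples sharing an entity are linked; the triples are invented, and the paper's GCN-based neighborliness vector is omitted:

```python
# A minimal sketch of the triple-graph idea: each (head, relation, tail)
# triple of the knowledge graph becomes one node, and two triple-nodes are
# linked when they share an entity. Triples are invented; edge weighting
# and the paper's GCN-based neighborliness vector are omitted.
from itertools import combinations
import networkx as nx

triples = [
    ("alice", "authorOf", "paper1"),
    ("bob", "authorOf", "paper1"),
    ("paper1", "publishedIn", "KDD"),
    ("bob", "memberOf", "lab42"),
]

tg = nx.Graph()
tg.add_nodes_from(triples)
for t1, t2 in combinations(triples, 2):
    if {t1[0], t1[2]} & {t2[0], t2[2]}:
        tg.add_edge(t1, t2)   # triples sharing an entity become neighbors

# Node embeddings can then be learned over this graph (e.g., random walks).
print(tg.number_of_nodes(), tg.number_of_edges())
for n in tg.neighbors(("paper1", "publishedIn", "KDD")):
    print(n)
```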

Finding the time sensitive frequent itemsets based on data mining technique in data streams (데이터 스트림에서 데이터 마이닝 기법 기반의 시간을 고려한 상대적인 빈발항목 탐색)

  • Park, Tae-Su;Chun, Seok-Ju;Lee, Ju-Hong;Kang, Yun-Hee;Choi, Bum-Ghi
    • Journal of The Korean Association of Information Education / v.9 no.3 / pp.453-462 / 2005
  • Recently, due to technical improvements in storage devices and networks, the amount of data is increasing rapidly, and the knowledge embedded in a data stream must be found as fast as possible, since the huge volumes of data in a stream are created continuously and change quickly. Various algorithms for finding frequent itemsets in a data stream have been actively proposed, but current research offers no appropriate method for finding frequent itemsets that reflect the flow of time; it provides only frequent items based on total aggregate counts. In this paper we propose a novel algorithm for finding relative frequent itemsets according to time in a data stream. We also propose a method for storing frequent and sub-frequent items that takes limited memory into account, and a method for updating time-variant frequent items (see the windowed-counting sketch below). The performance of the proposed method is analyzed through a series of experiments. The method can find both frequent itemsets and relative frequent itemsets using only the activity patterns of students in each time slot, and can thus enhance the effectiveness of learning and support the best plan for individual learning.
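
A minimal sketch of time-sensitive frequent-item counting with a sliding window, so that frequency reflects the current period rather than the whole stream; the window size and the toy transaction stream are invented:

```python
# A minimal sketch of time-sensitive frequent-item counting: keep counts
# over a sliding window of recent transactions so that frequency reflects
# the current period rather than the whole stream. Window size and the
# toy stream are invented.
from collections import Counter, deque

WINDOW = 4                      # number of recent transactions kept
window = deque()
counts = Counter()

stream = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}, {"c"}, {"c", "d"}]

for txn in stream:
    window.append(txn)
    counts.update(txn)
    if len(window) > WINDOW:    # expire the oldest transaction
        counts.subtract(window.popleft())
    # Items frequent *relative to the current window*, not the full stream.
    frequent = [i for i, c in counts.items() if c >= 2]
    print(sorted(frequent))
```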
