• Title/Summary/Keyword: Knowledge-Based Data Mining

An Intelligent Agent System using Multi-View Information Fusion

  • 이현숙
    • 한국컴퓨터정보학회논문지 / Vol. 19, No. 12 / pp. 11-19 / 2014
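  • This paper designs an intelligent agent system whose core components are a data mining module and an information fusion module, and shows that fusing multi-view information makes the system usable as a diagnostic expert system. The data mining module analyzes multi-view data with the fuzzy neural network OFUN-NET and builds a knowledge base from the resulting fuzzy cluster information. The information fusion and application modules provide diagnostic results expressed as degrees of possibility, along with information useful to an expert's diagnosis, such as uncertain decision states and detected asymmetry. Experiments on BI-RADS-based feature data from digital mammograms in the DDSM benchmark database yield higher classification accuracy than existing methods, indicating the system's potential as a computer-aided diagnosis system.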

A Function-Based Knowledge Base for Technology Intelligence

  • Yoon, Janghyeok;Ko, Namuk;Kim, Jonghwa;Lee, Jae-Min;Coh, Byoung-Youl;Song, Inseok
    • Industrial Engineering and Management Systems / Vol. 14, No. 1 / pp. 73-87 / 2015
  • The development of a practical technology intelligence system requires a knowledge base that structures the core information, and the relationships within it, distilled from large volumes of technical data. Previous studies have mainly focused on methodological approaches to identifying technology opportunities, while little attention has been paid to constructing a practical knowledge base. Therefore, this study proposes a procedure for constructing a function-based knowledge base for technology intelligence. We define the product-function-technology relationship and then present the detailed steps of the knowledge base construction. The knowledge base, constructed by analyzing 1,110,582 patents filed between 2009 and 2013 in the United States Patent and Trademark Office database, contains the functional knowledge of products and technologies and the relationships between products and technologies. This study is the first attempt to develop a large-scale knowledge base using the concept of function, and it can serve as a basis not only for advancing technology opportunity analysis methods but also for developing practical technology intelligence systems.

Finding Association Rules based on the Significant Rare Relation of Events with Time Attribute

  • 한대영;김대인;김재인;송명진;황부현
    • 정보처리학회논문지D / Vol. 16D, No. 5 / pp. 691-700 / 2009
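  • An event is a flow with a time attribute, such as a patient's symptom, and an interval event has a time interval between its start and end points. Although temporal data mining has been studied extensively, methods for discovering knowledge from interval events such as patient histories, purchase histories, and log histories remain underexplored. This paper proposes a temporal data mining method that discovers association rules describing causal relationships among events and predicts the occurrence of consequent events based on those rules. The method uses event time attributes to summarize events into interval events, discovers causal relationships among them, and predicts event occurrence. The performance evaluation shows that, by applying multiple support thresholds, the method finds significant rare relations that strongly influence event occurrence regardless of their frequency, and thus discovers more useful information than existing data mining techniques.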
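
The abstract above does not spell out its algorithm, so the following is only a minimal sketch of the two ideas it names: summarizing time-stamped events into interval events, and scoring a cause-effect rule by confidence rather than by raw frequency. The function names, the `max_gap` and `window` parameters, and the sample data are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def summarize_intervals(events, max_gap):
    """Merge time-stamped occurrences of one event type into intervals.

    events:  iterable of (event_type, timestamp) pairs.
    max_gap: hypothetical parameter; occurrences of the same type that are
             at most max_gap apart are merged into a single interval.
    Returns a list of (event_type, start, end) sorted by start time.
    """
    by_type = defaultdict(list)
    for event_type, timestamp in events:
        by_type[event_type].append(timestamp)

    intervals = []
    for event_type, times in by_type.items():
        times.sort()
        start = end = times[0]
        for t in times[1:]:
            if t - end <= max_gap:
                end = t  # extend the current interval
            else:
                intervals.append((event_type, start, end))
                start = end = t  # open a new interval
        intervals.append((event_type, start, end))
    return sorted(intervals, key=lambda iv: (iv[1], iv[0]))


def rule_confidence(intervals, cause, effect, window):
    """Share of `cause` intervals followed by an `effect` interval.

    An effect counts if it starts no earlier than the cause starts and no
    later than `window` time units after the cause ends (an assumed rule).
    """
    causes = [iv for iv in intervals if iv[0] == cause]
    effect_starts = [iv[1] for iv in intervals if iv[0] == effect]
    if not causes:
        return 0.0
    followed = sum(
        1 for _, start, end in causes
        if any(start <= s <= end + window for s in effect_starts)
    )
    return followed / len(causes)


# Invented sample data: (symptom, day of observation).
events = [
    ("fever", 1), ("fever", 2), ("fever", 3), ("rash", 4),
    ("fever", 10), ("cough", 11),
    ("fever", 20), ("fever", 21), ("rash", 23),
]
intervals = summarize_intervals(events, max_gap=1)
confidence = rule_confidence(intervals, "fever", "rash", window=2)
print(intervals)
print(confidence)
```

In this invented data, rash occurs only twice, yet two of the three fever intervals are followed by it. That is the kind of infrequent but influential relation a single high support threshold would discard.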

Evaluation of Water Quality Prediction Models at Intake Station by Data Mining Techniques

  • 김주환;채수권;김병식
    • 환경영향평가 / Vol. 20, No. 5 / pp. 705-716 / 2011
  • For the efficient discovery of knowledge and information from observed systems, data mining techniques can be a useful tool for predicting water quality at river intake stations. Water quality at an intake station can deteriorate in the dry season because of insufficient flow. This calls for additional dam outflow, since some deterioration can be attenuated by operating the dam reservoir to control outflow in light of predicted water quality. Seasonal occurrences of high ammonia nitrogen ($NH_3$-N) concentrations have hampered the chemical treatment processes of a water plant on the Geum River. Monthly flow allocation from the upstream dam is important for downstream $NH_3$-N control. In this study, water quality prediction models based on multiple regression (MR), artificial neural networks, and data mining methods were developed to understand water quality variation and to support dam operations by providing predicted $NH_3$-N concentrations at the intake station. The models were calibrated with eight years of monthly data and verified with another two years of independent data. In these models, the $NH_3$-N concentration at the next time step depends on dam outflow and on river water quality at the previous time step, such as alkalinity, temperature, and $NH_3$-N. Model performance is compared and evaluated through error analysis and statistical characteristics such as the correlation and determination coefficients between observed and predicted water quality. These data mining techniques are expected to offer more efficient data-driven tools at the modelling stage, and the models were found to be well suited to predicting water quality in river systems.

Research of Knowledge Management and Reusability in Streaming Big Data with Privacy Policy through Actionable Analytics

  • 백주련;이영숙
    • 디지털산업정보학회논문지 / Vol. 12, No. 3 / pp. 1-9 / 2016
  • The term "Big Data" currently refers to all the techniques for value extraction and actionable analytics, as well as the management tools. In particular, advances in wireless sensor networks yield diverse patterns of digital records. These records are mostly semi-structured and unstructured data, which are usually beyond the capabilities of the management tools. Such data are growing rapidly because of their complex data structures. The complex type effectively supports data exchangeability and heterogeneity, which is the main reason data volumes keep growing in sensor networks. However, applications suffer from many errors and problems because management solutions for the complex data model are rarely available in current big data environments. To solve these problems and differentiate our work, we aim to provide a solution for actionable analytics and semantic reusability in sensor-web-based streaming big data using a new data structure, and thereby to strengthen competitiveness.

Web Document-based Associate Knowledge Extraction Method : Applying to Bioinformatics

  • 문현정;김교정
    • 인터넷정보학회논문지 / Vol. 2, No. 5 / pp. 9-19 / 2001
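  • This paper presents an associative knowledge extraction method that automatically expands the search for, and extracts, knowledge from web documents that reflects a user's interests and preferences. To find and extract information associated with a keyword that captures the user's learning intent from example documents, association rule mining is applied to the extraction of associated objects in web documents. An associated-tag-block-based weighting scheme is also presented for weighting the extracted associated information. When the method was applied to bioinformatics, experiments on extracting semantically related knowledge showed very high accuracy.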

Topic Modeling-based Book Recommendations Considering Online Purchase Behavior

  • 정영진;조윤호
    • 지식경영연구 / Vol. 18, No. 4 / pp. 97-118 / 2017
  • With the growth of social media, general users have become providers of information and knowledge. At the same time, customers find purchase decisions difficult because of the sheer volume of information. Although recommender systems try to address this information and knowledge overload, it is questionable whether they faithfully reflect customers' preferences. In particular, customers in the book market consider a book's content, recency, and price when making a purchase. This study therefore proposes a topic modeling-based methodology that reflects these characteristics and provides suitable recommendations to book-market customers. In experiments, the methodology outperforms traditional collaborative filtering systems. We therefore expect the proposed book recommender system to contribute to recommender systems research and to improve customer satisfaction and management.

A Development of a Predictive Model Using the Data Mining Technique on Diabetes Mellitus

  • 이애경;박일수;강성홍;강현철
    • 보건행정학회지 / Vol. 16, No. 2 / pp. 21-48 / 2006
  • Prior studies indicate that chronic diseases are mainly attributable to health behavior, so improving health status requires preventive health care rather than treatment of illness. Because chronic conditions require long-term therapy, health care expenditures on chronic diseases have become a substantial burden at the national level. From this point of view, this study suggests that health promotion programs should be based on a knowledge-based system using data mining techniques, and we developed a predictive model for preventive health care management of diabetes mellitus. Lifestyle and risk factors for the onset of diabetes mellitus generally differ by gender, so we developed separate predictive models by gender, applying a logistic regression model within a data mining process. The results were as follows. In the top 10% group, the lift of the final predictive model averaged 2.23 relative to the random model (male model: 2.13; female model: 2.33). The health risk factors for diabetes mellitus are gender, age, place of residence, blood pressure, glucose, smoking, drinking, and exercise rate. On the basis of these factors, we suggest a health promotion program.
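
The lift figure reported above compares the positive rate in the top-scored 10% of subjects with the overall positive rate. A minimal sketch of that calculation, assuming predicted risk scores and binary outcome labels are available, is given below; `top_fraction_lift` is a hypothetical helper and the sample data are invented, so the printed value has no relation to the study's 2.23.

```python
def top_fraction_lift(scores, labels, fraction=0.10):
    """Lift of the top-scored group over the whole population.

    scores:   predicted risk scores, higher meaning higher risk.
    labels:   1 if the subject developed the condition, else 0.
    fraction: share of subjects counted as the top group (10% by default).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")

    # Rank subjects from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    return top_rate / base_rate


# Invented example: 20 subjects, 4 of whom are positive (base rate 0.2).
scores = [i / 20 for i in range(20)]
labels = [0] * 20
for positive_index in (3, 10, 15, 19):
    labels[positive_index] = 1

print(top_fraction_lift(scores, labels))
```

A lift above 1 means the model concentrates positive cases in its highest-risk group more than random selection would.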

Big Data Analysis on the Perception of Home Training According to the Implementation of COVID-19 Social Distancing

  • Hyun-Chang Keum;Kyung-Won Byun
    • International Journal of Internet, Broadcasting and Communication / Vol. 15, No. 3 / pp. 211-218 / 2023
  • With the implementation of COVID-19 social distancing, interest in 'home training' and the number of its users have increased rapidly. The purpose of this study is therefore to identify perceptions of 'home training' through big data analysis of social media channels and to provide basic data to the related business sector. Data were collected from news and social content on Naver and Google, covering three years from March 22, 2020, when COVID-19 distancing was implemented in Korea. The collected data comprised 4,000 Naver blog posts, 2,673 news items, 4,000 cafe posts, 3,989 Knowledge iN posts, and 953 Google channel news items. TF and TF-IDF were computed through text mining, and semantic network analysis was then conducted on 70 keywords; Textom and Ucinet were used for the social big data analysis, and NetDraw for visualization. In the text mining analysis, 'home training' had the highest TF, appearing 4,045 times, followed by 'exercise', 'Homt', 'house', 'apparatus', 'recommendation', and 'diet'. By TF-IDF, the main keywords were 'exercise', 'apparatus', 'home', 'house', 'diet', 'recommendation', and 'mat'. Based on these results, the 70 most frequent keywords were extracted, and semantic indicators and centrality were analyzed. Finally, CONCOR analysis grouped the keywords into a 'purchase cluster', an 'equipment cluster', a 'diet cluster', and an 'execute method cluster'. From these four clusters, basic data for the 'home training' business sector were presented, based on consumers' main perceptions of 'home training' and the semantic network analysis.
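
The abstract above ranks keywords by both TF and TF-IDF but does not state the weighting formula used by the analysis tools. The sketch below therefore assumes one common formulation, corpus-wide term frequency multiplied by ln(N / document frequency), and runs it on three invented, pre-tokenized documents. It illustrates why a term present in every document can top the TF ranking yet score zero on TF-IDF.

```python
import math
from collections import Counter


def tf_and_tfidf(docs):
    """Compute corpus-level TF and TF-IDF for tokenized documents.

    docs: list of token lists, one list per document.
    TF is the total count of a term across the corpus; TF-IDF multiplies it
    by ln(N / df), where df is the number of documents containing the term.
    This weighting is an assumption, not the formula used in the study.
    """
    n_docs = len(docs)
    tf = Counter()
    df = Counter()
    for tokens in docs:
        tf.update(tokens)       # every occurrence counts toward TF
        df.update(set(tokens))  # each document counts once toward DF
    tfidf = {term: tf[term] * math.log(n_docs / df[term]) for term in tf}
    return tf, tfidf


# Invented toy corpus, already tokenized.
docs = [
    ["home", "training", "exercise", "mat"],
    ["home", "training", "diet"],
    ["home", "training", "exercise", "exercise", "apparatus"],
]
tf, tfidf = tf_and_tfidf(docs)
for term in sorted(tfidf, key=tfidf.get, reverse=True):
    print(term, tf[term], round(tfidf[term], 3))
```

Here 'home' and 'training' appear in all three documents, so their IDF factor is ln(1) = 0, while 'exercise' ranks first by TF-IDF.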

Text-Mining of Online Discourse to Characterize the Nature of Pain in Low Back Pain

  • Ryu, Young Uk
    • 대한물리의학회지 / Vol. 14, No. 3 / pp. 55-62 / 2019
  • PURPOSE: Text-mining has been shown to be useful for understanding the clinical characteristics of a specific disease and patients' concerns about it. Low back pain (LBP) is the most common disease in modern society and has a wide variety of causes and symptoms. Because its clinical characteristics are so varied, however, it is difficult to understand them and the needs and demands of patients with LBP. This study examined online texts on LBP to determine whether text-mining can help better understand the general characteristics of LBP and its specific elements. METHODS: Online data from www.spine-health.com were used for text-mining. Keyword frequency analysis was first performed on the complete text of postings (full-text analysis). Only the sentences containing the highest-frequency word, pain, were then selected, and texts including these sentences were used to re-analyze keyword frequency (pain-text analysis). RESULTS: Keyword frequency analysis showed that pain is of utmost concern. Full-text analysis was dominated by structural, pathological, and therapeutic words, whereas pain-text analysis was related mainly to the location and quality of the pain. CONCLUSION: The present study indicated that text-mining for a specific element (keyword) of a particular disease could enhance understanding of that aspect of the disease. This suggests that the text source must be considered when interpreting the results. Clinically, the present results suggest that clinicians should pay more attention to the pain a patient is experiencing and provide information based on medical knowledge.
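
The two-stage procedure in the METHODS part can be sketched as follows: count keywords over the full text, take the most frequent word, keep only the sentences containing it, and count again. This is an illustrative reading of the abstract, not the author's pipeline; the stopword list, the regular-expression tokenizer, and the three sample postings are invented for the example.

```python
import re
from collections import Counter

# Illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "my", "in", "is", "at", "and", "of", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def keyword_frequency(texts):
    """Count keyword occurrences across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def sentences_containing(texts, keyword):
    """Return every sentence in the texts that contains the keyword."""
    selected = []
    for text in texts:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if keyword in tokenize(sentence):
                selected.append(sentence)
    return selected


# Invented postings standing in for forum data.
posts = [
    "Surgery fixed my disc. The pain in my leg is sharp.",
    "My disc surgery is next week. Burning pain in the lower back.",
    "Pain is worse at night.",
]

full_counts = keyword_frequency(posts)               # stage 1: full-text analysis
top_word = full_counts.most_common(1)[0][0]
focused = sentences_containing(posts, top_word)
focused_counts = keyword_frequency(focused)          # stage 2: keyword-text analysis

print(top_word)
print(full_counts.most_common(3))
print(focused_counts.most_common(5))
```

In this toy data the structural and therapeutic words ('disc', 'surgery') appear only in the full-text counts, while the second pass is left with words describing where the pain is and how it feels.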