Search | Korea Science

Efficient Mining of Frequent Itemsets in a Sparse Data Set

Park In-Chang;Chang Joong-Hyuk;Lee Won-Suk
- The KIPS Transactions: Part D / v.12D no.6 s.102 / pp.817-828 / 2005
The main research problems in mining frequent itemsets are reducing the memory usage and the processing time of the mining process. Most previous algorithms for finding frequent itemsets are based on the Apriori property and are multi-scan algorithms. Moreover, their processing time increases greatly with the length of a maximal frequent itemset. To overcome this drawback, other approaches have been actively proposed in previous research to reduce the processing time. However, they are not efficient on a sparse data set. This paper proposes an efficient mining algorithm for finding frequent itemsets. A novel tree structure, called an $L_2$-tree, is proposed, together with an efficient algorithm for mining frequent itemsets using the $L_2$-tree, called the $L_2$-traverse algorithm. An $L_2$-tree is constructed from $L_2$, i.e., the set of frequent itemsets of size 2, and the $L_2$-traverse algorithm can find its mining result in a short time by traversing the $L_2$-tree once. To reduce the processing time further, this paper also proposes an optimized algorithm, $C_3$-traverse, which removes in advance every itemset in $L_2$ that cannot contribute to a frequent itemset of size 3. Various experiments verified that the proposed algorithms are efficient on a sparse data set.
https://doi.org/10.3745/KIPSTD.2005.12D.6.817
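
To make the roles of $L_2$ and the size-3 pruning step concrete, the following is a minimal Python sketch. It is not the paper's $L_2$-tree or its traverse algorithms, whose details the abstract does not give. It only assumes the standard Apriori-style check that a 3-itemset can be frequent only if all three of its 2-subsets are in $L_2$. The function names and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return L2: item pairs that occur in at least min_support transactions."""
    counts = Counter()
    for t in transactions:
        # Sort so that each pair has one canonical (a, b) form with a < b.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_support}


def prune_pairs_for_size3(l2):
    """Keep only pairs that belong to some candidate 3-itemset whose
    three 2-subsets are all in L2 (illustrative assumption, see lead-in)."""
    items = sorted({i for pair in l2 for i in pair})
    kept = set()
    for a, b, c in combinations(items, 3):
        triple_pairs = {(a, b), (a, c), (b, c)}
        if triple_pairs <= l2:
            kept |= triple_pairs
    return kept


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "d"],
    ["a", "d"],
    ["b", "e"],
]
l2 = frequent_pairs(transactions, min_support=2)
pruned = prune_pairs_for_size3(l2)
print(sorted(l2))
print(sorted(pruned))
```

In this toy data the pair (a, d) is frequent but is dropped by the pruning step, because no third item forms frequent pairs with both a and d.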

Location Generalization Method for Pattern Mining of Moving Object

Ko, Hyun;Kim, Kwang-Jong;Lee, Yon-Sik
- Annual Conference of KIPS / 2006.11a / pp.405-408 / 2006
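To provide location-based services that are personalized and fine-grained to users' characteristics, temporal pattern mining is needed to extract useful patterns from the vast location history data of moving objects. Existing temporal pattern mining techniques either do not sufficiently consider how a moving object's spatial attributes change over time, or can mine patterns that consider spatiotemporal attributes together but are difficult to apply to pattern mining problems involving constrained spatial information. A new moving pattern mining technique is therefore required that considers the spatiotemporal attributes of location history data together and extracts, from among the various moving patterns, those that satisfy spatial constraints. Developing such a technique requires a location generalization approach that converts detailed-level location history data into spatial region information. In this paper, we propose a method for generalizing the location attributes of moving objects into spatial regions, taking into account the topological relationship between an object's location values and the spatial regions. Because meaningful patterns are hard to find in the detailed-level location information of moving objects, forming a generalized data set through data preprocessing can lead to efficient temporal pattern mining of moving objects.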
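
The generalization step described above can be illustrated with a short Python sketch. It assumes axis-aligned rectangular regions and treats containment as the only topological relation; the abstract does not specify the region model or the relations actually used, so the `Region` class, the `generalize` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular spatial region."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def generalize(history, regions):
    """Map (time, x, y) samples to (time, region name).

    Samples outside every region are skipped, and consecutive samples in
    the same region are collapsed into the first one.
    """
    result = []
    for t, x, y in history:
        name = next((r.name for r in regions if r.contains(x, y)), None)
        if name is None:
            continue
        if not result or result[-1][1] != name:
            result.append((t, name))
    return result


regions = [Region("park", 0, 0, 10, 10), Region("station", 20, 0, 30, 10)]
history = [(1, 2.0, 3.0), (2, 4.5, 3.2), (3, 15.0, 5.0), (4, 22.0, 4.0)]
generalized = generalize(history, regions)
print(generalized)
```

The output sequence of region visits is far shorter than the raw coordinate history, which is the property that makes the later pattern mining step tractable.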

A Code Recommendation Method Using RNN Based on Interaction History

Cho, Heetae;Lee, Seonah;Kang, Sungwon
- KIPS Transactions on Software and Data Engineering / v.7 no.12 / pp.461-468 / 2018
Developers spend a significant amount of time exploring and trying to understand source code in order to find the location to modify. To reduce this time, existing studies have recommended source locations using statistical language model techniques. However, with these techniques no recommendation is made if the input data does not exactly match the learned data. In this paper, we propose a code location recommendation method using Recurrent Neural Networks and interaction histories, which does not have this problem. Our method achieved an average precision of 91% and an average recall of 71%, thereby reducing the time for searching and exploring code more than the existing recommendation techniques.
https://doi.org/10.3745/KTSDE.2018.7.12.461
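
For readers unfamiliar with the two reported metrics, the sketch below shows how precision and recall are conventionally computed for one set of recommended locations. It uses the standard definitions only; the abstract does not say how the paper averages them, and the file names are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Standard set-based precision and recall for one recommendation."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 recommended files, 5 files actually edited.
p, r = precision_recall(
    ["A.java", "B.java", "C.java", "D.java"],
    ["A.java", "B.java", "C.java", "E.java", "F.java"],
)
print(p, r)
```

Here three of the four recommendations are correct (precision 0.75) and three of the five relevant files are found (recall 0.6).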

Process analysis in Supply Chain Management with Process Mining: A Case Study

Lee, Yonghyeok;Yi, Hojeong;Song, Minseok;Lee, Sang-Jin;Park, Sera
- The Journal of Bigdata / v.1 no.2 / pp.65-78 / 2016
In a rapidly changing business environment, it is crucial that several companies with core competences cooperate in order to deliver competitive products to the market faster. Many companies therefore participate in supply chains, and SCM (Supply Chain Management) has become more important. To manage supply chains efficiently, the data from SCM systems must be analyzed. In this paper, we explain how to analyze SCM-related data with process mining techniques. After discussing the data requirements for process mining, we explain several process mining techniques for the data analysis. To show the applicability of the techniques, we performed a case study with a company in South Korea. The case study shows that process mining is a useful tool for analyzing SCM data. More specifically, an overall process, several performance measures, and social networks can be easily discovered and analyzed with the techniques.
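
As an illustration of the kind of data process mining needs and of how an overall process can be derived from it, the sketch below counts directly-follows relations in an event log. This is a common first step in process discovery, offered here as an assumption; the abstract does not name the specific techniques used in the case study. The log layout (case id, timestamp, activity) and the order data are hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is immediately followed by activity b
    within the same case. event_log holds (case_id, timestamp, activity)."""
    traces = defaultdict(list)
    for case_id, ts, activity in event_log:
        traces[case_id].append((ts, activity))

    counts = Counter()
    for events in traces.values():
        events.sort()  # order each case's events by timestamp
        acts = [a for _, a in events]
        for a, b in zip(acts, acts[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical event log with three order cases.
log = [
    ("o1", 1, "order"), ("o1", 2, "ship"), ("o1", 3, "invoice"),
    ("o2", 1, "order"), ("o2", 2, "invoice"), ("o2", 3, "ship"),
    ("o3", 1, "order"), ("o3", 2, "ship"),
]
dfg = directly_follows(log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts can be drawn as a graph whose edge weights show the dominant path through the process.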

Context Ontology and Trigger Rule Design for Service Pattern Mining

Hwang, Jeong-Hee
- Journal of Digital Contents Society / v.13 no.3 / pp.291-299 / 2012
Ubiquitous computing is a technique for providing users with appropriate services by collecting context information from sensors attached in the environment. An intelligent system needs to update services automatically according to the user's various circumstances. To this end, we propose a design of a context ontology and trigger rules for mining service patterns related to user activity, and an active mining architecture that integrates a trigger system. The proposed system is a framework for actively mining user activity and service patterns by considering the relation between user context and objects, based on the trigger system.
https://doi.org/10.9728/dcs.2012.13.3.291

Time Series Analysis of Patent Keywords for Forecasting Emerging Technology

Kim, Jong-Chan;Lee, Joon-Hyuck;Kim, Gab-Jo;Park, Sang-Sung;Jang, Dong-Sick
- KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.355-360 / 2014
Forecasting emerging technology plays an important role in business strategy and R&D investment. There are various approaches to technology forecasting, including patent analysis. Qualitative methods based on experts' evaluations and opinions have mainly been used for technology forecasting with patents. However, qualitative methods do not assure the objectivity of the analysis results and require high cost and a long time. To make up for these weaknesses, patent data can be analyzed quantitatively and statistically using text mining techniques. In this paper, we suggest a new method of technology forecasting using text mining and ARIMA analysis.
https://doi.org/10.3745/KTSDE.2014.3.9.355

Comparison of Term-Weighting Schemes for Environmental Big Data Analysis

Kim, JungJin;Jeong, Hanseok
- Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.236-236 / 2021
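As the rate at which unstructured data such as text is generated has recently increased sharply, the need for techniques to analyze it is growing. Text mining is one technique that uses natural language processing to structure unstructured text and obtain valuable information from documents. Text mining generally represents the importance of terms with a document-term frequency matrix, which records how often a particular term is used in each document, and this matrix is used in a variety of research fields. However, the raw frequencies in a document-term frequency matrix make it hard to express how documents differ and how important each term is as a result, so applying term weighting to characterize documents is essential. Various term-weighting methods have been developed and applied, but in the environmental field, research evaluating the effectiveness of term-weighting schemes is lacking. Moreover, in environmental issue analysis it is relatively important not only to identify the characteristics of documents and classify them, but also to reflect the characteristics of each document according to its temporal distribution. This study therefore used text mining on environmental news data for the Seoul area from 2015 to 2020 to compare term-weighting schemes suitable for environmental issue analysis. The schemes applied were TF-IDF (term frequency-inverse document frequency), BM25, TF-IGM (TF-inverse gravity moment), and TF-IDF-ICSDF (TF-IDF-inverse class space density frequency). Through this study, we aim to present an optimized term-weighting scheme for classifying environmental documents and entities, and to provide keyword extraction information related to environmental issues in the Seoul area.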
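
To show why weighting changes the picture given by raw frequencies, here is a minimal Python sketch of TF-IDF, the first of the schemes listed. It assumes the textbook variant, with term frequency normalized by document length and idf = log(N / df); the abstract does not state which variant the study used, and the toy documents are hypothetical. BM25, TF-IGM, and TF-IDF-ICSDF are not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs is a list of token lists. Returns one {term: weight} dict per
    document, using tf = count / len(doc) and idf = log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc

    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical tokenized news snippets.
docs = [
    ["air", "dust", "seoul"],
    ["water", "river", "seoul"],
    ["dust", "dust", "seoul"],
]
w = tf_idf(docs)
for i, doc_weights in enumerate(w):
    print(i, {t: round(v, 3) for t, v in doc_weights.items()})
```

The term "seoul" appears in every document and receives weight zero, while "dust" is weighted highest in the document where it dominates, which is the discriminating behaviour that raw counts lack.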

Evaluation of Web Pages using User's Activities in a Page and Page Visiting Duration Time

Lee, Dong-Hun;Yun, Tae-Bok;Kim, Geon-Su;Lee, Ji-Hyeong
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.04a / pp.99-102 / 2007
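Web usage mining is a field that analyzes users' web usage patterns to discover information. Analysis of users has become the foundation of business on the web, and it has therefore become a prominent and important technology in web mining. Recently, however, weaknesses of publicly known techniques have been exploited to distort information maliciously, and this has become a social issue. The problem arises mainly in information extraction methods based on simple page view counts. To make such extraction less simplistic and to better reflect user information, this paper presents a way to evaluate the quality of content by analyzing page usage time and user actions within the page. In the implementation, we used Ajax, a technology that has recently gained popularity, to collect user actions without violating users' privacy. We also propose a collection technique that adds a log filter module to the server in order to evaluate web pages in real time.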

Game Player Model Analysis with Time-series Data Mining

Kim, Jong-In;Wong, Chee-Onn;Jung, Kee-Chul
- Proceedings of the Korean Information Science Society Conference / 2007.10c / pp.293-296 / 2007
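As the computer game industry develops, research on user analysis is under way to measure users' interest or to detect illegal software. For example, recent studies use user analysis to design game levels or to balance games. This paper proposes a game user model analysis that applies the concept of time-series data mining to give users an appropriate game experience in a personal game environment. While the user is playing, meaningful user actions are stored and then clustered into four behavior types using dimensionality reduction and SOM, and users are analyzed according to behavior type.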

A Clustering Method using GHSOM for Processing Large Data

Kim, Man-Sun;Lee, Sang-Yong
- Annual Conference of KIPS / 2002.11a / pp.393-396 / 2002
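Recently, much research has been conducted on data mining, which discovers useful information in large databases and explores and analyzes the associations that exist among data. In real applications, collected data grows in volume over time and comes to contain redundant attributes and noise, so applying mining techniques takes a great deal of time and cost. In addition, because it is not known which attributes are important, important attributes may be distorted by unimportant ones or may not be analyzed properly. To solve these problems, this study proposes a hierarchical neural network clustering method using GHSOM. The proposed method has the advantages that the number of clusters does not need to be specified in advance and that it achieves a hierarchical clustering yielding clusters at various levels. This paper briefly reviews the structure and characteristics of the GHSOM neural network and describes the system's processing steps.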
