• Title/Abstract/Keyword: Data Mining Technique

Search results: 637 items, processing time 0.022 seconds

인터넷 설문조사의 방법론적인 문제점과 데이터마이닝 기법을 활용한 개인화된 인터넷설문조사 시스템의 구축 (Methodological Issues in Internet Survey and Development of Personalized Internet Survey System Using Data Mining Techniques)

  • 김광용;김기수
    • 품질경영학회지
    • /
    • Vol. 32, No. 2
    • /
    • pp.93-108
    • /
    • 2004
  • The purpose of this research is to summarize the methodological issues in internet surveys and to propose a personalized internet survey system that uses a data mining technique to enhance survey quality while exploiting the interactive multimedia features of the internet survey medium. The data mining technique used in this paper is Case Based Reasoning (CBR), applied to capture the individual design preferences that affect survey quality. Two surveys, a pre-survey and a post-survey, were performed. The pre-survey was used to build the CBR database of individual indices affecting survey quality, and the post-survey was used to measure the performance of the personalized internet survey system. The results show that the survey quality of the personalized web survey system is better than that of a generalized web survey system.
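The abstract does not give implementation details for the Case Based Reasoning step, but its core is retrieving the stored case most similar to a new respondent and reusing that case's solution (here, a survey design preference). Below is a minimal sketch of that retrieve-and-reuse step under assumed respondent attributes and design labels; none of the names come from the paper.

```python
import math

# Hypothetical case base built from the pre-survey: each case pairs respondent
# attributes with the survey design that yielded good survey quality for them.
case_base = [
    {"age": 23, "web_hours_per_day": 5.0, "design": "multimedia-rich"},
    {"age": 47, "web_hours_per_day": 1.5, "design": "plain-text"},
    {"age": 35, "web_hours_per_day": 3.0, "design": "balanced"},
]

def retrieve_design(new_respondent, cases):
    """1-NN retrieval: return the design stored with the most similar past case."""
    def distance(case):
        return math.hypot(case["age"] - new_respondent["age"],
                          case["web_hours_per_day"] - new_respondent["web_hours_per_day"])
    return min(cases, key=distance)["design"]

# A new 26-year-old heavy web user gets the design that suited similar respondents.
print(retrieve_design({"age": 26, "web_hours_per_day": 4.5}, case_base))  # multimedia-rich
```

In practice the attributes would be normalized before computing distances, and the retrieved design could be adapted rather than reused verbatim.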

항만물류 서비스 품질 분석을 위한 DMQFD 모형의 개발 (Development of DMQFD Model for Analysis of Port Logistics Service Quality)

  • 송서일;이보근;정혜진
    • 산업경영시스템학회지
    • /
    • Vol. 30, No. 3
    • /
    • pp.62-70
    • /
    • 2007
  • This study defines the concept of port logistics service by investigating its various elements and grouping them into six attributes using a data mining technique. The QFD (Quality Function Deployment) technique is then applied to measure the quality of port logistics service, and the results are analyzed. The DMQFD (Quality Function Deployment using Data Mining) model proposed in this study combines these two stages into a model for analyzing port logistics service quality. Using the DMQFD model, customer requirements can be understood more accurately and systematically, and the model can serve as an alternative tool for achieving customer satisfaction.
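The abstract groups the investigated service elements into six attributes with an unnamed data mining technique before the QFD stage. As one plausible form of that grouping step, the sketch below clusters hypothetical element ratings with k-means; the element names, ratings, and the use of k-means are assumptions rather than the authors' procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer ratings for port logistics service elements
# (rows: elements, columns: respondents); real data would come from a survey.
elements = ["berth availability", "cargo handling speed", "tracking accuracy",
            "customs support", "damage rate", "billing clarity"]
ratings = np.array([
    [4, 5, 4], [5, 5, 4], [3, 4, 3],
    [2, 3, 2], [4, 4, 5], [3, 3, 4],
], dtype=float)

# Group the elements into a small number of attributes (six in the paper;
# three here because the toy data set is tiny).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(ratings)
for element, label in zip(elements, labels):
    print(f"attribute {label}: {element}")
```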

하둡과 순차패턴 마이닝 기술을 통한 교통카드 빅데이터 분석 (Analysis of Traffic Card Big Data by Hadoop and Sequential Mining Technique)

  • 김우생;김용훈;박희성;박진규
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 24, No. 4
    • /
    • pp.187-196
    • /
    • 2017
  • Countermeasures are urgently needed for the traffic congestion problems of Korea's metropolitan area, where economic, social, cultural, and educational functions are excessively concentrated. Most users of public transportation in metropolitan areas, including Seoul, use traffic cards. If information is extracted from the big data produced by these traffic cards, it can provide basic data for transport policies, land use, and facility planning. In this study, we therefore extract valuable information, such as subway passengers' frequent travel patterns, from the traffic big data provided by the Seoul Metropolitan Government Big Data Campus. We use Hadoop to preprocess the big data and store it in a MongoDB database, and then analyze it with a sequential pattern data mining technique. Since we analyze actual big data, namely the traffic card data provided by the Seoul Metropolitan Government Big Data Campus, the results can serve as important reference data when the Seoul government formulates metropolitan traffic policies.
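The abstract names Hadoop for preprocessing, MongoDB for storage, and sequential pattern mining for the analysis, without fixing the exact algorithm. The sketch below shows only the mining idea in its simplest, length-2 form: counting ordered station pairs that appear in enough per-card trip sequences. Station names, sequences, and the support threshold are invented.

```python
from itertools import combinations
from collections import Counter

# Hypothetical daily trip sequences reconstructed per card (ordered station visits).
trips = [
    ["Gangnam", "City Hall", "Seoul Station"],
    ["Gangnam", "Seoul Station"],
    ["Jamsil", "Gangnam", "Seoul Station"],
    ["City Hall", "Seoul Station"],
]

min_support = 0.5  # a pattern must appear in at least half of the card sequences

# Count ordered station pairs that occur (in order, not necessarily adjacent)
# within a sequence -- the length-2 case of sequential pattern mining.
counts = Counter()
for seq in trips:
    counts.update(set(combinations(seq, 2)))   # pairs keep left-to-right order

frequent = {p: c / len(trips) for p, c in counts.items() if c / len(trips) >= min_support}
for (a, b), support in sorted(frequent.items(), key=lambda x: -x[1]):
    print(f"{a} -> {b}: support {support:.2f}")
```

A full analysis would grow these patterns to longer sequences (e.g., PrefixSpan or GSP) and read the sequences from the preprocessed store rather than from an in-memory list.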

시간간격을 고려한 시간관계 규칙 탐사 기법 (Discovering Temporal Relation Rules from Temporal Interval Data)

  • 이용준;서성보;류근호;김혜규
    • 한국정보과학회논문지:데이타베이스
    • /
    • Vol. 28, No. 3
    • /
    • pp.301-314
    • /
    • 2001
  • Data mining is defined as the discovery of useful knowledge embedded in large databases. As research on data mining has progressed, temporal data mining, which discovers knowledge from data with time values, has been studied in areas such as sequential patterns, similar time-series discovery, and temporal association rule mining. However, existing work handles only data with transaction occurrence points and rarely considers data with time intervals. In the real world there is a variety of interval data, such as patients' medical histories, product purchase histories, and web logs, from which useful knowledge can be discovered. Allen defined the temporal relations that can hold between interval data and the interval operators with which those relations are obtained. Based on Allen's definitions, this paper proposes a new data mining technique for efficiently discovering temporal relation rules from interval data. The technique consists of a preprocessing algorithm that generalizes point-based temporal data by summarizing it into interval data, and a rule mining algorithm that generates temporal relation rules from the interval data. The technique can discover useful temporal rules that existing data mining techniques cannot find.
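Allen's interval algebra, which the proposed technique builds on, defines thirteen possible relations between two intervals: before, meets, overlaps, starts, during, finishes, equals, and the inverses of the first six. A minimal sketch of classifying an interval pair into one of these relations is given below; it assumes numeric endpoints and is not the paper's rule mining algorithm.

```python
def allen_relation(a_start, a_end, b_start, b_end):
    """Classify intervals A = (a_start, a_end) and B = (b_start, b_end)
    into one of Allen's seven basic relations or an inverse relation."""
    assert a_start < a_end and b_start < b_end
    if a_end < b_start:                          return "before"
    if a_end == b_start:                         return "meets"
    if a_start == b_start and a_end == b_end:    return "equals"
    if a_start == b_start and a_end < b_end:     return "starts"
    if a_end == b_end and a_start > b_start:     return "finishes"
    if a_start > b_start and a_end < b_end:      return "during"
    if a_start < b_start < a_end < b_end:        return "overlaps"
    # Otherwise A relates to B by the inverse of one of the relations above.
    return allen_relation(b_start, b_end, a_start, a_end) + "-inverse"

print(allen_relation(1, 3, 2, 5))   # overlaps
print(allen_relation(2, 4, 2, 6))   # starts
print(allen_relation(5, 7, 1, 3))   # before-inverse (i.e., after)
```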


캘린더 패턴 기반의 시간 연관적 분류 기법 (Temporal Associative Classification based on Calendar Patterns)

  • 이헌규;노기용;서성보;류근호
    • 한국정보과학회논문지:데이타베이스
    • /
    • Vol. 32, No. 6
    • /
    • pp.567-584
    • /
    • 2005
  • Temporal data mining adds the notion of time to conventional data mining and discovers implicit, previously unknown, and potentially useful temporal knowledge from data with temporal attributes. Association rules and classification, representative data mining techniques, are used in many real-world applications. Although most data contains temporal attributes, existing techniques have mostly mined knowledge from static data without considering those attributes. Moreover, studies on mining temporal data have focused on knowledge discovery that only adds occurrence times and time constraints, and have fallen short of discovering the temporal semantics and temporal relations contained in the data. This paper proposes a temporal associative classification technique based on temporal class association rules. To generate classification rules, the technique applies the rules discovered by temporal class association rule mining, which extends associative classification with a time dimension. The technique therefore enables the discovery of more useful knowledge than existing classification techniques.
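The abstract extends associative classification with calendar patterns but does not spell out the mining procedure. As a rough illustration, the sketch below attaches one simple calendar pattern (weekday versus weekend) to each transaction and reports class association rules that clear assumed support and confidence thresholds within each calendar partition; the transactions, the thresholds, and the single calendar attribute are all invented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (purchased item, class label, transaction date).
transactions = [
    ("umbrella", "rainy-day-buyer", date(2024, 7, 6)),   # Saturday
    ("umbrella", "rainy-day-buyer", date(2024, 7, 13)),  # Saturday
    ("sunscreen", "outdoor-buyer",  date(2024, 7, 8)),   # Monday
    ("sunscreen", "outdoor-buyer",  date(2024, 7, 15)),  # Monday
    ("umbrella", "outdoor-buyer",   date(2024, 7, 9)),   # Tuesday
]

MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.6

# Partition transactions by a calendar pattern (here: weekday vs. weekend),
# then mine class association rules item -> class inside each partition.
partitions = defaultdict(list)
for item, label, d in transactions:
    partitions["weekend" if d.weekday() >= 5 else "weekday"].append((item, label))

for pattern, rows in partitions.items():
    pair_counts, item_counts = defaultdict(int), defaultdict(int)
    for item, label in rows:
        pair_counts[(item, label)] += 1
        item_counts[item] += 1
    for (item, label), count in pair_counts.items():
        support, confidence = count / len(rows), count / item_counts[item]
        if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
            print(f"[{pattern}] {item} -> {label} (sup={support:.2f}, conf={confidence:.2f})")
```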

Data Mining for Uncertain Data Based on Difference Degree of Concept Lattice

  • Qian Wang;Shi Dong;Hamad Naeem
    • Journal of Information Processing Systems
    • /
    • Vol. 20, No. 3
    • /
    • pp.317-327
    • /
    • 2024
  • With the rapid development of database technology, database management systems are being applied ever more widely and are growing ever larger. Data mining technology has already been widely applied in scientific research, financial investment, marketing, insurance, medical care, and other fields. We discuss data mining technology and analyze its open problems; research on new data mining methods therefore has important significance. Some previous studies did not consider the differences between attributes, leading to redundancy when constructing concept lattices. This paper proposes a new method for uncertain data mining based on a concept lattice with a connotation difference degree (c_diff). The method defines two rules, and the construction of the concept lattice is accelerated by excluding attributes with poor discriminative power from the process. A new technique for calculating c_diff avoids scanning the full database at each layer, thereby reducing the number of database scans. The experimental results show that the proposed method saves considerable time and improves mining accuracy compared with the U-Apriori algorithm.
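The c_diff measure and its two rules are the paper's own contribution and are not reproduced here. The sketch below only illustrates the formal concept analysis step that concept-lattice methods build on: deriving the extent of an attribute set and the intent of an object set over a small, invented binary context.

```python
# Hypothetical binary context: objects and the attributes they possess.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}

def extent(attrs):
    """Objects that have every attribute in attrs."""
    return {o for o, has in context.items() if attrs <= has}

def intent(objs):
    """Attributes shared by every object in objs."""
    if not objs:
        return set.union(*context.values())   # by convention, all attributes
    return set.intersection(*(context[o] for o in objs))

# A formal concept is a pair (extent, intent) closed under both derivations.
attrs = {"a", "b"}
objs = extent(attrs)             # {'o1', 'o3'}
closed_attrs = intent(objs)      # {'a', 'b'}, so ({'o1','o3'}, {'a','b'}) is a concept
print(objs, closed_attrs)
```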

Designing Summary Tables for Mining Web Log Data

  • Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 16, No. 1
    • /
    • pp.157-163
    • /
    • 2005
  • On the Web, data is generally gathered automatically by Web servers and collected in server or access logs. However, as users access larger and larger amounts of data, query response times for extracting information inevitably get slower. One method to resolve this issue is the use of summary tables. In this short note, we design a prototype of summary tables that can efficiently extract information from Web log data. We also present the relative performance of the summary tables against a sampling technique and a method that uses raw data.
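The note does not fix a schema for its summary tables, so the sketch below only shows the general idea under assumed fields: raw access-log records are aggregated once into a compact table keyed by (page, date), and later queries read that table instead of rescanning the raw log.

```python
from collections import defaultdict

# Hypothetical raw access-log records (already parsed from the server log).
raw_log = [
    {"page": "/index", "date": "2005-01-03", "bytes": 1200},
    {"page": "/index", "date": "2005-01-03", "bytes": 900},
    {"page": "/about", "date": "2005-01-03", "bytes": 450},
    {"page": "/index", "date": "2005-01-04", "bytes": 1100},
]

# Build a summary table keyed by (page, date): hit count and total bytes served.
summary = defaultdict(lambda: {"hits": 0, "bytes": 0})
for rec in raw_log:
    row = summary[(rec["page"], rec["date"])]
    row["hits"] += 1
    row["bytes"] += rec["bytes"]

# Queries now read the small summary table instead of the raw log.
print(summary[("/index", "2005-01-03")])   # {'hits': 2, 'bytes': 2100}
```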


Data Mining Approach Using Practical Swarm Optimization (PSO) to Predicting Going Concern: Evidence from Iranian Companies

  • Salehi, Mahdi;Fard, Fezeh Zahedi
    • 유통과학연구
    • /
    • Vol. 11, No. 3
    • /
    • pp.5-11
    • /
    • 2013
  • Purpose - Going concern is one of the fundamental concepts in accounting and auditing, and assessing a company's going concern status can be a difficult process. Various going concern prediction models based on statistical and data mining methods have been suggested in the previous literature to help auditors and stakeholders. Research design - This paper employs a data mining approach using particle swarm optimization (PSO) to predict the going concern status of Iranian firms listed on the Tehran Stock Exchange. In the first stage, stepwise discriminant analysis was used to select the final variables from among 42 candidates; in the second stage, a grid-search technique with 10-fold cross-validation was applied to find the optimal model. Results - The empirical tests show that the PSO model reached 99.92% and 99.28% accuracy rates for the training and holdout data. Conclusions - The authors conclude that the PSO model is applicable for predicting the going concern status of Iranian listed companies.
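PSO itself is more than a few lines of code, but the second stage described in the abstract, a grid search with 10-fold cross-validation, is easy to illustrate with scikit-learn on synthetic data. The classifier, the parameter grid, and the data below are stand-ins, not the authors' setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the firm-level financial variables used in the paper.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Grid search over candidate hyperparameters, scored by 10-fold cross-validation.
grid = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=10,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```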


핵심 기술 파악을 위한 특허 분석 방법: 데이터 마이닝 및 다기준 의사결정 접근법 (A patent analysis method for identifying core technologies: Data mining and multi-criteria decision making approach)

  • 김철현
    • 대한안전경영과학회지
    • /
    • Vol. 16, No. 1
    • /
    • pp.213-220
    • /
    • 2014
  • This study suggests a new approach to identifying core technologies through patent analysis. Specifically, the approach applies a data mining technique and a multi-criteria decision making method to the co-classification information of registered patents. First, technological interrelationship matrices from the intensity, relatedness, and cross-impact perspectives are constructed using the support, lift, and confidence values obtained by association rule mining on the co-classification information of patent data. Second, the analytic network process (ANP) is applied to the constructed interrelationship matrices to produce the importance values of technologies from each perspective. Finally, data envelopment analysis (DEA) is applied to the derived importance values to identify technology priorities that combine the three perspectives. The suggested approach is expected to help technology planners formulate strategy and policy for technological innovation.
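As a small illustration of the first step, the sketch below computes support, confidence, and lift for pairs of technology classes that co-occur on the same patents; the class codes and patent assignments are invented, and the ANP and DEA stages are not shown.

```python
from itertools import combinations
from collections import Counter

# Hypothetical co-classification data: IPC-like classes assigned to each patent.
patents = [
    {"G06F", "H04L"}, {"G06F", "H04L", "G06N"},
    {"G06N", "H04L"}, {"G06F", "G06N"},
]
n = len(patents)

single = Counter(c for p in patents for c in p)
pair = Counter(frozenset(q) for p in patents for q in combinations(sorted(p), 2))

for (a, b), cnt in ((tuple(sorted(k)), v) for k, v in pair.items()):
    support = cnt / n                        # P(a and b)
    confidence = cnt / single[a]             # confidence of rule a -> b
    lift = confidence / (single[b] / n)      # confidence relative to P(b)
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```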

데이터 마이닝을 이용한 로버스트 설계 모형의 최적화 (Optimization of Robust Design Model using Data Mining)

  • 정혜진;구본철
    • 산업경영시스템학회지
    • /
    • Vol. 30, No. 2
    • /
    • pp.99-105
    • /
    • 2007
  • As manufacturing processes have become automated with the development of computer manufacturing technologies, the products and quality characteristics produced on those processes are measured and recorded automatically. The large amount of data produced daily on the processes may not be analyzed efficiently by current statistical methodologies (i.e., statistical quality control and statistical process control) because of the dimensionality associated with many input and response variables. Although a number of statistical methods address this situation, there is room for improvement. To overcome this limitation, we integrate data mining and a robust design approach in this research. We efficiently find the significant input variables connected with the response variables of interest by using a data mining technique, and then find the optimum operating condition of the process by using response surface methodology (RSM) and a robust design approach.
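The abstract combines a data mining step that screens significant input variables with an RSM and robust design stage that optimizes them. The sketch below shows one plausible form of the screening step only, ranking inputs by random forest importance on synthetic process data; the variables and the choice of random forests are assumptions, not the authors' exact method.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic process data: five candidate input variables, one quality response.
# Only x0 and x2 actually drive the response in this toy example.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Fit a random forest and rank the inputs by their importance to the response.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(enumerate(model.feature_importances_), key=lambda t: -t[1])
for idx, importance in ranking:
    print(f"x{idx}: importance {importance:.3f}")
# The top-ranked inputs would then go into the RSM / robust design stage.
```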