• Title/Summary/Keyword: data mining processes


Mining Trip Patterns in the Large Trip-Transaction Database and Analysis of Travel Behavior (대용량 교통카드 트랜잭션 데이터베이스에서 통행 패턴 탐사와 통행 행태의 분석)

  • Park, Jong-Soo;Lee, Keum-Sook
    • Journal of the Economic Geographical Society of Korea
    • /
    • v.10 no.1
    • /
    • pp.44-63
    • /
    • 2007
  • This study proposes mining processes for the large trip-transaction database of the Seoul metropolitan area and analyzes the spatial characteristics of travel behavior. To this end, it introduces a mining algorithm developed for exploring trip patterns in the large trip-transaction database produced every day by transit users in the Seoul metropolitan area. The algorithm computes the trip chains of transit users by using the bus routes and a graph of the subway stops in the Seoul subway network. We examine how often transit users transfer within their trip chains, using a one-day transaction database from each of three different years. We find that the number of transit users who transfer to another bus or subway line is increasing yearly. From the trip chains of the large trip-transaction database, trip patterns are mined to analyze how transit users travel in the public transportation system. The mining algorithm is a level-wise approach to finding frequent trip patterns. The resulting frequent patterns are illustrated to show the subway stations and bus stops ranked highest by support. From the outputs, we explore the travel patterns of three different time periods in a day. We find substantial differences in the spatial structure of origin and destination travel patterns depending on the time period. To examine changes in travel patterns over time, we apply the algorithm to one day of data per year since 2004. The results are visualized using GIS, and the spatial characteristics of the travel patterns are then analyzed. The spatial distribution of trip origins and destinations differs sharply among time periods.

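The level-wise search mentioned in the abstract above can be illustrated with a minimal Apriori-style sketch. This is not the authors' algorithm: it assumes each trip chain is reduced to an unordered set of stops, and the function name `frequent_patterns`, the `min_support` parameter, and the stop labels are hypothetical.

```python
from itertools import combinations


def frequent_patterns(trip_chains, min_support):
    """Return {stop set: support} for every stop set found in at least
    min_support trip chains, searching one pattern size (level) at a time.

    Assumption for this sketch: a trip chain is an unordered set of stops.
    """
    transactions = [frozenset(chain) for chain in trip_chains]

    def support(candidate):
        # Number of trip chains that contain every stop in the candidate.
        return sum(1 for t in transactions if candidate <= t)

    # Level 1: frequent single stops.
    level = {}
    for stop in {s for t in transactions for s in t}:
        single = frozenset([stop])
        count = support(single)
        if count >= min_support:
            level[single] = count

    result = {}
    k = 1
    while level:
        result.update(level)
        k += 1
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            count = support(c)
            if count >= min_support:
                level[c] = count
    return result


# Hypothetical trip chains; the stop labels are placeholders, not real data.
chains = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "D"]]
patterns = frequent_patterns(chains, min_support=2)
```

The prune step is what makes the approach level-wise: a candidate of size k is counted against the database only if all of its subsets of size k-1 were already found frequent at the previous level.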

A Study on Quality Control Using Data Mining in Steel Continuous Casting Process (철강 연주공정에서 데이터마이닝을 이용한 품질제어 방법에 관한 연구)

  • Kim, Jae-Kyeong;Kwon, Taeck-Sung;Choi, Il-Young;Kim, Hyea-Kyeong;Kim, Min-Yong
    • Journal of Information Technology Services
    • /
    • v.10 no.3
    • /
    • pp.113-126
    • /
    • 2011
  • The smelting and continuous casting of steel are important processes that determine the quality of steel products. In particular, most quality defects occur during solidification in the continuous casting process. Although quality control techniques such as Six Sigma, SQC, and TQM can be applied to the continuous casting process to improve the quality of steel products, these techniques do not provide real-time analysis to identify the causes of defects. To solve this problem, we developed a detection model using a decision tree that identifies abnormal transactions likely to have a coarse grain structure. We also compared the proposed model with models based on a neural network and on logistic regression. Experiments on steel data showed that the proposed model outperformed both the neural network model and the logistic regression model. Thus, we expect the proposed model to be helpful for controlling the quality of steel products in real time in the continuous casting process.

Comparative co-expression analysis of RNA-Seq transcriptome revealing key genes, miRNA and transcription factor in distinct metabolic pathways in diabetic nerve, eye, and kidney disease

  • Asmy, Veerankutty Subaida Shafna;Natarajan, Jeyakumar
    • Genomics & Informatics
    • /
    • v.20 no.3
    • /
    • pp.26.1-26.19
    • /
    • 2022
  • Diabetes and its related complications are associated with long-term damage to and failure of various organ systems. The microvascular complications of diabetes considered in this study are diabetic retinopathy, diabetic neuropathy, and diabetic nephropathy. The aim is to identify the weighted co-expressed and differentially expressed genes (DEGs), the major pathways, and the miRNAs, transcription factors (TFs), and drugs that interact with them in all three conditions. The primary goal is to identify vital DEGs common to all three conditions. The five genes (AKT1, NFKB1, MAPK3, PDPK1, and TNF) shared by the DEGs and the co-expressed genes were defined as key genes, which are differentially expressed in all three cases. Protein-protein interaction network analysis and gene set linkage analysis (GSLA) of the key genes were then performed. GSLA, gene ontology, and pathway enrichment analysis of the key genes elucidated nine major pathways in diabetes. Subsequently, we constructed and studied the miRNA-gene and transcription factor-gene regulatory networks of the five genes of interest in the nine major pathways. hsa-mir-34a-5p was a major miRNA that interacted with all five genes. RELA, FOXO3, PDX1, and SREBF1 were the TFs interacting with the five genes of interest. Finally, the drug-gene interaction network elucidated five potential drugs targeting the genes of interest. This research reveals biomarker genes, miRNAs, TFs, and therapeutic drugs in the key signaling pathways, which may help us understand the processes behind all three secondary microvascular complications and aid in disease detection and management.

Optimal Design of Fixture Layouts in Multi-Station Assembly Processes

  • Kim, Pan-Soo
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2006.11a
    • /
    • pp.369-372
    • /
    • 2006
  • Optimal engineering design is challenging because nonlinear objective functions need to be evaluated in a high-dimensional space. This paper presents a data-mining aided optimal design method. The method is employed in designing an optimal multi-station fixture layout. Its benefit is demonstrated by a comparison with currently available optimization methods.


Application of data mining techniques for finding customer-oriented product market segments (고객지향 세분시장 획득을 위한 데이터 마이닝 기법 적용방안)

  • Kim, Jong-Ho
    • Journal of Digital Contents Society
    • /
    • v.13 no.3
    • /
    • pp.385-392
    • /
    • 2012
  • Defining the product market from the supplier's point of view identifies market segments by processes, raw materials, similarity of product functions, and so forth. This can cause various problems in a company's market activities, because specific usage situations are excluded and discontinuity is not sufficiently considered. Furthermore, because this definition is static and general, it is difficult to use it to express and predict dynamic market changes. In contrast, a customer-oriented market segment can be obtained by grouping substitutable products and the related customers in situations where specific benefits are pursued. This definition of the product market makes it possible to find the threats and opportunities emerging in markets and promotes effective performance assessment and resource allocation. The purpose of this paper is to suggest a framework for selecting data mining techniques suited to the characteristics of the customer data in order to identify customer-oriented product markets.

A New Ensemble System using Dynamic Weighting Method (동적 중요도 결정 방법을 이용한 새로운 앙상블 시스템)

  • Seo, Dong-Hun;Lee, Won-Don
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.6
    • /
    • pp.1213-1220
    • /
    • 2011
  • In this paper, a new ensemble system is proposed that uses a dynamic weighting method in which weight information is added to the classifiers. The weights used in a traditional ensemble system are those obtained after the training phase. Once extracted, these weights remain fixed regardless of the test data set. One way to circumvent this problem, used in gating networks, is to update the weights dynamically by adding processes that build architectural hierarchies, but this has the drawback of requiring the added processes. We propose a simple method that updates the weights dynamically without added processes and that can be applied to an already established ensemble system without much architectural modification. Experimental results show that this method performs better than AdaBoost.

Outlier Detection from LiDAR Data based on the Relative Density (상대적 밀도를 이용한 LiDAR 데이터의 Outlier 검출)

  • 문지영;이임평;김성준;김경옥
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference
    • /
    • 2004.11a
    • /
    • pp.507-512
    • /
    • 2004
  • LiDAR data often include outliers, that is, points significantly separated from other points that therefore do not appear to have been measured from physical surfaces. Outliers should be removed before the data are processed further for applications. Many outlier detection methods have been developed as part of data mining processes for data other than LiDAR data, but their straightforward application to LiDAR data has not provided satisfactory results. In this study, we therefore modified one such method to take the properties of LiDAR data into account and developed a method based on the relative point density. The proposed method has been applied to simulated and real data. The results confirm its promising performance with respect to processing time and detection accuracy.

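The abstract above does not spell out how the relative point density is computed, so the following is only a generic sketch of the idea, close in spirit to local-outlier-factor methods. It assumes that a point's density is the inverse of its mean distance to its `k` nearest neighbours and that a point is an outlier when its density falls below a fraction `threshold` of its neighbours' average density. The function name and both parameters are hypothetical, and the brute-force neighbour search is only suitable for small point sets.

```python
import math


def relative_density_outliers(points, k=3, threshold=0.5):
    """Return the indices of points whose local density is low relative
    to the average density of their k nearest neighbours.

    Assumes at least two points; uses a brute-force O(n^2) neighbour search.
    """
    neighbours = []
    density = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = dists[:k]
        neighbours.append([j for _, j in nearest])
        mean_dist = sum(d for d, _ in nearest) / len(nearest)
        # Density is taken as the inverse of the mean neighbour distance.
        density.append(1.0 / mean_dist if mean_dist > 0 else math.inf)

    outliers = []
    for i, idx in enumerate(neighbours):
        reference = sum(density[j] for j in idx) / len(idx)
        # Relative density: own density divided by the neighbours' average.
        if density[i] / reference < threshold:
            outliers.append(i)
    return outliers


# Hypothetical data: a flat 3 x 3 grid of surface points plus one point
# floating far above it.
surface = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
cloud = surface + [(1.0, 1.0, 50.0)]
flagged = relative_density_outliers(cloud, k=3, threshold=0.5)
```

Comparing each point with its own neighbourhood, rather than with one global density value, keeps sparsely sampled but valid regions from being flagged wholesale.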

The Necessity of Business Intelligence as an Indispensable Factor in the Healthcare Sector

  • KANG, Eungoo
    • The Korean Journal of Food & Health Convergence
    • /
    • v.8 no.6
    • /
    • pp.19-29
    • /
    • 2022
  • Business intelligence (BI) is a process for turning data into insights that inform an organization's strategic and tactical decisions. BI aims to give decision-makers the information they need to make better decisions. Patient safety analysis, illness surveillance, and fraud identification are just a few of the healthcare decision-making processes that can be supported by data mining. The purpose of the current research is thus to outline the need for BI as an essential factor in the healthcare sector by reviewing various scholarly materials and their findings. The author followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement, one of the most widely used approaches to qualitative literature review. Prior studies were considered eligible if they were relevant to the current research, peer-reviewed, and issued by notable publishers between 2017 and 2022. According to the results of the PRISMA analysis, BI plays a vital role in the healthcare sector, and four business intelligence factors (data, analytics, reporting, and visualization) help ensure that the healthcare sector provides the right healthcare services to its customers.

A Colored Workflow Model for Business Process Analysis (비즈니스 프로세스 분석을 위한 색채형 워크플로우 모델)

  • Jeong, Woo-Jin;Kim, Kwang-Hoon
    • Journal of Internet Computing and Services
    • /
    • v.10 no.3
    • /
    • pp.113-129
    • /
    • 2009
  • Corporate activities are composed of numerous work processes, and during the flow of work various business processes are created and completed simultaneously. Enterprise Resource Planning (ERP) simplifies individual work processes yet creates a more complicated work structure, so efficient management of business processes is essential. The workflow literature has been looking for efficient and effective ways of rediscovering and mining workflow intelligence and knowledge from enactment histories and event logs. As part of the effort to analyze and improve processes, concepts such as 'process mining', 'process rediscovery', and 'BPR (Business Process Reengineering)' have appeared, and studies on their practical implementation are actively under way. However, these studies normally rely on data warehousing of the log data of process instances, which makes it very hard to reflect the user's intention in the rediscovery and mining activities. Process instances designed with analysis in mind can be grouped effectively and can also reduce the cost of analysis when the user's analysis demands change within the analysis domain. Therefore, this paper proposes a special type of workflow model, called a colored workflow model, which extends the ICN (information control net) modeling methodology by reinforcing the concept of colored tokens. The colored tokens represent the conceptual types of constraints and criteria that can be used for classifying and grouping the workflow intelligence and knowledge extracted from the corresponding workflow models' enactment histories and event logs. Using the runtime information of process instances, the model makes proactive, user-oriented process analysis possible, with the goal of deriving business knowledge from the very beginning of process definition.


Model Development for Specific Degradation Using Data Mining and Geospatial Analysis of Erosion and Sedimentation Features

  • Kang, Woochul;Kang, Joongu;Jang, Eunkyung;Julien, Pierre Y.
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2020.06a
    • /
    • pp.85-85
    • /
    • 2020
  • South Korea experiences few large-scale erosion and sedimentation problems; however, there are numerous local sedimentation problems. A reliable and consistent approach to modelling and managing sediment processes is desirable in the country. In this study, field measurements of sediment concentration from 34 alluvial river basins in South Korea were used with the Modified Einstein Procedure (MEP) to determine the total sediment load at the sampling locations. The Flow Duration-Sediment Rating Curve (FD-SRC) method was then used to estimate the specific degradation for all gauging stations. The specific degradation of most rivers was found to be typically 50-300 tons/㎢·yr. A model tree data mining technique was applied to develop a model for the specific degradation based on various characteristics of each watershed obtained from GIS analysis. The meaningful parameters are: 1) elevation at the middle relative area of the hypsometric curve [m], 2) percentage of wetland and water [%], 3) percentage of urbanized area [%], and 4) main stream length [km]. The Root Mean Square Error (RMSE) of existing models is in excess of 1,250 tons/㎢·yr, whereas the RMSE of the proposed model with 6 additional validations decreased to 65 tons/㎢·yr. Erosion loss maps from the Revised Universal Soil Loss Equation (RUSLE), satellite images, and aerial photographs were used to delineate the geospatial features affecting erosion and sedimentation. The results of the geospatial analysis clearly show the high-risk erosion areas (hill slopes and construction sites in urbanized areas) and the sedimentation features (wetlands and agricultural reservoirs). The results of the physiographical analysis also indicate that the morphometric characteristics of the watersheds explain sediment transport well. Sustainable management with data mining methodologies and geospatial analysis could help solve various erosion and sedimentation problems under different conditions.

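For reference, the RMSE quoted in the abstract above is conventionally defined as follows. The symbols are notation assumed here, not taken from the paper: $SD_i^{\mathrm{obs}}$ is the specific degradation estimated from measurements at gauging station $i$, $SD_i^{\mathrm{model}}$ is the corresponding model prediction, and $n$ is the number of stations.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(SD_i^{\mathrm{model}} - SD_i^{\mathrm{obs}}\right)^{2}}
```

Because the errors are squared before averaging, the RMSE carries the same units as the specific degradation itself (tons/㎢·yr) and is dominated by the stations with the largest prediction errors.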