• Title/Summary/Keyword: Multivariate Dataset

The Use of Local Outlier Factor(LOF) for Improving Performance of Independent Component Analysis(ICA) based Statistical Process Control(SPC) (LOF를 이용한 ICA 기반 통계적 공정관리의 성능 개선 방법론)

  • Lee, Jae-Shin;Kang, Bok-Young;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.36 no.1 / pp.39-55 / 2011
  • Process monitoring is important for complex systems such as chemical processing industries, where it supports efficiency, quality management, and safety. Recently, ICA (Independent Component Analysis) based MSPC (Multivariate Statistical Process Control) has been widely used for process monitoring, and DICA (Dynamic ICA) has been introduced to account for system dynamics. However, the performance of existing approaches depends strongly on the statistical distributions of the control variables. To overcome this limitation, we propose a novel process monitoring approach that integrates DICA and LOF (Local Outlier Factor), with the aim of improving the fault detection rate. LOF detects local outliers using the density of the surrounding space, so its performance does not depend on the data distribution. The proposed method therefore both accounts for system dynamics and provides robust performance regardless of the statistical distributions of the control variables. Comparison experiments on the widely used Tennessee Eastman process (TE process) benchmark dataset showed improved performance over existing approaches.
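
The density-based idea behind LOF can be illustrated with a small sketch. This is a plain brute-force LOF on arbitrary vectors, not the DICA and LOF integration of the paper; the sample data, the neighborhood size `k=2`, and the control limit of 2.0 are assumptions chosen only for illustration.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor (LOF) of each point.

    Brute-force O(n^2) version for small datasets. Assumes k < len(points)
    and no duplicate points (duplicates would give a zero reachability sum).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []  # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices within the k-distance (ties included)
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][others[k - 1]]
        k_distance.append(kd)
        neighbors.append([j for j in others if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical 2-D monitoring statistics: five clustered samples, one far away.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
scores = lof_scores(samples, k=2)
# Flag samples whose LOF exceeds an illustrative control limit of 2.0.
faults = [i for i, s in enumerate(scores) if s > 2.0]
```

Scores near 1 indicate a sample whose local density matches that of its neighbors; no distributional assumption on the variables is needed, which is the property the abstract relies on.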

An Integrated Approach Using Change-Point Detection and Artificial Neural Networks for Interest Rates Forecasting

  • Oh, Kyong-Joo;Ingoo Han
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.04a / pp.235-241 / 2000
  • This article proposes integrated neural network models for interest rate forecasting using change-point detection. The basic concept of the proposed models is to obtain intervals divided by change points, to identify them as change-point groups, and to use them in interest rate forecasting. The proposed models consist of three stages. The first stage detects successive change points in the interest rate dataset. The second stage forecasts the change-point group with data mining classifiers. The final stage forecasts the desired output with BPN. Based on this structure, we propose three integrated neural network models, one per data mining classifier: (1) a multivariate discriminant analysis (MDA)-supported neural network model, (2) a case-based reasoning (CBR)-supported neural network model, and (3) a backpropagation neural network (BPN)-supported neural network model. Subsequently, we compare these models with a neural network model alone and, in addition, determine which of the three classifiers (MDA, CBR, and BPN) performs better. The article thus examines the predictability of integrated neural network models for interest rate forecasting using change-point detection.

AGE ESTIMATION TECHNIQUE OF INDUSTRIALIZED TIMBER PLANTATION USING VARIOUS REMOTE SENSING DATA

  • Kim, Jong-Hong;Heo, Joon;Park, Ji-Sang
    • Proceedings of the KSRS Conference / v.1 / pp.94-97 / 2006
  • Stand age information for industrialized plantation forests is generally collected by field surveying, which is labor-intensive, time-consuming, and very costly. It is also inconsistent from an analysis perspective. As an alternative, this research presents a practical solution for estimating the timber age of loblolly pine plantations using Landsat thematic mapper (TM) images, the shuttle radar topography mission (SRTM), and the national elevation dataset (NED). A multivariate regression model was developed from satellite image-based information (i.e., normalized difference vegetation index (NDVI), tasseled cap (TC) transformation, and derived tree heights). A studentized residual technique was applied to remove potential outliers. After that, a refined age estimation model with a coefficient of determination (R-squared) of 84.6% was obtained. Finally, the feasibility of the estimated model was tested by comparing estimated and measured stand ages of timber plantations using test datasets of plantation stands (2,032 stands). The results show that the proposed method can estimate loblolly pine stand age within an error of $2{\sim}3$ years in an effective and consistent way in terms of time and cost.
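
The outlier-removal step can be sketched as follows. This is a minimal illustration with a single predictor and internally studentized residuals, not the multivariate model of the paper; the data, the cutoff of 2.0, and all function names are assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def studentized_residuals(x, y):
    """Internally studentized residuals e_i / (s * sqrt(1 - h_ii))."""
    n = len(x)
    intercept, slope = fit_line(x, y)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage of each observation in simple linear regression.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, leverage)]


# Hypothetical data: y is roughly 1 + 2x, with one corrupted observation.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[5] += 20.0

# Drop observations beyond an illustrative cutoff of 2.0, then refit.
r = studentized_residuals(x, y)
outliers = [i for i, ri in enumerate(r) if abs(ri) > 2.0]
x_clean = [xi for i, xi in enumerate(x) if i not in outliers]
y_clean = [yi for i, yi in enumerate(y) if i not in outliers]
refit_intercept, refit_slope = fit_line(x_clean, y_clean)
```

Scaling each residual by its own standard error makes observations with different leverage comparable before a cutoff is applied.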

Factors Associated with the Use of Gastric Cancer Screening Services in Korea: The Fourth Korea National Health and Nutrition Examination Survey 2008 (KNHANES IV)

  • Shin, Ji-Yeon;Lee, Duk-Hee
    • Asian Pacific Journal of Cancer Prevention / v.13 no.8 / pp.3773-3779 / 2012
  • Objective: Despite government efforts to increase participation in gastric cancer screening, the rate is still suboptimal in Korea. Therefore, we explored barriers to and predictors of gastric cancer screening participation in a nationally representative sample. Methods: We used the Health Interview Survey sub-dataset derived from the Fourth Korean National Health and Nutrition Examination Survey 2008 (KNHANES IV) to evaluate participation in gastric cancer screening and factors associated with attendance in individuals aged ${\geq}40$ years. We enrolled 4,464 subjects who completed the questionnaire and had not previously been diagnosed with gastric cancer. Four groups of factors were considered potential predictors of gastric cancer screening in a multivariate analysis: sociodemographic, health behavior, psychological and cognitive, and dietary factors. Results: Overall, 41.3% complied with the gastric cancer screening recommendations. Younger age, lower education level, living without a spouse, frequent binge drinking, and current smoking were significantly associated with less participation in gastric cancer screening. Conclusions: To improve participation in gastric cancer screening, more focused interventions should be directed to vulnerable populations, such as groups with low socioeconomic status or unhealthy behavior. In addition, there should be new promotional campaigns and health education to provide information targeting these vulnerable populations.

Assessment of Energy Organizations' External Conditions in the Russian Federation: A Sector Analysis

  • Vyborova, E.N.;Salyakhova, E.A.
    • Asian Journal of Business Environment / v.4 no.2 / pp.17-21 / 2014
  • Purpose - The paper analyzes basic indicators characterizing the volume of energy sector activity in the Russian Federation, Privolzhsky Federal district, Republic of Tatarstan. Research design, data, and methodology - The study analyzed data from the Privolzhsky Federal district, specifically, industrial production volume, electricity production, energy consumption, energy-balance data, capital investments, and capital investment structure. An array of data has been investigated in recent years. The dataset's dynamics were analyzed in 1998. Fixed capital investment dynamics were studied in 1946; the figures were converted to a comparable form using the index method. Trends were analyzed using multivariate statistics methods and the Statgraphics software package. Results - Hypothesis 1. There are sectoral disproportions in energy flows, taking into account the volume of electricity production and consumption. Trends in electricity production in general coincide with industrial production volume trends. Energy flows have disparities in individual territorial units, and in general. Hypothesis 2. The degree of sectoral economic stability decreases with insufficient levels of investment in the fixed capital of energy organizations. Conclusions - Because total electricity production is largely determined by fixed capital investments, the study of their trends and patterns will help coordinate efforts on investment operations in this area.

Linear profile monitoring with random covariate (설명변수가 랜덤인 성형 프로파일 연구)

  • Kim, Daeun;Lee, Sungim;Lim, Johan
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.335-346 / 2022
  • A profile control chart aims to detect a change in the functional relationship among multivariate characteristics in statistical process control. When two variables are monitored, the profile of interest is linear and consists of the intercept and slope of one variable (the response variable) regressed on the other (the explanatory variable). Previous studies on linear profile monitoring mostly assume that the explanatory variables are the same for all profiles. However, there are also cases where they vary from profile to profile. This paper extends the monitoring method to the case where the explanatory variables differ for each profile. We evaluate the new method's performance through simulation and apply it to monitoring network intrusion using NSL-KDD data.
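
To make the setting concrete, the sketch below fits an intercept and slope to each profile from that profile's own explanatory values and compares them with in-control parameters. It is not the monitoring statistic of the paper: the known in-control values `beta0`, `beta1`, `sigma`, the limit of 3.0, and the separate checks on intercept and slope (which ignore their correlation) are simplifying assumptions.

```python
import math


def fit_profile(x, y):
    """Ordinary least squares intercept and slope for one profile."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def profile_z_scores(x, y, beta0, beta1, sigma):
    """Standardized deviations of the fitted intercept and slope.

    The standard errors depend on this profile's own x values, so profiles
    measured at different explanatory values remain comparable.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b0, b1 = fit_profile(x, y)
    se_b0 = sigma * math.sqrt(1.0 / n + x_mean ** 2 / sxx)
    se_b1 = sigma / math.sqrt(sxx)
    return (b0 - beta0) / se_b0, (b1 - beta1) / se_b1


def out_of_control(x, y, beta0, beta1, sigma, limit=3.0):
    """Signal when either standardized estimate exceeds the limit."""
    z0, z1 = profile_z_scores(x, y, beta0, beta1, sigma)
    return abs(z0) > limit or abs(z1) > limit


# Hypothetical in-control model y = 3 + 2x with error standard deviation 1.
x_a = [1.0, 2.0, 3.0, 4.0, 5.0]
y_a = [3.0 + 2.0 * xi for xi in x_a]   # follows the in-control line
x_b = [0.5, 1.5, 2.5, 6.0, 7.0]        # different explanatory values
y_b = [3.0 + 4.0 * xi for xi in x_b]   # slope has shifted to 4

signal_a = out_of_control(x_a, y_a, beta0=3.0, beta1=2.0, sigma=1.0)
signal_b = out_of_control(x_b, y_b, beta0=3.0, beta1=2.0, sigma=1.0)
```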

Hybrid Learning-Based Cell Morphology Profiling Framework for Classifying Cancer Heterogeneity (암의 이질성 분류를 위한 하이브리드 학습 기반 세포 형태 프로파일링 기법)

  • Min, Chanhong;Jeong, Hyuntae;Yang, Sejung;Shin, Jennifer Hyunjong
    • Journal of Biomedical Engineering Research / v.42 no.5 / pp.232-240 / 2021
  • Heterogeneity in cancer is the major obstacle for precision medicine and has become a critical issue in the field of cancer diagnosis. Many attempts have been made to disentangle this complexity by molecular classification. However, multi-dimensional information from the dynamic responses of cancer poses fundamental limitations on conventional approaches based on biomolecular markers. Cell morphology, which reflects the physiological state of the cell, can be used to track the temporal behavior of cancer cells conveniently. Here, we first present a hybrid learning-based platform that extracts cell morphology in a time-dependent manner using a deep convolutional neural network to incorporate multivariate data. Feature selection is conducted on more than 200 morphological features, filtering out less significant variables to enhance interpretation. Our platform then performs unsupervised clustering to unveil dynamic behavior patterns hidden in the high-dimensional dataset. As a result, we visualize the morphology state-space by two-dimensional embedding, along with representative morphology clusters and trajectories. This hybrid learning-based cell morphology profiling strategy simplifies the heterogeneous population of cancer.

Optimizing shallow foundation design: A machine learning approach for bearing capacity estimation over cavities

  • Kumar Shubham;Subhadeep Metya;Abdhesh Kumar Sinha
    • Geomechanics and Engineering / v.37 no.6 / pp.629-641 / 2024
  • The presence of excavations or cavities beneath the foundations of a building can have a significant impact on their stability and cause extensive damage. Traditional methods for calculating the bearing capacity and subsidence of foundations over cavities can be complex and time-consuming, particularly when dealing with conditions that vary. In such situations, machine learning (ML) and deep learning (DL) techniques provide effective alternatives. This study concentrates on constructing a prediction model based on the performance of ML and DL algorithms that can be applied in real-world settings. The efficacy of eight algorithms, including Regression Analysis, k-Nearest Neighbor, Decision Tree, Random Forest, Multivariate Regression Spline, Artificial Neural Network, and Deep Neural Network, was evaluated. Using a Python-assisted automation technique integrated with the PLAXIS 2D platform, a dataset containing 272 cases with eight input parameters and one target variable was generated. In general, the DL model performed better than the ML models, and all models, except the regression models, attained outstanding results with an $R^2$ greater than 0.90. These models can also be used as surrogate models in reliability analysis to evaluate failure risks and probabilities.

Association between Lymphovascular Invasion and Recurrence in Patients with pT1N+ or pT2-3N0 Gastric Cancer: a Multi-institutional Dataset Analysis

  • Fujita, Keizo;Kanda, Mitsuro;Ito, Seiji;Mochizuki, Yoshinari;Teramoto, Hitoshi;Ishigure, Kiyoshi;Murai, Toshifumi;Asada, Takahiro;Ishiyama, Akiharu;Matsushita, Hidenobu;Tanaka, Chie;Kobayashi, Daisuke;Fujiwara, Michitaka;Murotani, Kenta;Kodera, Yasuhiro
    • Journal of Gastric Cancer / v.20 no.1 / pp.41-49 / 2020
  • Purpose: Patients with pathological stage T1N+ or T2-3N0 gastric cancer may experience disease recurrence following curative gastrectomy. However, the current Japanese Gastric Cancer Treatment Guidelines do not recommend postoperative adjuvant chemotherapy for such patients. This study aimed to identify the prognostic factors for patients with pT1N+ or pT2-3N0 gastric cancer using a multi-institutional dataset. Materials and Methods: We retrospectively analyzed the data obtained from 401 patients with pT1N+ or pT2-3N0 gastric cancer who underwent curative gastrectomy at 9 institutions between 2010 and 2014. Results: Of the 401 patients assessed, 24 (6.0%) experienced postoperative disease recurrence. Multivariate analysis revealed that age ≥70 years (hazard ratio [HR], 2.62; 95% confidence interval [CI], 1.09-7.23; P=0.030) and lymphatic and/or venous invasion (lymphovascular invasion [LVI]; HR, 7.88; 95% CI, 1.66-140.9; P=0.005) were independent prognostic factors for poor recurrence-free survival. There was no significant association between LVI and the site of initial recurrence. Conclusions: LVI is an indicator of poor prognosis in patients with pT1N+ or pT2-3N0 gastric cancer.

Parameter Regionalization of Semi-Distributed Runoff Model Using Multivariate Statistical Analysis (다변량 통계분석을 이용한 준분포형 유출모형 매개변수 지역화)

  • Lee, Byong-Ju;Jung, Il-Won;Bae, Deg-Hyo
    • Journal of Korea Water Resources Association / v.42 no.2 / pp.149-160 / 2009
  • The objective of this study is to propose a parameter regionalization scheme that integrates two multivariate statistical methods: principal components analysis (PCA) and hierarchical cluster analysis (HCA). The scheme makes it possible to apply a semi-distributed rainfall-runoff model to ungauged catchments. Seven catchment characteristics (area, mean altitude, mean slope, ratio of forest, water content at saturation, field capacity, and wilting point) are estimated for 109 mid-sized sub-basins. The first two components from the PCA account for 82.11% of the total variance in the dataset. Component 1 is related to the location of the catchments in terms of altitude, and Component 2 is related to their area. Using HCA, 103 ungauged catchments are clustered into the following 6 groups: Goesan 23, Andong 6, Imha 5, Hapcheon 21, Yongdam 4, Seomjin 44. The SWAT model is used to simulate runoff, and its parameters are estimated on the 6 gauged basins. The model parameters were regionalized for the Soyang, Chungju, and Daecheong dam basins, which are treated as ungauged. The model efficiency coefficients of the simulated inflows for these three dams were at least 0.8, which indicates a good fit to the observed inflows. This research will contribute to estimating and analyzing hydrologic components in ungauged catchments.