• Title/Summary/Keyword: Classification Variables


Classification of National Highway by Factor Analysis (요인분석을 활용한 일반국도 유형분류)

  • Lim, Sung-Han;Ha, Jung-A;Oh, Ju-Sam
    • International Journal of Highway Engineering / v.7 no.3 s.25 / pp.43-52 / 2005
  • Highway classification is an essential part of defining road design criteria. This study classifies national highways by factor analysis, using traffic data observed at permanent traffic count points in 2004. A total of nine variables are applied: AADT, K factor, D factor, heavy vehicle proportion, daytime traffic volume proportion, peak hour volume proportion, Sunday factor, vacation factor, and COV (coefficient of variation). The factor analysis shows that the variables are divided into two factors: one related to the fluctuation characteristics of traffic volume, and one related to heavy vehicle and directional volume characteristics. According to the results of the cluster analysis, the 353 permanent traffic count points are categorized into three groups: type I for urban highways, type II for rural highways, and type III for recreational highways.
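
Of the variables listed above, the COV is the simplest to illustrate. The sketch below computes it as standard deviation divided by mean for a series of daily volumes. The function name, the use of the population standard deviation, and the sample volumes are all assumptions made for illustration; the abstract does not say how the study computed its COV.

```python
from statistics import mean, pstdev


def coefficient_of_variation(volumes):
    """Return the standard deviation of the volumes divided by their mean."""
    m = mean(volumes)
    if m == 0:
        raise ValueError("mean volume is zero; COV is undefined")
    # Assumption: population standard deviation; a sample SD would also be defensible.
    return pstdev(volumes) / m


# Hypothetical daily volumes at two count points (not data from the study).
steady = [1000, 1010, 990, 1005, 995]
seasonal = [400, 1600, 500, 1500, 1000]

print(coefficient_of_variation(steady))
print(coefficient_of_variation(seasonal))
```

A point with strongly seasonal traffic yields a larger COV than a point with steady traffic, which is the kind of fluctuation the first factor in the abstract describes.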


Using Artificial Neural Networks for Forecasting Algae Counts in a Surface Water System

  • Coppola, Emery A. Jr.;Jacinto, Adorable B.;Atherholt, Tom;Poulton, Mary;Pasquarello, Linda;Szidarvoszky, Ferenc;Lohbauer, Scott
    • Korean Journal of Ecology and Environment / v.46 no.1 / pp.1-9 / 2013
  • Algal blooms in potable water supplies are becoming an increasingly prevalent and serious water quality problem around the world. In addition to precipitating taste and odor problems, blooms damage the environment, and some classes like cyanobacteria (blue-green algae) release toxins that can threaten human health, even causing death. There is a recognized need in the water industry for models that can accurately forecast algal bloom events in real time for planning and mitigation purposes. In this study, using data for an interconnected system of rivers and reservoirs operated by a New Jersey water utility, various ANN models, including both discrete prediction and classification models, were developed and tested for forecasting counts of three different algal classes one week and two weeks ahead. Predictor model inputs included physical, meteorological, chemical, and biological variables, and two different temporal schemes for processing inputs relative to the prediction event were used. Despite relatively limited historical data, the discrete prediction ANN models generally performed well during validation, achieving relatively high correlation coefficients and often predicting the formation and dissipation of high algae count periods. The ANN classification models also performed well, with classification accuracy averaging 94 percent. Despite the relatively limited number of data events, this study demonstrates that with adequate data collection, both in terms of the number of historical events and availability of important predictor variables, ANNs can provide accurate real-time forecasts of algal population counts, as well as foster increased understanding of important cause and effect relationships, which can be used to improve both monitoring programs and forecasting efforts.

A Study on Forest Land Classification Using Multivariate Statistical Methods : A Case Study at Mt. Kwanak (다변수통계방법을 이용한 산지분류에 관한 연구)

  • 정순오
    • Journal of the Korean Institute of Landscape Architecture / v.13 no.1 / pp.43-66 / 1985
  • Korea needs proper and rational public policies on the conservation and use of forest land and other natural resources because of the accelerating expansion of national land development in recent years. Unfortunately, there is no systematic planning system to support these needs. Generally, forest land use planning needs suitability analysis based on an efficient land classification system. The goal of this study was to classify forest land using multivariate statistical methods. A case study was carried out in the winter of 1983 on a mountainous area higher than 100 m above sea level located at Mt. Kwanak in Anyang-city, Kyung-gi-do (province). The study area covered 19.80 km$^2$ and was divided into 1,383 Operational Taxonomic Units (OTUs) by a 120 m $\times$ 120 m grid. Fourteen descriptors were identified and quantified for each OTU from existing national land data: elevation, slope, aspect, terrain form, geologic material, surface soil permeability, topsoil type, depth of the solum, soil acidity, forest cover type, stand size class, stand age class, stand density class, and simple forest soil capability class. For this study, a FORTRAN IV program was written to input and output map data, and the statistics packages SPSS and BMD were used to perform the multivariate statistical analysis. The fourteen variables were analyzed to investigate the characteristics of their frequency distributions and to estimate the correlation coefficients among them. Principal component analysis was executed to find the dimensions of forest land characteristics, and factor scores were used to select proper samples of OTUs throughout the study area. In order to develop the classes of forest land classification based on 102 surrogates, cluster and discriminant analyses of the principal descriptor variable matrix were undertaken.
Results obtained through the series of multivariate statistical analyses were as follows: 1) Principal component analysis proved to be a useful tool for data selection and for identifying the principal descriptor variables, which represented the characteristics of forest land and facilitated the selection of samples.


Characteristics Detection of Hydrological and Water Quality Data in Jangseong Reservoir by Application of Pattern Classification Method (패턴분류 방법 적용에 의한 장성호 수문·수질자료의 특성파악)

  • Park, Sung-Chun;Jin, Young-Hoon;Roh, Kyong-Bum;Kim, Jongo;Yu, Ho-Gyu
    • Journal of Korean Society on Water Environment / v.27 no.6 / pp.794-803 / 2011
  • A Self-Organizing Map (SOM) was applied for pattern classification of hydrological and water quality data measured monthly at Jangseong Reservoir. The primary objective of the present study is to better understand the characteristics of the data and the relationships between them. For this purpose, two SOMs were configured by a methodologically systematic approach with appropriate methods for data transformation and for determining the map size and side lengths of the map. The SOMs constructed at the respective measurement stations for water quality data (JSD1 and JSD2) both classified their datasets into five clusters according to the Davies-Bouldin Index (DBI). The trained SOMs were fine-tuned by Ward's method of hierarchical cluster analysis. On the one hand, the patterns with high values of standardized reference vectors for hydrological variables generally revealed a high possibility of eutrophication by TN or TP in the reservoir. On the other hand, the clusters with low values of standardized reference vectors for hydrological variables showed patterns with high COD concentration. In particular, Cluster1 at JSD1 and Cluster5 at JSD2 represented the worst water quality conditions, with high reference vectors for rainfall and storage in the reservoir. Consequently, SOM is applicable to identifying patterns of potential eutrophication in reservoirs through a better understanding of data characteristics and their relationships.
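
The abstract above selects the number of clusters with the Davies-Bouldin Index. As a rough illustration of what that index measures, here is a minimal pure-Python sketch using Euclidean distance and mean distance to the centroid as the within-cluster scatter. The function names and the toy clusters are assumptions for illustration, and the study's own implementation and settings are not given in the abstract.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a list of at least two clusters.

    Each cluster is a non-empty list of numeric vectors. Lower values mean
    tighter, better separated clusters. Clusters with identical centroids
    would cause a division by zero and are not handled here.
    """
    cents = [centroid(c) for c in clusters]
    # Within-cluster scatter: mean distance from each point to its centroid.
    scatter = [
        sum(math.dist(p, cents[i]) for p in c) / len(c)
        for i, c in enumerate(clusters)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical 2-D clusters (not data from the study).
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(1, 0), (1, 4)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```

In practice the index is computed for several candidate cluster counts and the count with the lowest value is preferred.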

Unsupervised one-class classification for condition assessment of bridge cables using Bayesian factor analysis

  • Wang, Xiaoyou;Li, Lingfang;Tian, Wei;Du, Yao;Hou, Rongrong;Xia, Yong
    • Smart Structures and Systems / v.29 no.1 / pp.41-51 / 2022
  • Cables are critical components of cable-stayed bridges. A structural health monitoring system provides real-time cable tension recording for cable health monitoring. However, the measurement data involve multiple sources of variability, i.e., varying environmental and operational factors, which increase the complexity of cable condition monitoring. In this study, a one-class classification method is developed for cable condition assessment using Bayesian factor analysis (FA). The single-peaked vehicle-induced cable tension is assumed to depend on vehicle positions and weights. Bayesian FA is adopted to establish the correlation model between cable tensions and vehicles. Vehicle weights are assumed to be latent variables, and the influences of different transverse positions are quantified by coefficient parameters. Bayes' theorem is employed to estimate the parameters and variables automatically, and the damage index is defined on the basis of the well-trained model. The proposed method is applied to one cable-stayed bridge for cable damage detection. Significant deviations of the damage indices of Cable SJS11 were observed, indicating a damaged condition in 2011. This study develops a novel method to evaluate the health condition of individual cables using FA in the Bayesian framework. Only vehicle-induced cable tensions are used, and there is no need to monitor the vehicles. The entire process, including data pre-processing, model training, and damage index calculation for one cable, takes only 35 s, which is highly efficient.

Classification Abnormal temperatures based on Meteorological Environment using Random forests (랜덤포레스트를 이용한 기상 환경에 따른 이상기온 분류)

  • Youn Su Kim;Kwang Yoon Song;In Hong Chang
    • Journal of Integrative Natural Science / v.17 no.1 / pp.1-12 / 2024
  • Many abnormal climate events are occurring around the world. The cause of abnormal climate is related to temperature. Factors that affect temperature include excessive emissions of carbon and greenhouse gases from a global perspective, and air circulation from a local perspective. Due to air circulation, many abnormal climate phenomena such as abnormally high and abnormally low temperatures are occurring in certain areas, which can cause very serious harm to people. Therefore, the problem of abnormal temperature should not be approached only as a case of climate change, but should be studied as a new category of climate crisis. In this study, we propose a model for the classification of abnormal temperature using random forests based on various meteorological data such as longitudinal observations, yellow dust, and ultraviolet radiation from 2018 to 2022 for each region in Korea. The meteorological data were imbalanced, and this imbalance was addressed by oversampling. As a result, we found that the variables affecting abnormal temperature differ between regions. In particular, the central and southern regions are influenced by high pressure (Mainland China, Siberian high pressure, and North Pacific high pressure) due to their regional characteristics, so pressure-related variables had a significant impact on the classification of abnormal temperature. This suggests that a regional approach can be taken to predict abnormal temperatures from the surrounding meteorological environment. In addition, when an abnormal temperature is expected, it appears possible to take preventive measures in advance according to regional characteristics.
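
The abstract above mentions oversampling to handle class imbalance without naming a specific technique. The sketch below shows the simplest variant, random oversampling, which duplicates minority-class rows until every class matches the largest one. This is an assumption chosen for illustration; the study may have used a different method (for example a synthetic-sample approach), and the function name and toy labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical imbalanced labels (not data from the study).
rows = list(range(10))
labels = ["normal"] * 8 + ["abnormal"] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.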

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.845-854 / 2016
  • Sentiment analysis, or opinion mining, is a text mining technique employed to identify subjective information or the opinions of an individual from documents in blogs, reviews, articles, or social networks. In the literature, only the problem of binary classification of ratings based on review texts in online reviews has been considered. However, because there can be neutral reviews as well as positive and negative ones, multi-class classification is more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers such as support vector machines and the proportional odds model in order to compare their predictive performance.
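
The word-extraction step in the abstract above relies on the chi-square statistic. A minimal sketch of that idea follows: for each word, build a 2x2 table of word presence against membership in one rating class and score the word by the chi-square statistic of the table. The function names, the one-class-versus-rest setup, and the toy reviews are assumptions for illustration, not the authors' procedure.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 table.

    a: word present, target class     b: word present, other classes
    c: word absent, target class      d: word absent, other classes
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association information
    return n * (a * d - b * c) ** 2 / denom


def word_scores(docs, labels, target):
    """Score every word by its association with the target label.

    docs is a list of sets of words; labels is a parallel list of ratings.
    """
    vocab = set().union(*docs)
    scores = {}
    for w in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if w in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if w in doc and y != target)
        c = sum(1 for doc, y in zip(docs, labels) if w not in doc and y == target)
        d = sum(1 for doc, y in zip(docs, labels) if w not in doc and y != target)
        scores[w] = chi_square(a, b, c, d)
    return scores


# Hypothetical tokenized reviews (not data from the study).
docs = [{"great", "phone"}, {"great", "battery"}, {"bad", "phone"}, {"bad", "battery"}]
labels = ["pos", "pos", "neg", "neg"]
scores = word_scores(docs, labels, target="pos")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Words with the highest scores would be kept as classifier inputs; words that appear equally often in every class score zero.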

Analysis on the Plant Community Structure of Chundong Valley in Sobaeksan National Park (소백산국립공원 천동계곡의 식물군집구조분석)

  • Lee, Kyong-Jae;Cho, Woo;Jo, Jae-Chang
    • Korean Journal of Environment and Ecology / v.6 no.2 / pp.134-146 / 1993
  • A survey of the Chundong valley forest on Mt. Sobaek was conducted using 20 sample plots of 500 m$^2$ each. Classification by TWINSPAN and the DCA and CCA ordination techniques were applied to the study area in order to classify the plots into several groups based on woody plants and environmental variables. By the TWINSPAN technique, the plant community was divided into four groups. The resulting groups are the Pinus densiflora community, the Quercus variabilis-Q. mongolica-P. densiflora community, and the Fraxinus rhynchophylla community. The successional trend of tree species indicated by the DCA ordination and DBH class distribution analysis seems to run from P. densiflora through Q. mongolica and Q. variabilis to F. rhynchophylla. In the CCA ordination, the correlations between the scores of the first two axes and soil pH, soil humus, soil calcium concentration, and soil magnesium concentration were significantly positive. Positive correlations were found between the scores of the first two axes for the F. rhynchophylla community and soil humus and soil magnesium concentration, and between those for the P. densiflora community and soil pH. However, there was no correlation between species and environmental variables.


A Study on NOx Emission Control Methods in the Cement Firing Process Using Data Mining Techniques (데이터 마이닝을 이용한 시멘트 소성공정 질소산화물(NOx)배출 관리 방법에 관한 연구)

  • Park, Chul Hong;Kim, Yong Soo
    • Journal of Korean Society for Quality Management / v.46 no.3 / pp.739-752 / 2018
  • Purpose: The purpose of this study was to investigate the relationship between kiln processing parameters and the NOx emissions that occur in the sintering and calcination steps of the cement manufacturing process, and to derive the main factors responsible for emissions outside the emission limit criteria, as determined by category models and classification rules, using data mining techniques. The results of this study are expected to be useful as guidelines for NOx emission control standards. Methods: Data were collected from Precalciner Kiln No.3 at one of the domestic cement plants in Korea. Thirty-four independent variables affecting NOx generation and dependent variables that exceeded or were below the NOx emission limit (>1 and <0, respectively) were examined during kiln processing. These data were used to construct a detection model of NOx emission, in which emissions exceeded or were below the set limits. The model was validated in SPSS MODELER 18.0 using artificial neural network, decision tree (C5.0), and logistic regression data mining techniques. Results: The decision tree (C5.0) algorithm best represented NOx emission behavior and was used to identify 10 processing variables that resulted in NOx emissions outside the limit criteria. Conclusion: The results of this study indicate that the decision tree (C5.0) can be applied for real-time monitoring and management of NOx emissions during the cement firing process to satisfy NOx emission control standards and to provide a more eco-friendly cement product.

Geostatistical Simulation of Compositional Data Using Multiple Data Transformations (다중 자료 변환을 이용한 구성 자료의 지구통계학적 시뮬레이션)

  • Park, No-Wook
    • Journal of the Korean earth science society / v.35 no.1 / pp.69-87 / 2014
  • This paper suggests a conditional simulation framework based on multiple data transformations for geostatistical simulation of compositional data. First, log-ratio transformation is applied to original compositional data in order to apply conventional statistical methodologies. As for the next transformations that follow, minimum/maximum autocorrelation factors (MAF) and indicator transformations are sequentially applied. MAF transformation is applied to generate independent new variables and as a result, an independent simulation of individual variables can be applied. Indicator transformation is also applied to non-parametric conditional cumulative distribution function modeling of variables that do not follow multi-Gaussian random function models. Finally, inverse transformations are applied in the reverse order of those transformations that are applied. A case study with surface sediment compositions in tidal flats is carried out to illustrate the applicability of the presented simulation framework. All simulation results satisfied the constraints of compositional data and reproduced well the statistical characteristics of the sample data. Through surface sediment classification based on multiple simulation results of compositions, the probabilistic evaluation of classification results was possible, an evaluation unavailable in a conventional kriging approach. Therefore, it is expected that the presented simulation framework can be effectively applied to geostatistical simulation of various compositional data.
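
The first and last steps of the framework above, a log-ratio transformation and its inverse, can be sketched briefly. The abstract does not say which log-ratio variant was used, so the centered log-ratio (clr) here is an assumption, as are the function names and the three-part composition. The MAF and indicator transformations are not shown.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def clr_inverse(values):
    """Map clr coordinates back to a composition whose parts sum to 1."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical three-part sediment composition (not data from the study).
comp = [0.6, 0.3, 0.1]
coords = clr(comp)
back = clr_inverse(coords)
print(coords)
print(back)
```

Because the inverse renormalizes to a unit sum, any values produced in the transformed space map back to a valid composition, which is how a simulation can satisfy the compositional constraints mentioned in the abstract. Parts equal to zero have no logarithm and would need separate handling.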