• Title/Summary/Keyword: Classification Variables

Search results: 921

Anomaly Detection of Big Time Series Data Using Machine Learning: A Case Study of Power Generator Component Signals

  • 권세혁
    • Journal of Society of Korea Industrial and Systems Engineering / Vol. 43, No. 2 / pp. 33-38 / 2020
  • Machine learning approaches to anomaly detection, such as PCA anomaly detection and CNN image classification, have focused on cross-sectional data. In this paper, two approaches are proposed for applying ML techniques to identify the failure time in big time series data. First, PCA anomaly detection was used to classify time rows as normal or abnormal by recasting the subject-identification problem in the time domain. Second, CNN image classification was used to identify the failure time by restructuring the time series: the correlation matrix of each minute of data was computed and converted to TIFF image format. In addition, LASSO, a feature selection method, was applied to select the variables most useful for identifying failure status. For the empirical study, time series data were collected at one-second intervals from a power generator with 214 components over 25 minutes, including the 20 minutes before the failure. PCA anomaly detection detected the failure 9 minutes 17 seconds before it occurred, but the combination of LASSO and PCA did not, because the target variable was binary and assigned based on the failure time. CNN image classification, trained on 10 normal-status images and 5 failure-status images, detected the failure only one minute before it occurred.
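The abstract does not give the PCA settings. The following Python sketch only illustrates the general idea, under these assumptions:

- Rows are time points and columns are component signals.
- PCA is fitted on a reference period assumed to be normal.
- Each row is scored by its squared reconstruction error, and rows above a quantile of the reference errors are flagged.

The helper `minute_correlation_images` shows the restructuring step that turns each 60-row (one-minute) window into a correlation matrix. All function names, the quantile threshold, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca(X_normal, n_components):
    """Fit PCA on rows assumed to be normal; return scaling and loadings."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0)
    std[std == 0] = 1.0  # constant signals would otherwise divide by zero
    Z = (X_normal - mean) / std
    # Rows of Vt are principal directions ordered by explained variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T


def spe_scores(X, mean, std, P):
    """Squared prediction (reconstruction) error of each time row."""
    Z = (X - mean) / std
    residual = Z - Z @ P @ P.T
    return np.sum(residual ** 2, axis=1)


def flag_anomalies(X, n_normal, n_components=2, quantile=0.99):
    """Use the first n_normal rows as the normal reference period and flag
    rows whose error exceeds the chosen quantile of the reference errors."""
    mean, std, P = fit_pca(X[:n_normal], n_components)
    threshold = np.quantile(spe_scores(X[:n_normal], mean, std, P), quantile)
    return spe_scores(X, mean, std, P) > threshold


def minute_correlation_images(X, window=60):
    """Correlation matrix of each non-overlapping window of `window` rows
    (60 one-second rows = one minute); each matrix can be saved as an image."""
    n_windows = X.shape[0] // window
    return np.stack([np.corrcoef(X[i * window:(i + 1) * window], rowvar=False)
                     for i in range(n_windows)])


# Synthetic stand-in for generator signals: 6 sensors driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(400, 6))
X[300:, 0] += 5.0  # hypothetical fault: sensor 0 departs from the others
flags = flag_anomalies(X, n_normal=200)
images = minute_correlation_images(X)
```

Fitting on a window known to be normal is a design choice of this sketch. The paper's own reference rows and threshold may differ.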

Diagnostic Classification Based on Nonlinear Representation and Filtering of Process Measurement Data

  • 조현우
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 16, No. 5 / pp. 3000-3005 / 2015
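  • Reliable process monitoring and diagnosis are important for ensuring the safety of production processes and the quality of final products. The purpose of process diagnosis is to identify the cause of a specific process abnormality. This study presents a process diagnosis framework based on classification techniques. Process data are transformed with a nonlinear data representation technique, which reduces the data size and enables an efficient representation. As an additional step, the process data are preprocessed to remove process patterns irrelevant to diagnosis and thereby improve diagnostic performance. A case study on a batch process, conducted to evaluate diagnostic performance, showed improved results compared with an existing linear diagnosis method and with a method that omits preprocessing.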

An Approach on the Spatial Boundary of Rural Development Project by Areal Classification Technique - With Spatial Reference to Searching of Areal Homogeneities in Two Hierarchical Administrative Units, Ri, Eup/Myun -

  • 전영길;류수형
    • Journal of Korean Society of Rural Planning / Vol. 4, No. 2 / pp. 128-137 / 1998
  • This study explores how to set the spatial boundaries of rural development projects using an areal classification technique. It focuses on identifying areal homogeneity in two hierarchical administrative units, Ri and Eup/Myun. Areal homogeneity is judged by two criteria, the degree of agriculture and the degree of urbanization, and the variables selected under these criteria are analyzed by factor analysis. The results are as follows. First, the importance of agricultural factors in areal analysis is generally decreasing. Second, the areal classification of the Myun and Ri units in Ansong City varies because of urban factors. Urban factors increase areal heterogeneity and are therefore important when analyzing areal characteristics. Third, areal heterogeneity has recently also been increasing in areas near Chungcheong-do and in areas with poor road conditions. The areal characteristics of the Myun and Ri units differ from each other, and urban factors influence areal characteristics more than agricultural factors do. Therefore, rural development projects need to be established on an intermediate spatial boundary between the Myun unit and the Ri unit.
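The abstract does not say which factor extraction method or rotation the authors used. As a hedged illustration, the Python sketch below assumes the following:

- Principal-component factoring is used, with the Kaiser rule (retain factors with eigenvalue > 1).
- No rotation is applied.
- The input is hypothetical indicators for 200 areas, built so that three track an "agricultural" factor and three an "urban" factor.

```python
import numpy as np


def principal_factors(X, min_eigenvalue=1.0):
    """Principal-component factor extraction with the Kaiser criterion.

    X: (n_areas, n_variables) matrix of indicators.
    Returns unrotated loadings, communalities and all eigenvalues."""
    R = np.corrcoef(X, rowvar=False)  # variables are columns
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_factors = int(np.sum(eigvals > min_eigenvalue))  # Kaiser rule
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    communalities = (loadings ** 2).sum(axis=1)
    return loadings, communalities, eigvals


# Hypothetical indicators: columns 0-2 follow "agri", columns 3-5 follow "urban"
rng = np.random.default_rng(3)
agri, urban = rng.normal(size=(2, 200))
X = np.column_stack([agri, agri, agri, urban, urban, urban])
X = X + 0.3 * rng.normal(size=(200, 6))
loadings, communalities, eigvals = principal_factors(X)
```

A study like this one would typically also rotate the loadings (for example, varimax) before interpreting them as agricultural or urban factors. That step is omitted here.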

Opera Clustering: K-means on librettos datasets

  • 정하림;유주헌
    • Journal of Internet Computing and Services / Vol. 23, No. 2 / pp. 45-52 / 2022
  • With the development of artificial intelligence analysis methods, especially machine learning, their applications are expanding across many fields. In classical music, however, some difficulties remain in applying machine learning techniques. Genre classification and music recommendation systems built on deep learning algorithms are actively used for other genres of music, but not for classical music. In this paper, we attempted to classify operas within classical music. To this end, an experiment was conducted to determine which of the basic features of music (composer, period of composition, and emotional atmosphere) is the most suitable criterion. To generate emotional labels, we adopted zero-shot classification with four basic emotions: 'happiness', 'sadness', 'anger', and 'fear'. After the opera librettos were embedded with a doc2vec model, the optimal number of clusters was determined with the elbow method. The resulting four centroids were then used in k-means clustering to group the unlabeled libretto data, and the clustering was optimized based on adjusted Rand index (ARI) scores. These results were compared with the annotated variables of the music. The four machine-computed clusters were most similar to the grouping by period. In addition, the emotional similarity between composer and period was not significant. Knowing that period is the most suitable criterion, we hope this will make it easier for listeners to find music that suits their tastes.
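The pipeline in this abstract (elbow method, k-means, adjusted Rand index) can be sketched in Python with NumPy as below. This is a minimal illustration, not the authors' code:

- The doc2vec embeddings are replaced by hypothetical 2-D points in four groups.
- The k-means step uses k-means++ seeding with a few restarts.
- `adjusted_rand_index` follows the standard pair-counting formula.

```python
import numpy as np
from collections import Counter
from math import comb


def _kmeans_pp_init(X, k, rng):
    """k-means++ seeding: pick new centroids with probability ~ squared distance."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means with restarts; returns (labels, inertia) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Keep the old centroid if a cluster loses all of its points
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = d2.min(axis=1).sum()  # within-cluster sum of squares
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best


def adjusted_rand_index(a, b):
    """Pair-counting ARI between two labelings of the same items."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical stand-in for libretto embeddings: 4 well-separated groups
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
true_groups = np.repeat(np.arange(4), 25)
X = centers[true_groups] + 0.5 * rng.normal(size=(100, 2))

inertias = [kmeans(X, k)[1] for k in range(1, 7)]  # plot these for the elbow
labels, _ = kmeans(X, 4)
ari = adjusted_rand_index(true_groups.tolist(), labels.tolist())
```

The elbow is read from where `inertias` stops dropping sharply. ARI is 1.0 for identical partitions (up to relabeling) and near 0 for chance agreement.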

Study of oversampling algorithms for soil classifications by field velocity resistivity probe

  • Lee, Jong-Sub;Park, Junghee;Kim, Jongchan;Yoon, Hyung-Koo
    • Geomechanics and Engineering / Vol. 30, No. 3 / pp. 247-258 / 2022
  • A field velocity resistivity probe (FVRP) can measure compressional waves, shear waves, and electrical resistivity in boreholes. The objective of this study is to classify soils with a machine learning technique, using the elastic wave velocities and electrical resistivity measured by the FVRP. Field and laboratory tests are performed, and the measured values are used as input variables to classify silt sand, sand, silty clay, and clay-sand mixture layers. Four algorithms, k-nearest neighbors (KNN), naive Bayes (NB), random forest (RF), and support vector machine (SVM), are selected for classification. Their hyperparameters are optimized and their accuracy is evaluated. The accuracies are 0.76, 0.91, 0.94, and 0.88 for the KNN, NB, RF, and SVM algorithms, respectively. To increase the amount of data for each soil layer and overcome imbalance in the dataset, the synthetic minority oversampling technique (SMOTE) and the conditional tabular generative adversarial network (CTGAN) are applied. CTGAN improves the accuracy of the KNN, NB, RF, and SVM algorithms. The results demonstrate that the three types of data measured by the FVRP (compressional wave velocity, shear wave velocity, and electrical resistivity) can be used with machine learning algorithms to classify soil layers.
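SMOTE, one of the two oversampling methods mentioned, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a basic version in Python with NumPy. The feature columns and their values are hypothetical, and CTGAN is not shown. In practice, features on very different scales (such as velocity and resistivity) would usually be standardized before distances are computed.

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: interpolate between random minority samples and one of
    their k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])


# Hypothetical minority-layer samples: [Vp, Vs, resistivity], arbitrary units
rng = np.random.default_rng(7)
X_minority = rng.normal(loc=[1.5, 0.2, 50.0], scale=[0.05, 0.01, 5.0], size=(8, 3))
X_synthetic = smote(X_minority, n_synthetic=20, k=3)
```

Each synthetic point lies on the segment between two real minority samples. The new data therefore stay inside the region the minority class already occupies.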

Differentiation among stability regimes of alumina-water nanofluids using smart classifiers

  • Daryayehsalameh, Bahador;Ayari, Mohamed Arselene;Tounsi, Abdelouahed;Khandakar, Amith;Vaferi, Behzad
    • Advances in Nano Research / Vol. 12, No. 5 / pp. 489-499 / 2022
  • Nanofluids have recently attracted substantial scientific interest as cooling media. However, their stability remains a challenge for successful industrial application. Several factors determine the nanofluid stability regime, including temperature, nanoparticle and base-fluid characteristics, pH, ultrasonic power and frequency, agitation time, and surfactant type and concentration. Indeed, accurately identifying the conditions that yield a stable nanofluid is often complicated or even impossible. Furthermore, no empirical, semi-empirical, or even intelligent approaches exist for anticipating the stability of nanofluids. Therefore, this study introduces a straightforward and reliable intelligent classifier for discriminating among the stability regimes of alumina-water nanofluids based on Zeta potential margins. To this end, various intelligent classifiers were designed and their classification accuracies compared: deep learning and multilayer perceptron neural networks, decision tree, GoogleNet, and multi-output least squares support vector regression. This comparison showed that the multilayer perceptron neural network (MLPNN) with the SoftMax activation function, trained by the Bayesian regularization algorithm, is the best classifier for this task. This classifier correctly detects the stability regimes of more than 90% of 345 different nanofluid samples, with an overall classification accuracy of 90.1% and a misclassification rate of 9.9%. This research is the first attempt to anticipate the stability of alumina-water nanofluids from a few easily measured independent variables.

Core Keywords Extraction for Evaluating Online Consumer Reviews Using a Decision Tree: Focusing on Star Ratings and Helpfulness Votes

  • 민경수;유동희
    • The Journal of Information Systems / Vol. 32, No. 3 / pp. 133-150 / 2023
  • Purpose: This study aims to develop classification models using a decision tree algorithm to identify the core keywords and rules that influence the evaluation of online consumer reviews for robot vacuum cleaners on Amazon.com. Unlike previous studies, we analyze the core keywords that affect evaluation results separately for two kinds of evaluators: self-evaluation (star ratings) and peer evaluation (helpfulness votes). We investigate whether the core keywords influencing star ratings and helpfulness votes vary across products. We also investigate whether the core keywords related to star ratings or helpfulness votes are similar across all products. Design/methodology/approach: We used random under-sampling to balance the dataset. To evaluate the classification models' performance, we progressively removed independent variables, ranked by importance, through backward elimination. As a result, we identified the classification models that best predict star ratings and helpfulness votes for each product's online consumer reviews. Findings: The core keywords influencing self-evaluation and peer evaluation vary across products, and even for the same model or features, the core keywords are not consistent. Therefore, companies' product and marketing managers need to analyze the core keywords of each product. This lets them highlight its advantages and prepare customized strategies that compensate for its shortcomings.
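The two preprocessing and selection steps named in the methodology can be sketched in plain Python as below. `random_undersample` and `backward_elimination` are hypothetical helpers. The sketch makes these assumptions:

- In a real study, `evaluate` would train a decision tree on the balanced data and return a validation score. Here it is a toy scoring function.
- The keyword names and importances are invented for illustration.
- The importance ranking is kept fixed, whereas a study might recompute it after each removal.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def backward_elimination(features, importance, evaluate):
    """Drop the least important remaining feature one at a time, score each
    subset with `evaluate`, and return the best-scoring subset."""
    remaining = list(features)
    history = [(evaluate(remaining), list(remaining))]
    while len(remaining) > 1:
        weakest = min(remaining, key=lambda f: importance[f])
        remaining.remove(weakest)
        history.append((evaluate(remaining), list(remaining)))
    best_score, best_subset = max(history, key=lambda item: item[0])
    return best_subset, best_score, history


# Hypothetical data: 3 "helpful" vs 7 "not helpful" reviews
rows, labels = list(range(10)), ["helpful"] * 3 + ["not helpful"] * 7
bal_rows, bal_labels = random_undersample(rows, labels)

# Hypothetical keyword importances and a toy stand-in for model accuracy
importance = {"suction": 0.4, "battery": 0.3, "noise": 0.2, "app": 0.1}


def toy_evaluate(subset):
    return len(set(subset) & {"suction", "battery"}) - 0.1 * len(subset)


best_subset, best_score, history = backward_elimination(
    list(importance), importance, toy_evaluate)
```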

Machine Learning-Based Rapid Prediction Method of Failure Mode for Reinforced Concrete Column

  • 김수빈;오근영;신지욱
    • Journal of the Earthquake Engineering Society of Korea / Vol. 28, No. 2 / pp. 113-119 / 2024
  • In existing reinforced concrete buildings with seismically deficient column details, the overall behavior depends on the failure mode of the columns. This study aims to develop and validate a machine learning-based model for predicting column failure modes (shear, flexure-shear, and flexure). For this purpose, artificial neural network (ANN), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) models were trained on previously collected experimental data. Using these four machine learning methods, we developed classification models that predict the column failure mode from six input variables: concrete compressive strength, steel yield strength, axial load ratio, height-to-depth aspect ratio, longitudinal reinforcement ratio, and transverse reinforcement ratio. The performance of each model was compared and verified by calculating accuracy, precision, recall, F1-score, and ROC. Among the considered methods, the RF model achieved the highest average value across these performance measures, and it conservatively predicts the shear failure mode. Thus, the RF model can rapidly predict column failure modes from simple column details.
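The metrics listed above (except ROC) can be computed for the three failure-mode classes as in this minimal Python sketch. The function name and the example labels are hypothetical. Macro averaging (an unweighted mean over classes) is an assumption, since the abstract does not say how class-wise values were combined.

```python
def report_metrics(y_true, y_pred, classes):
    """Accuracy, per-class precision/recall/F1, and their macro averages."""
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = {m: sum(per_class[c][m] for c in classes) / len(classes)
             for m in ("precision", "recall", "f1")}
    return accuracy, per_class, macro


# Hypothetical true and predicted failure modes for six columns
classes = ["shear", "flexure-shear", "flexure"]
y_true = ["shear", "shear", "flexure-shear", "flexure", "flexure", "flexure"]
y_pred = ["shear", "flexure-shear", "flexure-shear", "flexure", "flexure", "shear"]
accuracy, per_class, macro = report_metrics(y_true, y_pred, classes)
```

Here 4 of 6 predictions are correct. In the toy data, a shear column predicted as flexure-shear and a flexure column predicted as shear lower the shear-class precision and recall to 0.5 each.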

A Selection of Optimal EEG Channel for Emotion Analysis According to Music Listening using Stochastic Variables

  • 변성우;이소민;이석필
    • The Transactions of the Korean Institute of Electrical Engineers / Vol. 62, No. 11 / pp. 1598-1603 / 2013
  • Research on the relationship between emotional state and musical stimuli has recently been increasing. Many previous works used data from all recorded channels for pattern classification, but this approach suffers from high computational complexity and reduced accuracy. This paper proposes a method for selecting the optimal EEG channels that efficiently reflect emotional state during music listening, based on an analysis of stochastic feature vectors. Reducing the amount of data to be processed in this way makes EEG pattern classification relatively simple.
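The abstract does not specify the stochastic features or the selection criterion. As one hedged illustration, the sketch below makes these assumptions, none of which are the authors' method:

- Each channel is described by four statistical moments: mean, variance, skewness, and excess kurtosis.
- Channels are ranked by a summed Fisher ratio between two emotion classes.
- The input is synthetic data.

```python
import numpy as np


def stochastic_features(segment):
    """Mean, variance, skewness and excess kurtosis of one EEG segment."""
    mu = segment.mean()
    sigma = segment.std()
    z = (segment - mu) / sigma
    return np.array([mu, sigma ** 2, np.mean(z ** 3), np.mean(z ** 4) - 3.0])


def rank_channels(trials, labels):
    """Rank channels by how well their features separate two classes.

    trials: array of shape (n_trials, n_channels, n_samples)
    labels: 0/1 emotion class per trial
    Returns channel indices (most discriminative first) and their scores."""
    labels = np.asarray(labels)
    n_trials, n_channels, _ = trials.shape
    scores = np.zeros(n_channels)
    for ch in range(n_channels):
        feats = np.array([stochastic_features(trials[i, ch])
                          for i in range(n_trials)])
        a, b = feats[labels == 0], feats[labels == 1]
        # Fisher ratio per feature: squared mean gap over summed variances
        fisher = ((a.mean(axis=0) - b.mean(axis=0)) ** 2
                  / (a.var(axis=0) + b.var(axis=0) + 1e-12))
        scores[ch] = fisher.sum()
    return np.argsort(scores)[::-1], scores


# Synthetic example: 20 trials, 4 channels; channel 2 has larger amplitude in class 1
rng = np.random.default_rng(1)
labels = np.array([0] * 10 + [1] * 10)
trials = rng.normal(size=(20, 4, 256))
trials[labels == 1, 2, :] *= 2.0
order, scores = rank_channels(trials, labels)
```

Keeping only the top-ranked channels shrinks the feature set passed to the classifier, which is the efficiency gain the abstract describes.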

HMM-Based Transient Identification in Dynamic Process

  • Kwon, Kee-Choon
    • Transactions on Control, Automation and Systems Engineering / Vol. 2, No. 1 / pp. 40-46 / 2000
  • In this paper, a transient identification method based on a Hidden Markov Model (HMM) is proposed and evaluated experimentally for classifying transients in a dynamic process. Each transient can be identified by its unique time-dependent pattern in the principal variables. The HMM, a doubly stochastic process, can be applied to transient identification, which is a spatial and temporal classification problem, within a statistical pattern recognition framework. An HMM is trained for each transient from a set of training data by maximum-likelihood estimation. A transient is identified by determining which model assigns the highest probability to the given test data. Experimental tests were performed with different normalization methods, clustering algorithms, and numbers of HMM states. Further tests covered superimposed random noise, added systematic error, and untrained transients. Although the proposed real-time transient identification system has many advantages, several problems must still be solved before it can be applied to a real dynamic process. Further work is under way to improve the system's performance and robustness so that its reliability and accuracy reach the required level.
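The identification rule described above, one HMM per transient and a decision for the model with the highest probability, can be sketched with the scaled forward algorithm. The sketch uses discrete observations, for example the cluster indices produced by the clustering step. This is a minimal illustration in Python with NumPy. The model names and parameters are hypothetical placeholders rather than trained values, and Baum-Welch (maximum-likelihood) training is not shown.

```python
import numpy as np


def log_likelihood(obs, start, trans, emit):
    """log P(obs | model) via the scaled forward algorithm.

    obs: sequence of symbol indices; start: (N,) initial probabilities;
    trans: (N, N) transition matrix (rows sum to 1); emit: (N, M) emission
    probabilities for M discrete symbols."""
    alpha = start * emit[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for symbol in obs[1:]:
        # Propagate one step, weight by the emission, then renormalize
        alpha = (alpha @ trans) * emit[:, symbol]
        scale = alpha.sum()
        log_p += np.log(scale)  # P(obs) is the product of the scale factors
        alpha = alpha / scale
    return log_p


def identify_transient(obs, models):
    """Return the name of the model with the highest likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical 2-state models over 3 quantized symbols (0, 1, 2)
models = {
    "transient_A": (np.array([0.6, 0.4]),
                    np.array([[0.9, 0.1], [0.2, 0.8]]),
                    np.array([[0.80, 0.15, 0.05], [0.60, 0.30, 0.10]])),
    "transient_B": (np.array([0.5, 0.5]),
                    np.array([[0.7, 0.3], [0.3, 0.7]]),
                    np.array([[0.05, 0.15, 0.80], [0.10, 0.30, 0.60]])),
}
decision = identify_transient([0, 0, 1, 0, 0], models)
```

Working in log space with per-step scaling avoids numerical underflow on long sequences, which matters for real-time use.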