• Title/Summary/Keyword: feature vector selection


Comparison of Prediction Accuracy Between Classification and Convolution Algorithm in Fault Diagnosis of Rotatory Machines at Varying Speed (회전수가 변하는 기기의 고장진단에 있어서 특성 기반 분류와 합성곱 기반 알고리즘의 예측 정확도 비교)

  • Moon, Ki-Yeong;Kim, Hyung-Jin;Hwang, Se-Yun;Lee, Jang Hyun
    • Journal of Navigation and Port Research
    • /
    • v.46 no.3
    • /
    • pp.280-288
    • /
    • 2022
  • This study examined the diagnosis of abnormalities and faults in equipment whose rotational speed changes even during regular operation. The purpose was to suggest a procedure for properly applying machine learning to time series data that exhibit non-stationary characteristics as the rotational speed changes. Anomaly and fault diagnosis was performed using machine learning: k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Random Forest. To compare diagnostic accuracy, an autoencoder was used for anomaly detection and a convolution-based Conv1D network was additionally used for fault diagnosis. Feature vectors comprising statistical and frequency attributes were extracted, and normalization and dimensionality reduction were applied to them. Changes in the diagnostic accuracy of machine learning according to feature selection, normalization, and dimensionality reduction are explained. The hyperparameter optimization process and the layered structure are also described for each algorithm. Finally, the results show that machine learning can accurately diagnose the failure of a variable-rotation machine under appropriate feature treatment, although convolution algorithms have been widely applied to this problem.
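
The feature pipeline this abstract describes (statistical and frequency attributes, normalization, dimensionality reduction, then classifier comparison) can be sketched roughly as follows. The signals, window length, and feature list here are illustrative assumptions, not the paper's actual dataset or feature set:

```python
# Sketch of the abstract's pipeline on synthetic vibration-like signals:
# extract statistical + frequency features, normalize, reduce dimensions,
# and compare k-NN, SVM, and Random Forest (all details are assumptions).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def extract_features(signal):
    """Statistical and frequency attributes of one signal window."""
    spectrum = np.abs(np.fft.rfft(signal))
    return [signal.mean(), signal.std(), signal.max(), signal.min(),
            spectrum.argmax(), spectrum.mean(), spectrum.std()]

# Two synthetic classes: "normal" and "faulty" signals at varying speed.
signals, labels = [], []
for label, base_freq in [(0, 5.0), (1, 12.0)]:
    for _ in range(60):
        speed = rng.uniform(0.8, 1.2)   # varying rotational speed
        t = np.linspace(0, 1, 256)
        s = np.sin(2 * np.pi * base_freq * speed * t) + 0.3 * rng.normal(size=256)
        signals.append(extract_features(s))
        labels.append(label)

X, y = np.array(signals), np.array(labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

scores = {}
for name, clf in [("k-NN", KNeighborsClassifier(5)),
                  ("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(random_state=0))]:
    # Normalization and dimensionality reduction mirror the described treatment.
    model = make_pipeline(StandardScaler(), PCA(n_components=4), clf)
    scores[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```

The point of the sketch is the ordering: the same scaling and reduction steps are fitted inside each classifier's pipeline, so the feature treatment is compared fairly across models.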

Design of Automatic Document Classifier for IT documents based on SVM (SVM을 이용한 디렉토리 기반 기술정보 문서 자동 분류시스템 설계)

  • Kang, Yun-Hee;Park, Young-B.
    • Journal of IKEEE
    • /
    • v.8 no.2 s.15
    • /
    • pp.186-194
    • /
    • 2004
  • Due to the exponential growth of information on the internet, it is getting difficult to find and organize relevant information. To reduce the heavy overload of information access, automatic text classification that can handle enormous numbers of documents is necessary. In this paper, we describe the structure and implementation of a document classification system for web documents. We utilize SVM for the document classification model, which is constructed from a training set and its representative terms in a directory. In our system, the SVM is trained and used for document classification with a word set extracted from information- and communication-related web documents. In addition, we use the vector-space model to represent document characteristics based on TF-IDF, and the training data consist of positive and negative classes represented by a characteristic set with weights. Experiments show the categorization results and the correlation with vector length.
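
The TF-IDF-plus-SVM setup described above is a standard combination; a minimal sketch, assuming a toy corpus in place of the paper's directory of IT documents:

```python
# Minimal sketch of TF-IDF vector-space features fed to a linear SVM.
# The corpus and category names are invented stand-ins, not the paper's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["network router switch protocol", "router firewall packet network",
        "database query index table", "sql table join index query"]
labels = ["networking", "networking", "database", "database"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)   # TF-IDF weighted vector-space model
clf = LinearSVC().fit(X, labels)

print(clf.predict(vectorizer.transform(["packet switch protocol"])))
```

The vectorizer plays the role of the "characteristic set with weight": each document becomes a sparse TF-IDF vector, and the SVM separates the positive and negative classes in that space.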


Optimized Bankruptcy Prediction through Combining SVM with Fuzzy Theory (퍼지이론과 SVM 결합을 통한 기업부도예측 최적화)

  • Choi, So-Yun;Ahn, Hyun-Chul
    • Journal of Digital Convergence
    • /
    • v.13 no.3
    • /
    • pp.155-165
    • /
    • 2015
  • Bankruptcy prediction has been one of the important research topics in finance since the 1960s. In Korea, it has received attention from researchers since the IMF crisis in 1998. This study proposes a novel model for better bankruptcy prediction by converging three techniques: support vector machine (SVM), fuzzy theory, and genetic algorithm (GA). The convergence model is based on SVM, a classification algorithm that predicts accurately and avoids overfitting. It also incorporates fuzzy theory to extend the dimensions of the input variables, and GA to optimize the controlling parameters and feature subset selection. To validate the usefulness of the proposed model, we applied it to H Bank's data on non-externally-audited companies. We also tested six comparative models to validate the superiority of the proposed model. As a result, our model showed the best prediction accuracy among all the models. Our study is expected to contribute to the relevant literature and to practitioners working on bankruptcy prediction.
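
The GA's job in this kind of model is to jointly search the SVM cost parameter and the feature subset. As a rough sketch only, the search loop below stands in for the GA with plain random search (synthetic data, and random search rather than the paper's genetic algorithm):

```python
# Joint search over feature subsets and the SVM cost parameter C.
# Random search is used here purely as a stand-in for the paper's GA.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

best = (0.0, None, None)
for _ in range(20):                 # GA generations -> random trials here
    mask = rng.random(10) < 0.5     # candidate feature subset (GA chromosome)
    if not mask.any():
        continue
    C = 10 ** rng.uniform(-2, 2)    # candidate cost parameter
    score = cross_val_score(SVC(C=C), X[:, mask], y, cv=3).mean()
    if score > best[0]:
        best = (score, mask, C)

print(f"best CV accuracy {best[0]:.3f} with {best[1].sum()} features")
```

A real GA would add crossover and mutation over the (mask, C) chromosomes, but the fitness function, cross-validated SVM accuracy on the selected subset, is the same.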

A Study on the Efficient Feature Vector Extraction for Music Information Retrieval System (음악 정보검색 시스템을 위한 효율적인 특징 벡터 추출에 관한 연구)

  • 윤원중;이강규;박규식
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.7
    • /
    • pp.532-539
    • /
    • 2004
  • In this paper, we propose a content-based music information retrieval (MIR) system based on the query-by-example (QBE) method. The proposed system retrieves queried music from a dataset in which 60 music samples were collected for each of four genres (Classical, Hiphop, Jazz, and Rock), resulting in 240 music files in the database. From each query music signal, the system extracts 60-dimensional feature vectors including the spectral centroid, rolloff, and flux based on the STFT, as well as LPC, MFCC, and beat information, and retrieves queried music from a trained database using the Euclidean distance measure. To choose optimum features from the 60-dimensional feature vectors, the SFS method is applied to select 10-dimensional optimum features, which are used in the proposed system. The experimental results verify the superior performance of the proposed system, which provides a success rate of 84% in Hit Rate and 0.63 in MRR, nearly a 10% improvement over previous methods. Additional experiments on system performance with random query patterns (or portions) and query lengths were conducted, and a serious instability problem in system performance is pointed out.
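
The 60-to-10 dimension reduction step uses sequential forward selection (SFS); a sketch with scikit-learn's `SequentialFeatureSelector`, using synthetic features as a stand-in for the paper's audio descriptors:

```python
# Sequential forward selection from a 60-dimensional feature vector down
# to 10 dimensions, as the abstract describes. The data is synthetic; the
# real features would be spectral centroid, rolloff, flux, LPC, MFCC, beat.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# 240 samples, 4 genres, 60 features, mirroring the abstract's setup.
X, y = make_classification(n_samples=240, n_features=60, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

sfs = SequentialFeatureSelector(KNeighborsClassifier(3),
                                n_features_to_select=10,  # 60 -> 10 dims
                                direction="forward")
sfs.fit(X, y)
print(sfs.get_support().sum())  # number of selected features
```

SFS greedily adds the single feature that most improves cross-validated accuracy at each step, which matches the "choose optimum features" procedure the abstract names.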

A Study on Selecting Principle Component Variables Using Adaptive Correlation (적응적 상관도를 이용한 주성분 변수 선정에 관한 연구)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.3
    • /
    • pp.79-84
    • /
    • 2021
  • A feature extraction method capable of reflecting features well while maintaining the properties of the data is required in order to process high-dimensional data. Principal component analysis, which converts high-dimensional data into low-dimensional data and expresses it with fewer variables than the original, is a representative method for feature extraction. In this study, we propose a principal component analysis method based on adaptive correlation for selecting principal component variables when the data is high-dimensional. The proposed method analyzes the principal components by adaptively reflecting the correlation between the input variables, and excludes highly correlated variables from the candidate list. It is intended to analyze the principal component hierarchy through the eigenvector coefficient values, to prevent the selection of principal components with a low hierarchy, and to minimize the data duplication that induces data bias, through correlation analysis. In this way, we propose a method of selecting principal component variables that represent the characteristics of the actual data well, by reducing the influence of data bias during variable selection.
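
The core idea, dropping near-duplicate variables via correlation before PCA so duplicated information does not bias the components, can be sketched like this (the 0.95 threshold and the synthetic data are assumptions, not the paper's adaptive rule):

```python
# Sketch: exclude one of each highly correlated variable pair from the
# candidate list before PCA, so duplicated columns cannot bias the
# principal components. Threshold and data are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 4))
# Append a near-duplicate of column 0 to induce correlation-driven bias.
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=100)])

corr = np.corrcoef(X, rowvar=False)
keep = []
for j in range(X.shape[1]):
    # Keep a variable only if it is not too correlated with one already kept.
    if all(abs(corr[j, k]) < 0.95 for k in keep):
        keep.append(j)

pca = PCA(n_components=2).fit(X[:, keep])
print(keep, pca.explained_variance_ratio_.round(3))
```

Here the duplicated fifth column is removed from the candidate list, so the leading components are estimated from four genuinely distinct variables.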

Gaussian Density Selection Method of CDHMM in Speaker Recognition (화자인식에서 연속밀도 은닉마코프모델의 혼합밀도 결정방법)

  • 서창우;이주헌;임재열;이기용
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.8
    • /
    • pp.711-716
    • /
    • 2003
  • This paper proposes a method to select the optimal number of mixtures in each state of a Continuous Density HMM (Hidden Markov Model). Previously, researchers used the same number of mixture components in each state of the HMM regardless of the spectral characteristics of the speaker. To model each speaker as accurately as possible, we propose using a different number of mixture components for each state. The selection of mixture components considers the probability value of each state's mixtures, which strongly affects parameter estimation of the continuous density HMM. We also use PCA (principal component analysis) to reduce correlation and maintain system stability when the number of mixture components is reduced. In experiments, the proposed method used on average 10% fewer mixture components than the conventional HMM. When only the selection of mixture components was applied, the proposed method achieved similar performance. When principal component analysis was also used, 16th-order feature vectors showed an average performance decrease of 0.35%, while 25th-order vectors showed an average performance improvement of 0.65%.
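
Choosing how many Gaussian components one state needs can be illustrated with a standalone mixture model and an information criterion. This is an assumption-laden simplification: the paper selects components inside a CDHMM using mixture probabilities, whereas the sketch below uses BIC on a plain Gaussian mixture:

```python
# Illustrative only: pick the number of Gaussian components for one
# "state" of data by BIC, a stand-in for the paper's per-state selection.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated clusters standing in for one HMM state's frames.
X = np.vstack([rng.normal(-3, 1, size=(200, 2)),
               rng.normal(+3, 1, size=(200, 2))])

bics = {k: GaussianMixture(k, random_state=0).fit(X).bic(X) for k in range(1, 5)}
best_k = min(bics, key=bics.get)
print(best_k)
```

States whose frames form few spectral clusters get few components, which is the motivation the abstract gives for varying the mixture count per state.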

Performance Improvement of Fast Speaker Adaptation Based on Dimensional Eigenvoice and Adaptation Mode Selection (차원별 Eigenvoice와 화자적응 모드 선택에 기반한 고속화자적응 성능 향상)

  • 송화전;이윤근;김형순
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.1
    • /
    • pp.48-53
    • /
    • 2003
  • The eigenvoice method is known to be adequate for fast speaker adaptation, but it hardly shows additional improvement as the amount of adaptation data increases. In this paper, to deal with this problem, we propose a modified method that estimates the weights of the eigenvoices in each feature vector dimension. We also propose an adaptation mode selection scheme in which the method with the highest performance among several adaptation methods is selected according to the amount of adaptation data. We used the POW DB to construct the speaker-independent model and eigenvoices; utterances (ranging from 1 to 50) from the PBW 452 DB and the remaining 400 utterances were used for adaptation and evaluation, respectively. As the amount of adaptation data increased, the proposed dimensional eigenvoice method showed higher performance than both the conventional eigenvoice method and MLLR. The word error rate was reduced by up to 26% through adaptation mode selection between the eigenvoice and dimensional eigenvoice methods, in comparison with the conventional eigenvoice method.

Resume Classification System using Natural Language Processing & Machine Learning Techniques

  • Irfan Ali;Nimra;Ghulam Mujtaba;Zahid Hussain Khand;Zafar Ali;Sajid Khan
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.7
    • /
    • pp.108-117
    • /
    • 2024
  • The selection and recommendation of a suitable job applicant from a pool of thousands of applications is often a daunting job for an employer, and the process significantly increases the workload of the concerned department. A Resume Classification System using Natural Language Processing (NLP) and Machine Learning (ML) techniques can automate this tedious process and ease the employer's job. Moreover, automation can significantly expedite the applicant selection process and make it more transparent, with minimal human involvement. Various Machine Learning approaches have been proposed to develop Resume Classification Systems. This study presents an automated NLP- and ML-based system that classifies resumes according to job categories with performance guarantees. It employs various ML algorithms and NLP techniques to measure the accuracy of Resume Classification Systems and proposes a solution with better accuracy and reliability in different settings. To demonstrate the significance of NLP and ML techniques for processing and classifying resumes, the extracted features were tested on nine machine learning models: Support Vector Machines (Linear, SGD, SVC, and NuSVC), Naïve Bayes (Bernoulli, Multinomial, and Gaussian), K-Nearest Neighbor (KNN), and Logistic Regression (LR). The Term Frequency-Inverse Document Frequency (TF-IDF) feature representation scheme proved suitable for the resume classification task. The developed models were evaluated using macro-averaged F-score, Recall, Precision, and overall Accuracy. The experimental results indicate that, using a One-vs-Rest classification strategy for this multi-class task, the SVM class of algorithms performed best on the study dataset, with over 96% overall accuracy. These promising results suggest that the NLP and ML techniques employed in this study could be used for the resume classification task.
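
The winning configuration the abstract reports, TF-IDF features with a One-vs-Rest linear SVM, can be sketched on a toy stand-in for the resume corpus (the resumes and job categories below are invented):

```python
# One-vs-rest linear SVM over TF-IDF features, mirroring the reported
# best setup. Toy resumes and categories, not the study's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

resumes = ["python pandas machine learning models",
           "java spring microservices backend",
           "sql warehouse etl pipelines data",
           "tensorflow deep learning python models",
           "kotlin android java mobile backend",
           "etl sql data pipelines warehouse"]
jobs = ["data-science", "backend", "data-engineering",
        "data-science", "backend", "data-engineering"]

clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LinearSVC()))
clf.fit(resumes, jobs)
print(clf.predict(["spring java backend services"]))
```

One-vs-Rest trains one binary SVM per job category and assigns each resume to the category whose classifier scores highest, which is the multi-class strategy the study reports using.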

Optimizing Clustering and Predictive Modelling for 3-D Road Network Analysis Using Explainable AI

  • Rotsnarani Sethy;Soumya Ranjan Mahanta;Mrutyunjaya Panda
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.9
    • /
    • pp.30-40
    • /
    • 2024
  • Building an accurate 3-D spatial road network model has become an active area of research, promising a new paradigm for developing smart roads and intelligent transportation systems (ITS) that will help public and private road operators achieve better road mobility and eco-routing, so that better road traffic, lower carbon emissions, and road safety may be ensured. Dealing with such large-scale 3-D road network data poses challenges in obtaining accurate elevation information for a road network, which is needed to better estimate CO2 emissions and provide accurate routing for vehicles in an Internet of Vehicles (IoV) scenario. Clustering and regression techniques are found suitable for discovering the missing elevation information at some points of a 3-D spatial road network dataset, which is envisaged to give the public a better eco-routing experience. Further, Explainable Artificial Intelligence (xAI) has recently drawn the attention of researchers for making models more interpretable, transparent, and comprehensible, enabling the design of efficient models according to user requirements. The 3-D road network dataset, comprising spatial attributes (longitude, latitude, altitude) of North Jutland, Denmark, collected from the publicly available UCI repository, is preprocessed through feature engineering and scaling to ensure optimal accuracy for clustering and regression tasks. K-Means clustering and regression using a Support Vector Machine (SVM) with a radial basis function (RBF) kernel are employed for the 3-D road network analysis. Silhouette scores and the number of clusters are used to measure cluster quality, whereas error metrics such as MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) are used to evaluate the regression method. For better interpretability of the clustering and regression models, SHAP (Shapley Additive Explanations), a powerful xAI technique, is employed in this research.
From extensive experiments, it is observed that SHAP analysis validated the importance of latitude and altitude in predicting longitude, particularly in the four-cluster setup, providing critical insights into model behavior and feature contributions, with an accuracy of 97.22% and strong performance metrics across all classes, including an MAE of 0.0346 and an MSE of 0.0018. On the other hand, the ten-cluster setup, while faster in SHAP analysis, presented challenges in interpretability due to increased clustering complexity. Hence, the K-Means (K=4) and SVM hybrid model demonstrated superior performance and interpretability, highlighting the importance of careful cluster selection to balance model complexity and predictive accuracy.
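
The cluster-then-regress hybrid, K-Means with K=4 followed by an RBF-kernel SVM regressor per cluster, evaluated with silhouette score and MAE, can be sketched on synthetic coordinates (the data below is invented, not the North Jutland dataset, and the per-cluster regressor layout is an assumption about the hybrid's structure):

```python
# Sketch of the K-Means (K=4) + RBF-SVM hybrid on synthetic coordinates:
# cluster on (latitude, altitude), fit one SVR per cluster to predict
# longitude, and report silhouette score and MAE.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mean_absolute_error, silhouette_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
lat = rng.uniform(56.5, 57.5, size=300)
alt = rng.uniform(0.0, 100.0, size=300)
lon = 9.0 + 0.5 * (lat - 57.0) + 0.001 * alt + 0.01 * rng.normal(size=300)

X = np.column_stack([lat, alt])
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
quality = silhouette_score(X, km.labels_)   # cluster-quality metric

mae_per_cluster = []
for c in range(4):
    m = km.labels_ == c
    svr = SVR(kernel="rbf").fit(X[m], lon[m])
    mae_per_cluster.append(mean_absolute_error(lon[m], svr.predict(X[m])))
print(f"silhouette {quality:.2f}, mean MAE {np.mean(mae_per_cluster):.4f}")
```

Splitting the regression per cluster lets each local SVR model a simpler piece of the road surface, which is the rationale for the hybrid over a single global regressor.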

A Study on the Real-time Recommendation Box Recommendation of Fulfillment Center Using Machine Learning (기계학습을 이용한 풀필먼트센터의 실시간 박스 추천에 관한 연구)

  • Dae-Wook Cha;Hui-Yeon Jo;Ji-Soo Han;Kwang-Sup Shin;Yun-Hong Min
    • The Journal of Bigdata
    • /
    • v.8 no.2
    • /
    • pp.149-163
    • /
    • 2023
  • Due to the continuous growth of the E-commerce market, the volume of orders that fulfillment centers must process has increased, and various customer requirements have increased the complexity of order processing. Along with this trend, the operational efficiency of fulfillment centers is becoming more important from a corporate management perspective because of rising labor costs. Using historical performance data as training data, this study focused on real-time box recommendations applicable to the packaging area during fulfillment center shipping. Four types of data (product information, order information, packaging information, and delivery information) were applied to the machine learning model through pre-processing and feature-engineering processes. As the input vector, three product specification characteristics were used: width, length, and height. These characteristics were extracted through a feature engineering process that converts the real-valued product dimensions into integer section indices. Comparing the performance of each model confirmed that the Gradient Boosting model achieved the highest prediction accuracy, 95.2%, when the product specification information was converted into integers over 21 sections. This study proposes a machine learning model as a way to reduce the cost increases and box-packaging inefficiency caused by incorrect box selection in the fulfillment center, and also proposes a feature engineering method to effectively extract the characteristics of product specification information.
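
The feature engineering step, binning real-valued width/length/height into 21 integer sections before fitting a Gradient Boosting box recommender, can be sketched as follows. The products, box-size rule, and dimension ranges are synthetic assumptions, not the fulfillment center's data:

```python
# Sketch of the abstract's feature engineering: bin real-valued product
# dimensions into 21 integer sections, then fit a Gradient Boosting box
# recommender. Data and box rule are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
dims = rng.uniform(1.0, 100.0, size=(500, 3))   # width, length, height
box = np.digitize(dims.max(axis=1), [30, 60])   # 3 box sizes by longest side

# Convert real-valued specs to integer section indices (21 sections).
sections = np.linspace(1.0, 100.0, 22)
X = np.digitize(dims, sections[1:-1])

X_tr, X_te, y_tr, y_te = train_test_split(X, box, random_state=0, stratify=box)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(f"accuracy {clf.score(X_te, y_te):.3f}")
```

Binning trades a little resolution near section boundaries for integer features that tree ensembles split on cleanly, which matches the study's finding that a 21-section integer conversion worked best.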