• Title/Summary/Keyword: K-NN Classification Model

Search Result 56, Processing Time 0.025 seconds

Multiple Discriminative DNNs for I-Vector Based Open-Set Language Recognition (I-벡터 기반 오픈세트 언어 인식을 위한 다중 판별 DNN)

  • Kang, Woo Hyun;Cho, Won Ik;Kang, Tae Gyoon;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.41 no.8
    • /
    • pp.958-964
    • /
    • 2016
  • In this paper, we propose an i-vector based language recognition system to identify the spoken language of the speaker, which uses multiple discriminative deep neural network (DNN) models analogous to the multi-class support vector machine (SVM) classification system. The proposed model was trained and tested using the i-vectors included in the NIST 2015 i-vector Machine Learning Challenge database, and shown to outperform the conventional language recognition methods such as cosine distance, SVM and softmax NN classifier in open-set experiments.

Default Prediction for Real Estate Companies with Imbalanced Dataset

  • Dong, Yuan-Xiang;Xiao, Zhi;Xiao, Xue
    • Journal of Information Processing Systems
    • /
    • v.10 no.2
    • /
    • pp.314-333
    • /
    • 2014
  • When analyzing default predictions in real estate companies, the number of non-defaulted cases always greatly exceeds the defaulted ones, which creates the two-class imbalance problem. This lowers the ability of prediction models to distinguish the default sample. In order to avoid this sample selection bias and to improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. The logistic regression, support vector machine (SVM) classification, and neural network (NN) classification use an imbalanced dataset. They were used as benchmarks with a single prediction model that used a balanced dataset corrected by the minority samples generation approach. Instead of using prediction-oriented tests and the overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models for imbalanced dataset. In this paper, we describe an empirical experiment that used a sampling of 14 default and 315 non-default listed real estate companies in China and report that most results using single prediction models with a balanced dataset generated better results than an imbalanced dataset.

Customer Relationship Management in Telecom Market using an Optimized Case-based Reasoning (최적화 사례기반추론을 이용한 통신시장 고객관계관리)

  • An, Hyeon-Cheol;Kim, Gyeong-Jae
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2006.11a
    • /
    • pp.285-288
    • /
    • 2006
  • Most previous studies on improving the effectiveness of CBR have focused on the similarity function aspect or optimization of case features and their weights. However, according to some of the prior research, finding the optimal k parameter for the k-nearest neighbor (k-NN) is also crucial for improving the performance of the CBR system. Nonetheless, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. In this study, we introduce a genetic algorithm (GA) to optimize the number of neighbors that combine, as well as the weight of each feature. The new model is applied to the real-world case of a major telecommunication company in Korea in order to build the prediction model for the customer profitability level. Experimental results show that our GA-optimized CBR approach outperforms other AI techniques for this mulriclass classification problem.

  • PDF

Feature Analysis of Multi-Channel Time Series EEG Based on Incremental Model (점진적 모델에 기반한 다채널 시계열 데이터 EEG의 특징 분석)

  • Kim, Sun-Hee;Yang, Hyung-Jeong;Ng, Kam Swee;Jeong, Jong-Mun
    • The KIPS Transactions:PartB
    • /
    • v.16B no.1
    • /
    • pp.63-70
    • /
    • 2009
  • BCI technology is to control communication systems or machines by brain signal among biological signals followed by signal processing. For the implementation of BCI systems, it is required that the characteristics of brain signal are learned and analyzed in real-time and the learned characteristics are applied. In this paper, we detect feature vector of EEG signal on left and right hand movements based on incremental approach and dimension reduction using the detected feature vector. In addition, we show that the reduced dimension can improve the classification performance by removing unnecessary features. The processed data including sufficient features of input data can reduce the time of processing and boost performance of classification by removing unwanted features. Our experiments using K-NN classifier show the proposed approach 5% outperforms the PCA based dimension reduction.

Machine Learning Using Template-Based-Predicted Structure of Haemagglutinin Predicts Pathogenicity of Avian Influenza

  • Jong Hyun Shin;Sun Ju Kim;Gwanghun Kim;Hang-Rae Kim;Kwan Soo Ko
    • Journal of Microbiology and Biotechnology
    • /
    • v.34 no.10
    • /
    • pp.2033-2040
    • /
    • 2024
  • Deep learning presents a promising approach to complex biological classifications, contingent upon the availability of well-curated datasets. This study addresses the challenge of analyzing three-dimensional protein structures by introducing a novel pipeline that utilizes open-source tools to convert protein structures into a format amenable to computational analysis. Applying a two-dimensional convolutional neural network (CNN) to a dataset of 12,143 avian influenza virus genomes from 64 countries, encompassing 119 hemagglutinin (HA) and neuraminidase (NA) types, we achieved significant classification accuracy. The pathogenicity was determined based on the presence of H5 or H7 subtypes, and our models, ranging from zero to six mid-layers, indicated that a four-layer model most effectively identified highly pathogenic strains, with accuracies over 0.9. To enhance our approach, we incorporated Principal Component Analysis (PCA) for dimensionality reduction and one-class SVM for abnormality detection, improving model robustness through bootstrapping. Furthermore, the K-nearest neighbor (K-NN) algorithm was fine-tuned via hyperparameter optimization to corroborate the findings. The PCA identified distinct clustering for pathogenic HA, yielding an AUC of up to 0.85. The optimized K-NN model demonstrated an impressive accuracy between 0.96 and 0.97. These combined methodologies underscore our deep learning framework's capacity for rapid and precise identification of pathogenic avian influenza strains, thus providing a critical tool for managing global avian influenza threats.

Machine Learning Based Structural Health Monitoring System using Classification and NCA (분류 알고리즘과 NCA를 활용한 기계학습 기반 구조건전성 모니터링 시스템)

  • Shin, Changkyo;Kwon, Hyunseok;Park, Yurim;Kim, Chun-Gon
    • Journal of Advanced Navigation Technology
    • /
    • v.23 no.1
    • /
    • pp.84-89
    • /
    • 2019
  • This is a pilot study of machine learning based structural health monitoring system using flight data of composite aircraft. In this study, the most suitable machine learning algorithm for structural health monitoring was selected and dimensionality reduction method for application on the actual flight data was conducted. For these tasks, impact test on the cantilever beam with added mass, which is the simulation of damage in the aircraft wing structure was conducted and classification model for damage states (damage location and level) was trained. Through vibration test of cantilever beam with fiber bragg grating (FBG) sensor, data of normal and 12 damaged states were acquired, and the most suitable algorithm was selected through comparison between algorithms like tree, discriminant, support vector machine (SVM), kNN, ensemble. Besides, through neighborhood component analysis (NCA) feature selection, dimensionality reduction which is necessary to deal with high dimensional flight data was conducted. As a result, quadratic SVMs performed best with 98.7% for without NCA and 95.9% for with NCA. It is also shown that the application of NCA improved prediction speed, training time, and model memory.

Morphological Variation Classification of Red Blood Cells using Neural Network Model in the Peripheral Blood Images (말초혈액영상에서 신경망 모델을 이용한 적혈구의 형태학적 변이 분류)

  • Kim, Gyeong-Su;Kim, Pan-Gu
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.10
    • /
    • pp.2707-2715
    • /
    • 1999
  • Recently, there have been researches to automate processing and analysing images in the medical field using image processing technique, a fast communication network, and high performance hardware. In this paper, we propose a system to be able to analyze morphological abnormality of red-blood cells for peripheral blood image using image processing techniques. To do this, we segment red-blood cells in the blood image acquired from microscope with CCD camera and then extract UNL fourier features to classify them into 15 classes. We reduce the number of multi-variate features using PCA to construct a more efficient classifier. Our system has the best performance in recognition rate, compared with two other algorithms, LVQ3 and k-NN. So, we show that it can be applied to a pathological guided system.

  • PDF

Robust Real-time Tracking of Facial Features with Application to Emotion Recognition (안정적인 실시간 얼굴 특징점 추적과 감정인식 응용)

  • Ahn, Byungtae;Kim, Eung-Hee;Sohn, Jin-Hun;Kweon, In So
    • The Journal of Korea Robotics Society
    • /
    • v.8 no.4
    • /
    • pp.266-272
    • /
    • 2013
  • Facial feature extraction and tracking are essential steps in human-robot-interaction (HRI) field such as face recognition, gaze estimation, and emotion recognition. Active shape model (ASM) is one of the successful generative models that extract the facial features. However, applying only ASM is not adequate for modeling a face in actual applications, because positions of facial features are unstably extracted due to limitation of the number of iterations in the ASM fitting algorithm. The unaccurate positions of facial features decrease the performance of the emotion recognition. In this paper, we propose real-time facial feature extraction and tracking framework using ASM and LK optical flow for emotion recognition. LK optical flow is desirable to estimate time-varying geometric parameters in sequential face images. In addition, we introduce a straightforward method to avoid tracking failure caused by partial occlusions that can be a serious problem for tracking based algorithm. Emotion recognition experiments with k-NN and SVM classifier shows over 95% classification accuracy for three emotions: "joy", "anger", and "disgust".

Building Domain Ontology through Concept and Relation Classification (개념 및 관계 분류를 통한 분야 온톨로지 구축)

  • Huang, Jin-Xia;Shin, Ji-Ae;Choi, Key-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.9
    • /
    • pp.562-571
    • /
    • 2008
  • For the purpose of building domain ontology, this paper proposes a methodology for building core ontology first, and then enriching the core ontology with the concepts and relations in the domain thesaurus. First, the top-level concept taxonomy of the core ontology is built using domain dictionary and general domain thesaurus. Then, the concepts of the domain thesaurus are classified into top-level concepts in the core ontology, and relations between broader terms (BT) - narrower terms (NT) and related terms (RT) are classified into semantic relations defined for the core ontology. To classify concepts, a two-step approach is adopted, in which a frequency-based approach is complemented with a similarity-based approach. To classify relations, two techniques are applied: (i) for the case of insufficient training data, a rule-based module is for identifying isa relation out of non-isa ones; a pattern-based approach is for classifying non-taxonomic semantic relations from non-isa. (ii) For the case of sufficient training data, a maximum-entropy model is adopted in the feature-based classification, where k-NN approach is for noisy filtering of training data. A series of experiments show that performances of the proposed systems are quite promising and comparable to judgments by human experts.

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility
    • /
    • v.13 no.1
    • /
    • pp.47-60
    • /
    • 2010
  • Most of the researches about classification usually have used kNN(k-Nearest Neighbor), SVM(Support Vector Machine), which are known as learn-based model, and Bayesian classifier, NNA(Neural Network Algorithm), which are known as statistics-based methods. However, there are some limitations of space and time when classifying so many web pages in recent internet. Moreover, most studies of classification are using uni-gram feature representation which is not good to represent real meaning of words. In case of Korean web page classification, there are some problems because of korean words property that the words have multiple meanings(polysemy). For these reasons, LSA(Latent Semantic Analysis) is proposed to classify well in these environment(large data set and words' polysemy). LSA uses SVD(Singular Value Decomposition) which decomposes the original term-document matrix to three different matrices and reduces their dimension. From this SVD's work, it is possible to create new low-level semantic space for representing vectors, which can make classification efficient and analyze latent meaning of words or document(or web pages). Although LSA is good at classification, it has some drawbacks in classification. As SVD reduces dimensions of matrix and creates new semantic space, it doesn't consider which dimensions discriminate vectors well but it does consider which dimensions represent vectors well. It is a reason why LSA doesn't improve performance of classification as expectation. In this paper, we propose new LSA which selects optimal dimensions to discriminate and represent vectors well as minimizing drawbacks and improving performance. This method that we propose shows better and more stable performance than other LSAs' in low-dimension space. In addition, we derive more improvement in classification as creating and selecting features by reducing stopwords and weighting specific values to them statistically.

  • PDF