• Title/Summary/Keyword: data-based model


Comparison of clustering methods of microarray gene expression data (마이크로어레이 유전자 발현 자료에 대한 군집 방법 비교)

  • Lim, Jin-Soo;Lim, Dong-Hoon
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.39-51 / 2012
  • Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM, and model-based clustering. The available validation measures fall into three general categories: internal, stability, and biological. The performance of the clustering algorithms is evaluated using simulated data and the SRBCT microarray data. On the simulated data, nearly every method selected the same number of clusters as the number of classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. PAM, SOM, and the model-based method produced results similar to those for the simulated data under the Silhouette width (an internal measure), as did PAM and the model-based method under the biological measures, while model-based clustering achieved the best value of the stability measures.
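The Silhouette width named above as an internal measure can be computed directly from a clustering. The sketch below uses the standard definition (a(i) is the mean distance to the point's own cluster, b(i) is the smallest mean distance to another cluster) with Euclidean distance, which is an assumption; it is not the authors' code, and the toy points are hypothetical.

```python
import math

def silhouette_width(points, labels):
    """Average silhouette width over all points.

    For point i: a(i) is the mean distance to the other members of its own
    cluster, b(i) is the smallest mean distance to the members of any other
    cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)).
    Points in singleton clusters get s(i) = 0 by convention.
    Euclidean distance is assumed here.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    n = len(points)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: contributes s(i) = 0
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for members in (
                [j for j in range(n) if labels[j] == c]
                for c in clusters if c != labels[i]
            )
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


# Hypothetical toy data: two well-separated groups of points.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_width(pts, [0, 0, 1, 1])  # roughly 0.90: labels match the groups
bad = silhouette_width(pts, [0, 1, 0, 1])   # negative: labels cut across the groups
```

Values near 1 indicate compact, well-separated clusters; comparing this score across candidate numbers of clusters is how such an internal measure would guide the choice of cluster count.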

CutPaste-Based Anomaly Detection Model using Multi Scale Feature Extraction in Time Series Streaming Data

  • Jeon, Byeong-Uk;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.8 / pp.2787-2800 / 2022
  • An aging society brings more emergencies involving elderly people living alone, as well as a variety of social crimes. To prevent them, techniques for detecting emergency situations from voice are being actively researched. This study proposes a CutPaste-based anomaly detection model using multi-scale feature extraction in time series streaming data. In the proposed method, an audio file is converted into a spectrogram, which makes it possible to apply algorithms designed for image data, such as CNNs. Multi-scale feature extraction is then applied: three images produced by Adaptive Pooling with different kernel sizes are merged. By considering various types of anomaly, including point, contextual, and collective anomalies, the method addresses the limitations of conventional anomaly models. Finally, CutPaste-based anomaly detection is conducted. Because the model is trained through self-supervised learning, it can detect diverse emergency situations as anomalies without labeling. The proposed model therefore overcomes the limitation of conventional models that classify only labeled emergency situations. In evaluation, the proposed model also outperformed a conventional anomaly detection model.
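The self-supervised step mentioned above relies on a CutPaste-style augmentation: a patch is cut from a normal sample and pasted elsewhere, and a classifier learns to tell altered from unaltered inputs, so no emergency labels are needed. The sketch below applies that idea to a spectrogram stored as a list of rows. It is a minimal illustration under stated assumptions: the patch size, uniform random placement, and the helper names `cutpaste` and `pretext_dataset` are hypothetical, the jitter some CutPaste variants apply to the patch is omitted, and the CNN with multi-scale Adaptive Pooling is not shown.

```python
import random

def cutpaste(spec, patch_h, patch_w, rng):
    """Cut a random patch_h x patch_w patch from a 2-D spectrogram (list of rows)
    and paste it at another random location. Returns (augmented copy, src, dst).
    Patch size and uniform placement are illustrative assumptions."""
    h, w = len(spec), len(spec[0])
    if patch_h > h or patch_w > w:
        raise ValueError("patch is larger than the spectrogram")
    sy, sx = rng.randrange(h - patch_h + 1), rng.randrange(w - patch_w + 1)
    dy, dx = rng.randrange(h - patch_h + 1), rng.randrange(w - patch_w + 1)
    # Copy the patch first so overlapping source/destination regions stay correct
    patch = [row[sx:sx + patch_w] for row in spec[sy:sy + patch_h]]
    out = [row[:] for row in spec]  # never modify the input spectrogram
    for r in range(patch_h):
        out[dy + r][dx:dx + patch_w] = patch[r]
    return out, (sy, sx), (dy, dx)


def pretext_dataset(specs, patch_h, patch_w, seed=0):
    """Self-supervised pretext data: label 0 = unaltered, label 1 = CutPaste.
    No emergency labels are required to build this training set."""
    rng = random.Random(seed)
    samples, labels = [], []
    for spec in specs:
        samples.append(spec)
        labels.append(0)
        augmented, _, _ = cutpaste(spec, patch_h, patch_w, rng)
        samples.append(augmented)
        labels.append(1)
    return samples, labels
```

A classifier trained on `pretext_dataset` output learns what "normal" audio looks like; at inference time, inputs it scores as unlike the normal class would be flagged as anomalies.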

Experimental Study on a Monte Carlo-based Recursive Least Square Method for System Identification (몬테카를로 기반 재귀최소자승법에 의한 시스템 인식 실험 연구)

  • Lee, Sang-Deok;Jung, Seul
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.2 / pp.248-254 / 2018
  • In this paper, a Monte Carlo-based Recursive Least Square (MC-RLS) method is presented to directly identify the inverse model of a dynamical system. Although RLS has been used for identification based on deterministic data in closed-loop controlled form, it is preferable for RLS to identify the model from random data. In addition, an inverse model obtained by inverting an identified forward model may not work properly. MC-RLS can therefore be used to identify the inverse model without numerically inverting an identified forward model. The performance of the proposed method is verified through experimental studies on a control moment gyroscope.
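The idea of identifying the inverse model directly can be illustrated with the standard RLS update: under random (Monte Carlo) excitation, the applied input is regressed on the measured outputs, so the estimated parameters are the inverse model itself. This is a minimal sketch, not the paper's implementation: the first-order plant `y[k+1] = a*y[k] + b*u[k]`, its coefficients, the uniform excitation, the initial covariance, and the forgetting factor are all hypothetical choices made for illustration.

```python
import random

def rls_update(theta, P, phi, target, lam=1.0):
    """One recursive least-squares step with forgetting factor lam."""
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
    k = [v / denom for v in Pphi]  # gain vector
    err = target - sum(theta[i] * phi[i] for i in range(n))  # prediction error
    theta = [theta[i] + k[i] * err for i in range(n)]
    # P <- (P - k phi^T P) / lam, using phi^T P = (P phi)^T for symmetric P
    P = [[(P[i][j] - k[i] * Pphi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P


def identify_inverse(a=0.5, b=2.0, steps=200, seed=0):
    """Estimate the inverse model u[k] = (1/b)*y[k+1] - (a/b)*y[k] directly.
    The plant and its coefficients are hypothetical stand-ins."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    P = [[1e6, 0.0], [0.0, 1e6]]  # large initial covariance: weak prior
    y = 0.0
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)  # Monte Carlo (random) excitation
        y_next = a * y + b * u      # hypothetical first-order plant
        # Regressor = measured outputs, target = applied input (inverse model)
        theta, P = rls_update(theta, P, [y_next, y], u)
        y = y_next
    return theta


theta = identify_inverse()  # close to [1/b, -a/b] = [0.5, -0.25]
```

Because the target is the input rather than the output, no numerical inversion of a forward model is needed; the estimate converges toward the inverse coefficients as long as the random excitation keeps the regressors informative.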

Style-Specific Language Model Adaptation using TF*IDF Similarity for Korean Conversational Speech Recognition

  • Park, Young-Hee;Chung, Min-Hwa
    • The Journal of the Acoustical Society of Korea / v.23 no.2E / pp.51-55 / 2004
  • In this paper, we propose a style-specific language model adaptation scheme using n-gram-based tf*idf similarity for Korean spontaneous speech recognition. Korean spontaneous speech shows distinctive style-specific characteristics such as filled pauses, word omission, and contraction, which are related to function words and depend on the preceding or following words. To reflect these characteristics and overcome the shortage of language model training data, we estimate an in-domain n-gram model by relevance-weighting out-of-domain text data according to their n-gram-based tf*idf similarity, where the in-domain language model includes a disfluency model. Recognition results show that n-gram-based tf*idf similarity weighting effectively reflects style differences.
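The relevance weighting described above can be sketched as follows: each out-of-domain document is scored by the cosine similarity of its n-gram tf*idf vector to the in-domain text, and its n-gram counts are scaled by that score before they feed the language model. Several details are assumptions, not taken from the paper: whitespace tokenization (real Korean text would normally be split into morphemes), bigrams, treating the in-domain text as one document, and the smoothed idf `log((1+N)/(1+df)) + 1`.

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs, n=2):
    """n-gram tf*idf vectors (dicts) with a smoothed idf (an assumption)."""
    grams = [Counter(ngrams(d.split(), n)) for d in docs]  # whitespace tokens
    df = Counter(g for c in grams for g in c)  # document frequency per n-gram
    N = len(docs)
    return [{g: tf * (math.log((1 + N) / (1 + df[g])) + 1) for g, tf in c.items()}
            for c in grams]

def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def weighted_ngram_counts(in_domain, out_of_domain, n=2):
    """Scale each out-of-domain document's n-gram counts by its similarity
    to the in-domain text; returns (weights, weighted counts)."""
    vecs = tfidf_vectors([in_domain] + out_of_domain, n)
    weights = [cosine(vecs[0], v) for v in vecs[1:]]
    counts = Counter()
    for doc, w in zip(out_of_domain, weights):
        for g, c in Counter(ngrams(doc.split(), n)).items():
            counts[g] += w * c
    return weights, counts


# Hypothetical example: conversational in-domain text vs. two candidate documents.
weights, counts = weighted_ngram_counts(
    "uh I mean we go there",
    ["uh I mean we go there", "stock prices fell sharply today"],
)
```

Documents that share conversational n-grams with the in-domain text receive weights near 1, while stylistically unrelated text contributes almost nothing to the adapted counts.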

A Design of Content-based Metric Learning Model for HR Matching (인재매칭을 위한 내용기반 척도학습모형의 설계)

  • Song, Hee Seok
    • Journal of Information Technology Applications and Management / v.27 no.6 / pp.141-151 / 2020
  • The job mismatch between job seekers and SMEs is intensifying along with serious difficulties in youth employment. In this study, a bi-directional content-based metric learning model is proposed to recommend suitable jobs to job seekers and suitable job seekers to SMEs. The proposed model not only enables bi-directional recommendation but also enables HR matching for new job seekers and new job offers without relearning. In the experiments, the proposed model outperformed the existing collaborative filtering model NCF+GMF in precision, recall, and F1. The proposed model was also confirmed to be an evolutionary model whose performance improves as training data increases.
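The abstract does not describe the model architecture, but the bi-directional, no-relearning property can be illustrated by the retrieval step that follows metric learning: job seekers and job offers are projected from their content features into one metric space, and nearest neighbors are searched in both directions. In this sketch the projection matrices `W_seeker` and `W_job` are hypothetical stand-ins for whatever the learned model produces, Euclidean distance is an assumption, and the training procedure is not shown.

```python
import math

def embed(features, W):
    """Project a raw feature vector into the shared metric space.
    W is a hypothetical learned projection matrix (list of rows)."""
    return [sum(w * x for w, x in zip(row, features)) for row in W]

def top_k(query, candidates, k):
    """Indices of the k candidates nearest to `query` (Euclidean distance)."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: math.dist(query, candidates[i]))
    return ranked[:k]

def recommend(seeker_feats, job_feats, W_seeker, W_job, k=1):
    """Bi-directional matching in one metric space: jobs for each seeker and
    seekers for each job. New seekers or jobs only need to be embedded."""
    seekers = [embed(f, W_seeker) for f in seeker_feats]
    jobs = [embed(f, W_job) for f in job_feats]
    jobs_for_seeker = [top_k(s, jobs, k) for s in seekers]
    seekers_for_job = [top_k(j, seekers, k) for j in jobs]
    return jobs_for_seeker, seekers_for_job
```

Since recommendations depend only on content features passed through the projection, a newly registered seeker or job offer can be matched immediately, which is the contrast with collaborative filtering models that need interaction history.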

An Ensemble Model for Credit Default Discrimination: Incorporating BERT-based NLP and Transformer

  • Sophot Ky;Ju-Hong Lee
    • Annual Conference of KIPS / 2023.05a / pp.624-626 / 2023
  • Credit scoring is a technique used by financial institutions to assess the creditworthiness of potential borrowers. It involves evaluating a borrower's credit history to predict the likelihood of defaulting on a loan. This paper presents an ensemble of two Transformer-based models within a framework for discriminating the default risk of loan applications in credit scoring. The first model is FinBERT, a pretrained NLP model for analyzing the sentiment of financial text. The second model is FT-Transformer, a simple adaptation of the Transformer architecture to the tabular domain. Both models are trained on the same underlying data set; the only difference is the representation of the data. This multi-modal approach leverages the unique capabilities of each model and may uncover insights that are not apparent with a single model alone. We compare our model with two well-known ensemble-based models, Random Forest and Extreme Gradient Boosting.

Reliability-based Structural Design Optimization Considering Probability Model Uncertainties - Part 1: Design Method (확률모델 불확실성을 고려한 구조물의 신뢰도 기반 최적설계 - 제1편: 설계 방법)

  • Ok, Seung-Yong;Park, Wonsuk
    • Journal of the Korean Society of Safety / v.27 no.5 / pp.148-157 / 2012
  • A reliability-based design optimization (RBDO) problem is usually formulated as an optimization problem that minimizes an objective function subject to probabilistic constraint functions, which may involve deterministic design variables as well as random variables. The challenge is that, because the probability models of the random variables are often assumed from limited data, inappropriate distribution models and/or model parameters may be selected for the random variables, which can lead to disastrous consequences. To select the most appropriate distribution model and model parameters from limited observation data, this study considers a set of candidate models for the random variables. The suitability of each model is then investigated using performance and risk functions. In this way, the study enables structural design optimization and fitness assessment of the distribution models of the random variables at the same time. As the first paper of a two-part series, this paper describes a new design method that considers probability model uncertainties; the robust performance of the proposed method is presented in Part 2. To demonstrate the effectiveness of the proposed method, a ten-bar truss structure is considered as an example. The numerical results show that the proposed method can provide the optimal design variables while guaranteeing the most desirable distribution models for the random variables, even when only limited data are available.

Explainable AI Application for Machine Predictive Maintenance (설명 가능한 AI를 적용한 기계 예지 정비 방법)

  • Cheon, Kang Min;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.227-233 / 2021
  • Predictive maintenance has been one of the important applications of data science technology; it builds a predictive model by collecting numerous data related to the equipment targeted for management. Rather than predicting equipment failure from just one or two signs, it quantifies and models numerous symptoms and the historical data of actual failures. Statistical methods were widely used for predictive maintenance in the past, but many machine learning-based methods have recently been proposed. These machine learning-based methods are preferable because they show more accurate prediction performance. However, except for some learning models such as decision tree-based models, it is very difficult to know the structure of a learning model explicitly (a black-box model) and to explain to what extent certain attributes (features or variables) affected the prediction results. Explainable artificial intelligence (AI) has recently been proposed to overcome this problem; it is a methodology that makes it easy for users to understand and trust the results of machine learning models. In this paper, we propose an explainable AI method that further enhances the explanatory power of an existing learning model, targeting the previously proposed predictive model [5] that learned data from a core facility (Hyper Compressor) of a domestic chemical plant that produces polyethylene. The ensemble prediction model, a black-box model, was converted into a white-box model using explainable AI. The proposed methodology uses explainable AI to explain the direction of control for the major features in the failure prediction results. With this methodology, the timing of machine maintenance and the supply and demand of parts can be adjusted flexibly, and the efficiency of facility operation can be improved through proper pre-control.

Development of a Data-Driven and Physics based Model Linked Simulation Model for Ship Engine Performance Evaluation (선박 엔진 성능평가를 위한 데이터 및 물리 기반 모델 연동 엔진 시뮬레이션 모델 개발)

  • Yonadan Choi;Sungjun Yoon;Byoungill Rhee;Tag Gon Kim;Beomcheol Ham
    • Journal of the Korea Society for Simulation / v.33 no.3 / pp.1-11 / 2024
  • There are various methods to evaluate the performance of an internal combustion engine, but each has limitations. In this study, to overcome the limitations of previous methods, a simulation model linking a data-driven model and a physics-based model is developed. The representative components of a turbocharged engine that participate in the engine's running loop are identified, and each is modeled by either a data-driven or a physics-based method. The engine simulator is developed by combining the component models using C++ and Python. The convergence of several variables is tested to verify the simulator. Finally, since most variables showed less than 5% error between the simulation results and the real engine test results, the simulator is considered validated. The developed simulator is expected to evaluate the performance of various engine models with little effort and to play a key role in developing an engine digital twin.

A Data Dictionary for Procurement of Die and Mold Parts Based on PLIB Standard (PLIB에 기반한 전자상거래용 금형부품 데이터 사전의 구축)

  • 조준면;문두환;김흥기;한순흥;류병우
    • The Journal of Society for e-Business Studies / v.8 no.3 / pp.37-52 / 2003
  • The ISO 13584 Parts Library (PLIB) standard is making its way into e-business as a norm for classifying products and their characteristics. PLIB is a multi-part standard, and Part 42, Methodology for structuring parts families, provides the information model and design principles for the data dictionary of a parts library or e-catalog. If e-catalog systems are built using a data dictionary constructed on the PLIB dictionary data model, many different e-catalog systems can be easily integrated and made interoperable. This paper studies the roles and requirements of the data dictionary in an e-catalog and applies the data model and design principles of PLIB Part 42 to construct a data dictionary from the viewpoint of ontology. Based on the analysis results, we propose a data dictionary of die and mold parts and implement a B2B e-catalog system.
