• Title/Summary/Keyword: Learning data set

Data Science and Deep Learning in Natural Sciences

  • Cha, Meeyoung
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.56.1-56.1 / 2019
  • We are producing and consuming more data than ever before. Massive data allow us to better understand the world around us, yet they bring new challenges due to their inherent noise and sheer size. Without smart algorithms and infrastructure, big data problems will remain intractable, and the same is true in natural science research. The mission of data science as a research field is to develop and apply computational methods that support or replace costly data-handling practices. In this talk, I will introduce how data science and deep learning have been used to solve various problems in the natural sciences. In particular, I will present a case study of analyzing high-resolution satellite images to infer socioeconomic scales of developing countries.

Incremental Eigenspace Model Applied To Kernel Principal Component Analysis

  • Kim, Byung-Joo
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.345-354 / 2003
  • An incremental kernel principal component analysis (IKPCA) is proposed for nonlinear feature extraction from data. One problem with batch kernel principal component analysis (KPCA) is that the computation becomes prohibitive when the data set is large. Another is that, in order to update the eigenvectors with new data, all eigenvectors must be recomputed. IKPCA overcomes these problems by incrementally updating the eigenspace model. IKPCA requires less memory than batch KPCA and can be easily improved by re-learning the data. In our experiments, we show that IKPCA is comparable in performance to batch KPCA for a classification problem on a nonlinear data set.
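For orientation, the batch KPCA baseline that the abstract compares against can be sketched as follows. This is a minimal illustration in plain Python, not the incremental update proposed in the paper. The RBF kernel, the `gamma` value, the toy points, and the use of power iteration to extract only the leading component are all assumptions made for the example.

```python
import math

def rbf_kernel_matrix(points, gamma=0.5):
    """Pairwise RBF kernel values (the kernel choice is an assumption)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]

def center_kernel(K):
    """Center the kernel matrix in feature space: K - 1K - K1 + 1K1."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]  # equals column mean, K is symmetric
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def top_component(Kc, iters=500):
    """Leading eigenpair of the centered kernel matrix via power iteration."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigval = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigval = norm
    return eigval, v

# Hypothetical toy data: two well-separated clusters in 2-D.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
K = rbf_kernel_matrix(points)
Kc = center_kernel(K)
eigval, v = top_component(Kc)
# Projection of each training point onto the first kernel principal component.
proj = [math.sqrt(eigval) * x for x in v]
```

Note that every new data point changes the whole kernel matrix, so this batch form has to rebuild and re-decompose it each time, which is the cost the abstract attributes to batch KPCA.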

A layered-wise data augmenting algorithm for small sampling data (적은 양의 데이터에 적용 가능한 계층별 데이터 증강 알고리즘)

  • Cho, Hee-chan;Moon, Jong-sub
    • Journal of Internet Computing and Services / v.20 no.6 / pp.65-72 / 2019
  • Data augmentation is a method that increases the amount of data through various algorithms, starting from a small amount of sample data. When machine learning and deep learning techniques are used to solve real-world problems, data sets are often insufficient. A lack of data increases the risk of underfitting and overfitting, and the model may poorly reflect the characteristics of the data set during learning. Thus, this paper proposes a layer-wise data augmentation method applied at each layer of a deep neural network. The proposed method produces substantially meaningful augmented data, and experiments measuring classification accuracy show that it is effective for model learning.

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis (텍스트 분류 기반 기계학습의 정신과 진단 예측 적용)

  • Pak, Doohyun;Hwang, Mingyu;Lee, Minji;Woo, Sung-Il;Hahn, Sang-Woo;Lee, Yeon Jung;Hwang, Jaeuk
    • Korean Journal of Biological Psychiatry / v.27 no.1 / pp.18-26 / 2020
  • Objectives: The aim was to find effective vectorization and classification models to predict a psychiatric diagnosis from text-based medical records. Methods: Electronic medical records (n = 494) of present illness were collected retrospectively from inpatient admission notes with one of three diagnoses: major depressive disorder, type 1 bipolar disorder, or schizophrenia. The data were split into 400 training records and 94 independent validation records. The data were vectorized with two models, term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning models for classification, including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL), were applied to predict the three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Accuracy, precision, recall, and F1-score were measured to compare the models. Results: Five-fold cross-validation on the training data showed that the DL model with Doc2vec was the most effective at predicting the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics decreased on the independent test data set with the final working DL models (accuracy = 0.79, F1-score = 0.79), while logistic regression and the support vector machine with Doc2vec performed slightly better (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and the other models with TF-IDF. Conclusions: The current results suggest that vectorization may have more impact on classification performance than the choice of machine learning model. However, the data set had a number of limitations, including small sample size, imbalance among the categories, and limited generalizability. In this regard, multi-site research with large samples is needed to improve the machine learning models.
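As a rough illustration of the TF-IDF vectorization step mentioned in the abstract, the sketch below computes one common textbook variant (term frequency as count divided by document length, inverse document frequency as log(N/df)). The exact weighting used in the study is not stated, so this variant, the whitespace tokenizer, and the three invented toy notes are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, vectors) with tf = count/len(doc), idf = log(N/df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            (counts[term] / len(toks)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors

# Invented toy notes for illustration only; not data from the study.
docs = [
    "low mood and poor sleep",
    "elevated mood and reduced sleep",
    "auditory hallucinations and delusions",
]
vocab, vectors = tfidf(docs)
```

A term that occurs in every document (here "and") receives weight zero, while terms concentrated in fewer documents receive larger weights. The resulting vectors are what a classifier such as logistic regression would take as input.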

PREDICTION OF RESIDUAL STRESS FOR DISSIMILAR METALS WELDING AT NUCLEAR POWER PLANTS USING FUZZY NEURAL NETWORK MODELS

  • Na, Man-Gyun;Kim, Jin-Weon;Lim, Dong-Hyuk
    • Nuclear Engineering and Technology / v.39 no.4 / pp.337-348 / 2007
  • A fuzzy neural network model is presented to predict residual stress for dissimilar metal welding under various welding conditions. The fuzzy neural network model, which consists of a fuzzy inference system and a neuronal training system, is optimized by a hybrid learning method that combines a genetic algorithm to optimize the membership function parameters and a least squares method to solve the consequent parameters. The finite element analysis data are divided into four groups according to two end-section constraints and two prediction paths. Four fuzzy neural network models were therefore applied to the numerical data obtained from the finite element analysis, one for each combination of end-section constraint and prediction path. The models were trained with a training data set, optimized with an optimization data set, and verified with a test data set that was independent of both the training data and the optimization data. The fuzzy neural network models are found to be sufficiently accurate for use in an integrity evaluation by predicting the residual stress of dissimilar metal welding zones.

Adaptive Recommendation System for Health Screening based on Machine Learning

  • Kim, Namyun;Kim, Sung-Dong
    • International journal of advanced smart convergence / v.9 no.2 / pp.1-7 / 2020
  • As the demand for health screening increases, there is a need for efficient design of screening items. We build machine learning models for health screening and recommend screening items to provide a personalized health care service. In the offline phase, a synthetic data set is generated based on guidelines and clinical results from institutions, and a machine learning model is generated for each screening item. In the online phase, the recommendation server provides a recommendation list of screening items in real time using the customer's health condition and the machine learning models. In the performance analysis, the accuracy of the learning model was close to 100%, and the server response time was less than 1 second when serving 1,000 users simultaneously. This paper provides an adaptive and automatic recommendation in response to changes in the screening environment.

Machine Learning Based Neighbor Path Selection Model in a Communication Network

  • Lee, Yong-Jin
    • International journal of advanced smart convergence / v.10 no.1 / pp.56-61 / 2021
  • Neighbor path selection pre-selects alternate routes in case geographically correlated failures occur simultaneously on the communication network. Conventional heuristic-based algorithms no longer improve solutions because they cannot sufficiently utilize historical failure information. We present a novel solution model for neighbor path selection using machine learning techniques. Our proposed machine learning neighbor path selection (ML-NPS) model is composed of five modules: random graph generation, data set creation, machine learning modeling, neighbor path prediction, and path information acquisition. It is implemented in Python with Keras on TensorFlow and executed on a small computer, the Raspberry Pi 4B. Performance evaluations via numerical simulation show that the neighbor path communication success probability of our model is better than that of the conventional heuristic by 26% on average.

Development of Auto Tracking System for Baseball Pitching (투구된 공의 실시간 위치 자동추적 시스템 개발)

  • Lee, Ki-Chung;Bae, Sung-Jae;Shin, In-Sik
    • Korean Journal of Applied Biomechanics / v.17 no.1 / pp.81-90 / 2007
  • The effort to identify the position of a moving object in real time has been an issue not only in sport biomechanics but also in other academic areas. To address this issue, this study tracks the movement of a pitched ball, which may be easier to predict because the object has a clear focus and simple movement. Machine learning has been leading research on extracting information from continuous images, such as object tracking. Although rule-based methods prevailed in artificial intelligence for decades, the field has evolved toward statistical approaches that find the maximum a posteriori location in the image. The development of machine learning, accompanied by advances in recording technology and computing power, has made it possible to extract the trajectory of a pitched baseball from recorded images. We present a method of baseball tracking based on object tracking methods in machine learning. We introduce three state-of-the-art studies on object tracking and show how they can be combined to yield a novel engine that finds the trajectory from continuous pitching images. The first concerns the mean shift method, which finds the mode of a supposed continuous distribution from a set of data. The second explains how to find the mode and the object region effectively given the object's location and region in the previous image. The third concerns representing data as features that we can work with. From those features, we can establish a distribution to generate a set of data for mean shift. In this paper, we combine the three works to track the baseball's location in continuous image frames. From the location information in two sets of images, we can reconstruct the real 3-D trajectory of the pitched ball. We show how this works on real pitching images.
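The mean shift step described in the abstract, finding the mode of a distribution from a set of data, can be sketched as below. This is a generic illustration, not the tracking engine from the paper: the Gaussian kernel, the `bandwidth` value, the starting position, and the toy 2-D points are assumptions chosen for the example.

```python
import math

def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Move from `start` toward the nearest density mode of `points`.

    Each step replaces the current position with the kernel-weighted mean
    of all points (Gaussian kernel; the kernel choice is an assumption).
    """
    x, y = start
    for _ in range(max_iter):
        wsum = wx = wy = 0.0
        for px, py in points:
            dist2 = (px - x) ** 2 + (py - y) ** 2
            w = math.exp(-dist2 / (2.0 * bandwidth ** 2))
            wsum += w
            wx += w * px
            wy += w * py
        if wsum == 0.0:  # every weight underflowed; no points nearby
            break
        nx, ny = wx / wsum, wy / wsum
        if math.hypot(nx - x, ny - y) < tol:  # shift is negligible: converged
            return nx, ny
        x, y = nx, ny
    return x, y

# Hypothetical samples: a tight cluster near (5, 5) plus one distant outlier.
points = [(4.8, 5.1), (5.2, 4.9), (5.0, 5.0), (5.1, 5.2), (4.9, 4.8), (0.0, 0.0)]
mode = mean_shift_mode(points, start=(4.0, 4.0))
```

In a tracking setting, the starting position would typically be the object's location in the previous frame, so that the search converges to the nearby mode in the current frame.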

Feature Selection for Creative People Based on Big 5 Personality traits and Machine Learning Algorithms (Big 5 성격 요소와 머신 러닝 알고리즘을 통한 창의적인 사람들의 특징 연구)

  • Kim, Yong-Jun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.97-102 / 2019
  • Creative people are difficult to define because there is no systematic classification and analysis method that uses accurate criteria or numerical values. To address this problem, this study analyzes how to distinguish creative people and what kind of personality they have. In this study, I first survey the Big 5 personality traits, classify and analyze the data set using the data mining tool WEKA, and then analyze the data set related to creativity. The goal is to analyze the features using various machine learning techniques. I use seven feature selection algorithms, select the feature groups they identify, apply them to machine learning algorithms to measure accuracy, and derive the results.

Ensemble learning of Regional Experts (지역 전문가의 앙상블 학습)

  • Lee, Byung-Woo;Yang, Ji-Hoon;Kim, Seon-Ho
    • Journal of KIISE:Computing Practices and Letters / v.15 no.2 / pp.135-139 / 2009
  • We present a new ensemble learning method that employs a set of regional experts, each of which learns to handle a subset of the training data. We split the training data and generate experts for different regions in the feature space. When classifying a data point, we apply weighted voting among the experts whose regions include that point. We used ten data sets to compare the performance of our new ensemble method with that of single classifiers as well as other ensemble methods such as Bagging and AdaBoost. We used SMO, Naive Bayes, and C4.5 as base learning algorithms. As a result, we found that the performance of our method is comparable to that of AdaBoost and Bagging when the base learner is C4.5. In the remaining cases, our method outperformed the benchmark methods.
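The voting scheme in the abstract, where only experts whose region contains a point take part in a weighted vote, might look roughly like the sketch below. The abstract does not say how regions are formed, which base learner each expert uses, or how weights are set. Here, as labeled assumptions, regions are fixed 1-D intervals, each expert is a majority-class predictor, and its weight is its accuracy on the training points in its region. `RegionExpert` and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict

class RegionExpert:
    """Hypothetical expert responsible for the interval [low, high]."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.label = None   # majority class inside the region
        self.weight = 0.0   # training accuracy inside the region

    def covers(self, x):
        return self.low <= x <= self.high

    def fit(self, data):
        # Train only on the subset of the data that falls in this region.
        labels = [y for x, y in data if self.covers(x)]
        if labels:
            self.label, count = Counter(labels).most_common(1)[0]
            self.weight = count / len(labels)

def predict(experts, x):
    """Weighted vote among the experts whose region includes x."""
    votes = defaultdict(float)
    for expert in experts:
        if expert.label is not None and expert.covers(x):
            votes[expert.label] += expert.weight
    return max(votes, key=votes.get) if votes else None

# Invented toy training set: (feature value, class label).
data = [(0.5, "a"), (1.0, "a"), (1.5, "a"), (2.0, "a"),
        (2.5, "b"), (3.0, "b"), (3.5, "b")]
# Overlapping regions, so that some points are judged by several experts.
experts = [RegionExpert(0.0, 2.2), RegionExpert(1.8, 4.0), RegionExpert(1.2, 2.8)]
for expert in experts:
    expert.fit(data)
```

With these toy regions, `predict(experts, 2.1)` consults all three experts, while `predict(experts, 3.2)` is decided by the single expert covering that point, and a point outside every region returns `None`.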