• Title/Abstract/Keywords: Training data

7,231 results (processing time: 0.029 seconds)

인공신경망 이론을 이용한 위성영상의 카테고리분류 (Multi-temporal Remote-Sensing Image Classification Using Artificial Neural Networks)

  • 강문성;박승우;임재천
    • 한국농공학회:학술대회논문집 / 한국농공학회 2001년도 학술발표회 발표논문집 / pp.59-64 / 2001
  • The objective of this study is to propose a pattern classification method for remote sensing data using an artificial neural network. First, we apply the error back-propagation algorithm to classify the remote sensing data. In this case, the classification performance depends on the training data set. Using the training data set and the error back-propagation algorithm, a layered neural network is trained until the training patterns are classified with a specified accuracy. After training, pixels that are incorrectly classified are deleted from the original training data set, and a new training data set is built from the remainder. Once training is complete, a testing data set is classified with the trained neural network. The classification results for Landsat TM data show that this approach produces excellent results that are more realistic and less noisy than those of a conventional Bayesian method.

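The train, prune, retrain loop described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a 1-D nearest-centroid classifier stands in for the back-propagation network, and the names `fit`, `classify`, `clean_and_retrain`, and the toy `pixels` data are hypothetical, not taken from the paper.

```python
from statistics import mean

def fit(samples):
    """Nearest-centroid stand-in for the trained network: {label: mean feature}."""
    labels = {y for _, y in samples}
    return {c: mean(x for x, y in samples if y == c) for c in labels}

def classify(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda c: abs(x - model[c]))

def clean_and_retrain(samples):
    """Train once, drop misclassified training samples, then retrain on the rest."""
    first = fit(samples)
    kept = [(x, y) for x, y in samples if classify(first, x) == y]
    return fit(kept), kept

# Toy 1-D "pixels" (hypothetical); (0.9, "water") is deliberately mislabeled.
pixels = [(0.1, "water"), (0.2, "water"), (0.9, "water"), (0.8, "crop"), (1.0, "crop")]
model, kept = clean_and_retrain(pixels)
```

In this toy run the mislabeled sample is the only one the first model misclassifies, so it is the only one pruned before retraining.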

비분류표시 데이타를 이용하는 분류 기반 Co-training 방법 (A Co-training Method based on Classification Using Unlabeled Data)

  • 윤혜성;이상호;박승수;용환승;김주한
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 31, No. 8 / pp.991-998 / 2004
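  • In many application areas such as bioinformatics, data analysis often involves a small amount of labeled data and a large amount of unlabeled data. Labeled data are difficult and expensive to obtain because they require human effort, whereas unlabeled data can be obtained easily. The co-training algorithm is widely used to classify and analyze data by exploiting unlabeled data in this setting. The method trains one classifier on each of two views of the small labeled set. Each classifier then produces its most confident predictions over all of the unlabeled data to be analyzed. Repeating this procedure on the training data set retrains the classifier in each view and increases the amount of labeled data. This paper proposes a new co-training method that uses unlabeled data. The method was evaluated with two classifiers and two experimental data sets, WebKB and BIND XML. The experimental results show that the proposed co-training method effectively improves classification accuracy when the amount of labeled data is very small.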
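The generic co-training loop summarized in this abstract (not the paper's new variant, whose details the abstract does not give) can be sketched as below. The sketch assumes two numeric views per example, at least two classes, and a 1-D nearest-centroid classifier per view; `fit_centroids`, `predict`, `co_train`, and the toy data are all hypothetical names chosen for illustration.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Return {label: mean feature value} for a 1-D nearest-centroid classifier."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Return (label, confidence); confidence is the gap between the two
    smallest centroid distances. Assumes at least two classes."""
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, runner_up = ranked[0], ranked[1]
    return best, abs(x - centroids[runner_up]) - abs(x - centroids[best])

def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = fit_centroids([x[view] for x, _ in labeled],
                                  [y for _, y in labeled])
            # Rank the pool by this view's confidence, most confident first.
            scored = sorted(pool, key=lambda x: predict(model, x[view])[1],
                            reverse=True)
            for x in scored[:per_round]:
                # The confident prediction becomes a new training label.
                labeled.append((x, predict(model, x[view])[0]))
                pool.remove(x)
    return labeled, pool

# Toy data (hypothetical): both views separate the classes around 0.5.
labeled = [((0.0, 0.1), "neg"), ((1.0, 0.9), "pos")]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.0), (0.8, 1.0)]
grown, remaining = co_train(labeled, unlabeled, rounds=2, per_round=1)
```

Each pass lets one view label the example it is most sure about, and the other view then trains on the enlarged set, which is how the labeled set grows from a small seed.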

The Effect of the Number of Training Data on Speech Recognition

  • Lee, Chang-Young
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 2E / pp.66-71 / 2009
  • In practical applications of speech recognition, a fundamental question is how much training data should be provided for a specific task. Although plentiful training data would undoubtedly enhance system performance, it comes at a heavy cost. It is therefore crucial to determine the minimum amount of training data that affords a given level of accuracy. For this purpose, we investigate the effect of the number of training data on speaker-independent recognition of isolated words using FVQ/HMM. The results show that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
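To make the reported trend concrete, the sketch below assumes the two findings combine into a single model, error ≈ k · vocabulary size / number of training data. That combined form, the constant `k`, and the calibration numbers are assumptions for illustration only; the abstract gives no constants and no data points.

```python
import math

def fit_constant(error_rate, n_train, vocab_size):
    """Solve error_rate = k * vocab_size / n_train for k (assumed model form)."""
    return error_rate * n_train / vocab_size

def min_training_size(target_error, vocab_size, k):
    """Smallest integer n_train with k * vocab_size / n_train <= target_error."""
    return math.ceil(k * vocab_size / target_error)

# Hypothetical calibration point, not taken from the paper.
k = fit_constant(error_rate=0.125, n_train=400, vocab_size=100)
n_needed = min_training_size(target_error=0.0625, vocab_size=100, k=k)
```

Under this assumed model, halving the target error at a fixed vocabulary size doubles the training data required, and doubling the vocabulary at a fixed target error does the same.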

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice / Vol. 1, No. 1 / pp.10-26 / 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built on one data domain generally do not perform well in another. This is especially a challenge for tasks such as opinion classification, which often has to deal with insufficient quantities of labeled data. This study investigates the feasibility of self-training for the domain transfer problem in opinion classification by leveraging labeled data from non-target data domain(s) and unlabeled data from the target domain. Specifically, self-training is evaluated for its effectiveness in sparse data situations and its feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. The findings suggest that, when labeled data are limited, self-training is a promising approach for opinion classification, although its contribution varies across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.

Document Image Binarization by GAN with Unpaired Data Training

  • Dang, Quang-Vinh;Lee, Guee-Sang
    • International Journal of Contents / Vol. 16, No. 2 / pp.8-18 / 2020
  • Data are critical in deep learning, but data are often scarce in research, especially paired training data. In this paper, document image binarization with unpaired data is studied by introducing adversarial learning, removing the need for supervised or labeled datasets. However, simply extending previous unpaired training methods to binarization inevitably leads to poor performance compared with paired data training. Thus, a new deep learning approach is proposed that introduces a diverse set of higher-quality generated images. A two-stage model is proposed that comprises a generative adversarial network (GAN) followed by a U-net network. In the first stage, the GAN uses the unpaired image data to create paired image data. In the second stage, the generated paired image data are passed through the U-net for binarization. The trained U-net thus serves as the binarization model during testing. The proposed model has been evaluated on the publicly available DIBCO dataset and outperforms other techniques trained on unpaired data. The paper shows, for the first time in the literature, the potential of using unpaired data for binarization, which can be further improved to replace paired data training in the future.

Generating and Validating Synthetic Training Data for Predicting Bankruptcy of Individual Businesses

  • Hong, Dong-Suk;Baik, Cheol
    • Journal of information and communication convergence engineering / Vol. 19, No. 4 / pp.228-233 / 2021
  • In this study, we analyze the credit information (loans, delinquency information, etc.) of individual business owners and generate a large volume of training data for a bankruptcy prediction model through a partial synthetic training technique. Furthermore, we evaluate the prediction performance obtained with the newly generated data against that obtained with the actual data. In the experiments (a logistic regression task), training data generated with conditional tabular generative adversarial networks (CTGAN) improve recall by a factor of 1.75 compared with the actual data. The probability that the actual and generated data are sampled from an identical distribution is verified to be well above 80%. Providing artificial intelligence training data through data synthesis in credit rating and default risk prediction for individual businesses, fields that have seen relatively little research, can promote further in-depth research on such methods.

The Effectiveness of the Training Program at HCL

  • Kumari, Neeraj
    • Asian Journal of Business Environment / Vol. 5, No. 3 / pp.23-28 / 2015
  • Purpose - The aim of this study is to evaluate the effectiveness of a corporate training program. The case of HCL Technologies was used to investigate how training programs improve employees' on-the-job performance, and to identify unnecessary aspects of the training so that they can be eliminated from future programs. Research design, data, and methodology - An exploratory research design was used to conduct the study. The research sample included 50 HCL employees. The sampling technique for the data collection was convenience sampling. Results - Training is a crucial process in an organization and thus needs to be well designed. Specifically, training programs should provide adequate knowledge to all employees, ensure that correct methods are used to select trainees, and avoid any perception of bias. Conclusions - Employees were not fully satisfied by the separation of the training program into two parts, on-the-job and off-the-job training, but providing sufficient data to employees in advance could help them during the training process.

미리 순서가 매겨진 학습 데이타를 이용한 효과적인 증가학습 (Efficient Incremental Learning using the Preordered Training Data)

  • 이선영;방승양
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 27, No. 2 / pp.97-107 / 2000
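  • Incremental learning trains a neural network while gradually enlarging the training data, which generally shortens training time and also improves the network's generalization performance. However, existing incremental learning methods repeatedly evaluate the importance of the data while selecting training data. In this paper, for classification problems, the importance of the data is evaluated only once, before training begins. The proposed method regards data closer to the class boundary as more important in classification problems and presents a way to select such data. Experiments on two synthetic data sets and real-world data show that the proposed method shortens training time and improves generalization performance compared with existing methods.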

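The idea of ranking the training data once, before training, can be sketched as follows. The abstract does not say how closeness to the class boundary is measured, so this sketch assumes a simple proxy: the distance from each sample to its nearest sample of a different class. `boundary_distance`, `preorder`, `growing_subsets`, and the toy data are hypothetical names for illustration.

```python
import math

def boundary_distance(i, samples):
    """Distance from sample i to its nearest neighbour with a different label."""
    x_i, y_i = samples[i]
    return min(math.dist(x_i, x_j) for x_j, y_j in samples if y_j != y_i)

def preorder(samples):
    """Sort samples once, nearest-to-boundary first (assumed importance measure)."""
    order = sorted(range(len(samples)),
                   key=lambda i: boundary_distance(i, samples))
    return [samples[i] for i in order]

def growing_subsets(ordered, step):
    """Yield the increasingly large training sets an incremental learner would consume."""
    for end in range(step, len(ordered) + step, step):
        yield ordered[:end]

# Toy 2-D data (hypothetical): the class boundary lies near x = 0.5.
samples = [((0.0, 0.0), "a"), ((0.4, 0.0), "a"), ((0.6, 0.0), "b"), ((1.0, 0.0), "b")]
ordered = preorder(samples)
subsets = list(growing_subsets(ordered, step=2))
```

Because the ordering is computed a single time, each incremental stage only takes a longer prefix of `ordered` rather than re-scoring every sample.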

Improving the Subject Independent Classification of Implicit Intention By Generating Additional Training Data with PCA and ICA

  • Oh, Sang-Hoon
    • International Journal of Contents / Vol. 14, No. 4 / pp.24-29 / 2018
  • EEG-based brain-computer interfaces have focused on explicitly expressed intentions to assist physically impaired patients. For EEG-based brain-computer interfaces to function effectively, they should be able to understand users' implicit information. Because EEG signals of human brains are hard to gather, we lack the training data that are essential for good classification of implicit intention. In this paper, we improve the subject-independent classification of implicit intention by generating additional training data. In the first stage, we perform PCA (principal component analysis) on the training data to remove redundant components from the input data. After the dimension reduction by PCA, we train an ICA (independent component analysis) network whose outputs are statistically independent. Additional training data are obtained by adding Gaussian noise to the ICA outputs and projecting them back to the input data domain. In simulations with EEG data provided by CNSL, KAIST, we improve the classification performance from 65.05% to 66.69% with Gamma components. The proposed sample generation method can be applied to any machine learning problem with few samples.

준 지도학습 알고리즘을 이용한 뇌파 감정 분석을 위한 학습데이터 선택 방법에 관한 연구 (A Study on Training Data Selection Method for EEG Emotion Analysis using Semi-supervised Learning Algorithm)

  • 윤종섭;김진헌
    • 전기전자학회논문지 / Vol. 22, No. 3 / pp.816-821 / 2018
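  • Recently, machine learning algorithms based on artificial neural networks have come into wide use as classifiers in EEG research for emotion analysis and disease diagnosis. When a machine learning model is used to classify EEG data, classification performance may degrade on data from other groups if the training data consist only of data with similar characteristics. To address this problem, this paper proposes a method that uses a semi-supervised learning algorithm to select data from multiple groups and construct the training data set. A model trained on the training data set built with the proposed method was then compared in performance with a model trained on a training data set composed of data with similar characteristics.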