• Title/Summary/Keyword: data for training

Dynamically weighted loss based domain adversarial training for children's speech recognition (어린이 음성인식을 위한 동적 가중 손실 기반 도메인 적대적 훈련)

  • Seunghee, Ma
    • The Journal of the Acoustical Society of Korea / v.41 no.6 / pp.647-654 / 2022
  • Although children's speech recognition is being used in a growing number of fields, the lack of high-quality data is an obstacle to improving its performance. This paper proposes a new method for improving children's speech recognition performance by additionally using adult speech data. The proposed method is transformer-based domain adversarial training with a dynamically weighted loss, designed to address the data imbalance between age groups, which grows as the amount of adult training data increases. Specifically, the degree of class imbalance in each mini-batch was quantified during training, and the loss function was defined so that the smaller a class's share of the data, the greater its weight. Experiments validate the utility of the proposed domain adversarial training under varying degrees of asymmetry between adult and child training data, and show that the proposed method achieves higher children's speech recognition performance than conventional domain adversarial training under all conditions in which the training data are imbalanced between age groups.
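
The abstract says the loss weight grows as a class's share of the mini-batch shrinks, but it does not give the formula. A minimal sketch, assuming inverse-frequency weights w_c = N / (K * n_c) recomputed for every mini-batch and applied to the cross-entropy of the domain (adult vs. child) classifier. The function names and the weighting formula are hypothetical; the transformer encoder and the gradient-reversal layer of domain adversarial training are not shown.

```python
import math
from collections import Counter


def dynamic_class_weights(labels, num_classes=2):
    """Per-class weights for one mini-batch: w_c = N / (K * n_c).

    Rarer classes (e.g. child utterances) get larger weights; classes absent
    from the batch get weight 0 because they contribute no loss terms anyway.
    (Hypothetical weighting scheme, not taken from the paper.)
    """
    n = len(labels)
    counts = Counter(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def weighted_domain_loss(probs, labels, num_classes=2):
    """Weighted cross-entropy of the domain classifier, normalised by total weight.

    probs[i][c] is the predicted probability that utterance i comes from domain c.
    """
    w = dynamic_class_weights(labels, num_classes)
    total = sum(w[y] * -math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels))
    return total / sum(w[y] for y in labels)


# Mini-batch with three adult (0) utterances and one child (1) utterance.
labels = [0, 0, 0, 1]
print(dynamic_class_weights(labels))  # the child weight is three times the adult weight
```

Because the weights are recomputed per mini-batch, the correction automatically becomes stronger as more adult data is added and child utterances become rarer in each batch.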

Automatic Classification Method for Time-Series Image Data using Reference Map (Reference Map을 이용한 시계열 image data의 자동분류법)

  • Hong, Sun-Pyo
    • The Journal of the Acoustical Society of Korea / v.16 no.2 / pp.58-65 / 1997
  • This paper presents a new automatic classification method for time-series image data that achieves high and stable accuracy. The method assumes that a classified map of the target area already exists, or that at least one image in the time series has already been classified. The classified map is used as a reference map to specify training areas for the classification categories. The method consists of five steps: extraction of training data using the reference map, detection of changed pixels based on the homogeneity of the training data, clustering of the changed pixels, reconstruction of the training data, and classification with a maximum likelihood classifier. To evaluate the method's performance qualitatively, four time-series Landsat TM images were classified both with this method and with a conventional method that requires a skilled operator. As a result, we obtained highly reliable classified maps with fast throughput, without a skilled operator.
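
The abstract lists the five steps without formulas. A minimal sketch of steps 1, 2, 4, and 5, assuming pixels are band vectors, each category is modeled by a per-band (diagonal-covariance) Gaussian, and the homogeneity check is a z-score test against the category statistics. The threshold `z_max`, the dictionary data layout, and all function names are hypothetical, and step 3 (clustering the changed pixels) is omitted.

```python
import math


def class_stats(pixels):
    """Per-band mean and (population) variance of a list of pixel vectors."""
    n = len(pixels)
    mean = [sum(band) / n for band in zip(*pixels)]
    var = [max(sum((x - m) ** 2 for x in band) / n, 1e-6)  # floor avoids division by zero
           for band, m in zip(zip(*pixels), mean)]
    return mean, var


def log_likelihood(pixel, mean, var):
    """Gaussian log-likelihood with a diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(pixel, mean, var))


def train_from_reference(image, ref_map, z_max=3.0):
    """Steps 1, 2 and 4: extract training pixels with the reference map, drop
    pixels that are not homogeneous with their category (treated as changed),
    and recompute the category statistics from the remaining pixels.

    image: dict pixel_id -> band vector; ref_map: dict pixel_id -> category.
    """
    groups = {}
    for pid, category in ref_map.items():
        groups.setdefault(category, []).append(image[pid])
    stats = {}
    for category, pixels in groups.items():
        mean, var = class_stats(pixels)
        kept = [p for p in pixels
                if all(abs(x - m) / math.sqrt(v) <= z_max
                       for x, m, v in zip(p, mean, var))]
        stats[category] = class_stats(kept or pixels)
    return stats


def classify(pixel, stats):
    """Step 5: maximum likelihood classification."""
    return max(stats, key=lambda c: log_likelihood(pixel, *stats[c]))
```

In this sketch the reference map is just a dictionary from pixel IDs to categories; a pixel that has changed since the map was made is excluded from its category's statistics but is still classified normally in step 5.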

Tri-training algorithm based on cross entropy and K-nearest neighbors for network intrusion detection

  • Zhao, Jia;Li, Song;Wu, Runxiu;Zhang, Yiying;Zhang, Bo;Han, Longzhe
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3889-3903 / 2022
  • To address the low detection accuracy caused by training noise from mislabeled samples when Tri-training is applied to network intrusion detection (NID), we propose a Tri-training algorithm based on cross entropy and K-nearest neighbors (TCK). The proposed algorithm replaces the classification error rate with cross-entropy, which better captures the difference between the actual and predicted distributions of the model and reduces the bias that mislabeled data introduce into predictions on unlabeled data; K-nearest neighbors are used to identify and remove mislabeled data. To verify the effectiveness of the algorithm, experiments were conducted on 12 UCI datasets and the NSL-KDD network intrusion dataset, and four metrics were compared: accuracy, recall, F-measure, and precision. The experimental results show that TCK outperforms conventional Tri-training algorithms as well as Tri-training variants that use only the cross-entropy strategy or only the K-nearest-neighbor strategy.
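
The abstract does not specify how the K-nearest-neighbor step decides that a sample is mislabeled. A minimal sketch, assuming a majority-vote rule: a newly pseudo-labeled sample is kept only if most of its k nearest labeled neighbors (by Euclidean distance) share its label. The function name and `k=3` are hypothetical, and the cross-entropy part of TCK is not shown.

```python
import math
from collections import Counter


def knn_filter(labeled, candidates, k=3):
    """Keep a pseudo-labeled candidate only if the majority label among its
    k nearest labeled neighbours agrees with its pseudo-label.

    labeled, candidates: lists of (feature_vector, label) pairs.
    An odd k avoids voting ties in binary (normal vs. attack) problems.
    """
    kept = []
    for x, y in candidates:
        neighbours = sorted(labeled, key=lambda s: math.dist(x, s[0]))[:k]
        majority, _ = Counter(lbl for _, lbl in neighbours).most_common(1)[0]
        if majority == y:
            kept.append((x, y))
    return kept
```

In a Tri-training loop, a filter like this would sit between the step where two classifiers agree on a label for an unlabeled sample and the step where that sample is added to the third classifier's training set.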

Efficient Incremental Learning using the Preordered Training Data (미리 순서가 매겨진 학습 데이타를 이용한 효과적인 증가학습)

  • Lee, Sun-Young;Bang, Sung-Yang
    • Journal of KIISE:Software and Applications / v.27 no.2 / pp.97-107 / 2000
  • Incremental learning generally reduces training time and improves the generalization of a neural network by selecting training data incrementally during training. However, existing incremental learning methods re-evaluate the importance of the training data every time they select additional data. This paper proposes an incremental learning algorithm for pattern classification problems that evaluates the importance of each data point only once, before training starts. The importance of a data point depends on how close it is to the decision boundary, and the algorithm uses clustering to order the data by their distance to the decision boundary. Experimental results on two artificial and real-world classification problems show that the proposed method significantly reduces the size of the training set without decreasing generalization performance.
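
The abstract says the data are ordered once by their distance to the decision boundary, estimated with clustering, but does not give the procedure. A minimal sketch, assuming the simplest possible clustering (one centroid per class) and using the gap between the distance to the nearest other-class centroid and the distance to the sample's own centroid as a proxy for boundary distance. The function names and the margin proxy are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of feature vectors."""
    return [sum(c) / len(points) for c in zip(*points)]


def order_by_boundary_distance(data):
    """data: list of (feature_vector, label). Returns the samples sorted so that
    those closest to the (approximate) decision boundary come first."""
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    centers = {y: centroid(pts) for y, pts in by_class.items()}

    def margin(sample):
        x, y = sample
        own = math.dist(x, centers[y])
        other = min(math.dist(x, c) for lbl, c in centers.items() if lbl != y)
        return other - own  # small margin -> sample lies near the boundary

    return sorted(data, key=margin)
```

Training could then start from the front of the ordered list and take the next samples from it at each increment, so the importance of each sample is computed only once, as the abstract describes.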

The Development of a Social Skill Training Program for ADHD Children and It's Effect (ADHD 아동을 위한 사회기술훈련 프로그램의 개발과 효과)

  • Lee, Hye-Sug
    • The Korean Journal of Elementary Counseling / v.6 no.1 / pp.171-191 / 2007
  • The purpose of this study was to develop a social skill training program to reduce problem behaviors and improve peer relations among elementary school students with ADHD (Attention Deficit Hyperactivity Disorder), and to verify its effectiveness. The research questions were as follows. First, is the social skill training effective in enhancing the self-esteem of students with ADHD? Second, is it effective in reducing their inattention, hyperactivity, and impulsivity? Third, is it effective in improving their peer relations? The subjects were six 5th-grade children at P elementary school in Pyeongtaek, selected using the ADHD-SC4. The training consisted of 10 sessions covering forming friendships, recognizing, making friends, solving problems, reeducation, and evaluation. Quantitative data were collected through a self-esteem inventory, a peer-relation test, self-report scales for children, and Conners' teacher rating scores for ADHD children, and were analyzed with t-tests. Qualitative data were collected through teacher interviews and observation of the children. The results were as follows. First, the training did not have a significant effect on the self-esteem of the children with ADHD. Second, it had a positive effect in reducing their inattentiveness, hyperactivity, and impulsive behavior. Third, it did not have a significant effect on their peer relations. Fourth, the qualitative data showed that the training had a positive effect on overall classroom behavior.

Development of a Steel Plate Surface Defect Detection System Based on Small Data Deep Learning (소량 데이터 딥러닝 기반 강판 표면 결함 검출 시스템 개발)

  • Gaybulayev, Abdulaziz;Lee, Na-Hyeon;Lee, Ki-Hwan;Kim, Tae-Hyong
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.3 / pp.129-138 / 2022
  • Collecting and labeling enough training data, which is essential for deep-learning-based visual inspection, is very expensive and therefore difficult for manufacturers. This paper presents a steel plate surface defect detection system that reaches industrial-grade detection performance by training on a small set of steel plate surface images consisting of labeled and unlabeled data. To overcome the lack of training data, we propose two data augmentation techniques: program-based augmentation, which generates defect images geometrically, and generative-model-based augmentation, which learns the distribution of the labeled data. We also propose a 4-step semi-supervised learning scheme that uses pseudo-labels and consistency training with fixed-size augmentation, so that unlabeled data can be used for training. The proposed technique achieved about 99% defect detection performance for four defect types using 100 real images, including both labeled and unlabeled data.
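
The abstract describes program-based augmentation only as generating defect images geometrically. A minimal sketch of that idea, assuming a grayscale image stored as a list of rows and a straight dark scratch as the synthetic defect. The function name, the defect shape, and the parameters are hypothetical and not the authors' generator.

```python
import random


def add_scratch(image, length=10, intensity=0, rng=None):
    """Return a copy of a grayscale image (list of rows) with a straight
    scratch drawn on it, plus the scratch's bounding box (x0, y0, x1, y1),
    which can serve as the detection label for the synthetic defect."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the original image untouched
    x, y = rng.randrange(w), rng.randrange(h)
    dx, dy = rng.choice([(1, 0), (0, 1), (1, 1), (1, -1)])
    xs, ys = [], []
    for _ in range(length):
        if 0 <= x < w and 0 <= y < h:  # clip the scratch at the image border
            out[y][x] = intensity
            xs.append(x)
            ys.append(y)
        x, y = x + dx, y + dy
    return out, (min(xs), min(ys), max(xs), max(ys))
```

Because the generator knows exactly where it drew the defect, each synthetic image comes with a label for free, which is the main appeal of program-based augmentation when labeling is expensive.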

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / v.14 no.4 / pp.807-823 / 2018
  • Speaker verification performance depends on each speaker's utterances, from which the important information must be captured to verify the speaker. Under limited-data constraints, where the training and testing data last only a few seconds, speaker verification has become a challenging task. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers, which leads to poor speaker modeling during training and may lead to poor decisions during testing. This problem can be addressed by extracting more feature vectors from training and testing data of the same duration. To that end, we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under limited-data conditions. These techniques extract relatively more feature vectors during training and testing, which improves modeling and testing with limited data. To demonstrate this, we used mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features, and a Gaussian mixture model (GMM) and a GMM-universal background model (GMM-UBM) to model the speakers. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform substantially better than SFSR analysis, and that LPCC-based MFSR analysis performs better than the other analysis and feature extraction techniques.
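
The benefit of MFS, MFR, and MFSR analysis is simply more frames, and therefore more feature vectors, from the same few seconds of speech. A minimal sketch of the framing step only, assuming an 8 kHz signal and frame sizes and shifts chosen for illustration (not the paper's settings); MFCC or LPCC extraction would then be applied to each frame.

```python
def frame_signal(signal, size, shift):
    """Split a 1-D signal into frames of `size` samples taken every `shift` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, shift)]


def multi_frame_analysis(signal, sizes, shifts):
    """Frame the same signal with every (size, shift) pair.

    One size and one shift   -> SFSR
    Several sizes, one shift -> MFS
    One size, several shifts -> MFR
    Several of both          -> MFSR
    """
    frames = []
    for size in sizes:
        for shift in shifts:
            frames.extend(frame_signal(signal, size, shift))
    return frames


# One second of (dummy) audio at an assumed 8 kHz sampling rate.
signal = [0.0] * 8000
sfsr = multi_frame_analysis(signal, sizes=[160], shifts=[80])  # 20 ms frames, 10 ms shift
mfsr = multi_frame_analysis(signal, sizes=[160, 200, 240], shifts=[80, 40])
print(len(sfsr), len(mfsr))
```

The frames produced with different sizes and shifts overlap heavily, so the extra feature vectors are correlated; they still give the GMM more samples from which to estimate its parameters when only seconds of speech are available.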

An Efficient Detection Method for Rail Surface Defect using Limited Label Data (한정된 레이블 데이터를 이용한 효율적인 철도 표면 결함 감지 방법)

  • Seokmin Han
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.83-88 / 2024
  • In this research, we propose a railroad surface defect detection method based on semi-supervised learning. A ResNet50 model pretrained on ImageNet was used for training. A randomly selected subset of the unlabeled data is labeled and used to train the ResNet50 model. The trained model then predicts labels for the remaining unlabeled training data. Predictions whose confidence exceeds a certain threshold are selected, sorted in descending order, and added to the training data, with each sample pseudo-labeled as the class with the highest predicted probability. An experiment was conducted to assess overall classification performance as a function of the initial number of labeled samples. The best accuracy was 98%, obtained with less than 10% of the training data labeled.
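
The selection step described above can be sketched as follows, assuming the model's softmax outputs are available as lists of class probabilities. The function name and the 0.9 threshold are hypothetical, since the abstract only says "a certain threshold".

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """probabilities: dict sample_id -> list of class probabilities from the model.

    Returns (sample_id, pseudo_label, confidence) for predictions above the
    threshold, sorted by confidence in descending order.
    """
    selected = []
    for sid, probs in probabilities.items():
        label = max(range(len(probs)), key=probs.__getitem__)  # most probable class
        confidence = probs[label]
        if confidence > threshold:
            selected.append((sid, label, confidence))
    selected.sort(key=lambda t: t[2], reverse=True)
    return selected


print(select_pseudo_labels({"a": [0.95, 0.05], "b": [0.6, 0.4], "c": [0.02, 0.98]}))
# [('c', 1, 0.98), ('a', 0, 0.95)]
```

Sorting by confidence lets a training loop add only the top of the list in each round if it wants to grow the pseudo-labeled set gradually.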

Design and Implementation of Cyber Warfare Training Data Set Generation Method based on Traffic Distribution Plan (트래픽 유통계획 기반 사이버전 훈련데이터셋 생성방법 설계 및 구현)

  • Kim, Yong Hyun;Ahn, Myung Kil
    • Convergence Security Journal / v.20 no.4 / pp.71-80 / 2020
  • To provide realistic traffic to a cyber warfare training system, a traffic distribution plan must be prepared in advance and a training data set must be created from normal and threat data sets. This paper presents the design and implementation of a method for creating a traffic distribution plan and a training data set that provide realistic background traffic to a cyber warfare training system. We propose a method for building the traffic distribution plan from the network topology of the training environment, over which the traffic is distributed, and from traffic attribute information collected in real and simulated environments. We also propose a method for generating a training data set according to the traffic distribution plan, using unit traffic and a mixed-traffic method based on protocol ratios. Using the implemented tool, we created a traffic distribution plan and confirmed that the training data set was generated according to that plan.
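
The abstract mentions a mixed-traffic method based on protocol ratios without giving details. A minimal sketch of one way to turn such ratios into per-protocol traffic counts, assuming largest-remainder rounding so the counts sum exactly to the planned total. The function name, protocol names, and numbers are hypothetical.

```python
def allocate_by_ratio(total, ratios):
    """Split `total` traffic units across protocols in proportion to `ratios`,
    using largest-remainder rounding so the parts sum exactly to `total`."""
    weight_sum = sum(ratios.values())
    exact = {p: total * r / weight_sum for p, r in ratios.items()}
    counts = {p: int(v) for p, v in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining units to the protocols with the largest fractional parts.
    for p in sorted(exact, key=lambda p: exact[p] - counts[p], reverse=True)[:leftover]:
        counts[p] += 1
    return counts


print(allocate_by_ratio(100, {"HTTP": 6, "DNS": 3, "SMTP": 1}))
# {'HTTP': 60, 'DNS': 30, 'SMTP': 10}
```

Exact totals matter here because the distribution plan fixes how much traffic each segment of the training network should carry; naive rounding of each share could over- or under-shoot it.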

A Study on Sales Training of Clothing Companies (의류 판매원 교육실태에 관한 연구)

  • 김미숙;김보경
    • The Research Journal of the Costume Culture / v.7 no.4 / pp.155-167 / 1999
  • The present study investigated the sales training programs used by apparel companies and compared them in order to provide useful information for developing effective training programs for professional salespeople. Sixty-eight companies were surveyed and grouped into four categories based on brand characteristics: domestic national brand (DNB), casual brand (CB), foreign brand (FB), and domestic designer brand (DDB). Data were collected from the managers in charge of training salespeople through questionnaires and in-person and telephone interviews. The data were collected in July 1998 and analyzed using ANOVA, Duncan's multiple range test, and the chi-square test. Because the sample size was small, Yates' correction was applied in the non-parametric chi-square test to maximize statistical validity. The main purposes of sales training reported by the companies were to satisfy customers and to maximize profit. Significant differences were found among the groups in the importance attached to training content (such as knowledge and customer relations), training methods, training location, and the duration and frequency of training at the training center.
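
For reference, Yates' continuity correction subtracts 0.5 from each absolute difference between observed and expected counts before squaring; it is defined for 2 × 2 tables (one degree of freedom). A minimal sketch in plain Python; the function name and the example table are hypothetical, not data from the study.

```python
def yates_chi_square(table):
    """Chi-square statistic with Yates' continuity correction for a 2x2 table
    given as [[a, b], [c, d]] of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink |O - E| by 0.5, but never below zero.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return chi2


print(yates_chi_square([[10, 20], [20, 10]]))
```

For this example every expected count is 15, so each cell contributes (5 - 0.5)^2 / 15 and the corrected statistic is 5.4, compared with 6.67 without the correction; the correction makes the test more conservative for small samples.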