• Title/Summary/Keyword: One-hot Encoding

Performance Comparison According to Image Generation Method in NIDS (Network Intrusion Detection System) using CNN

  • Sang Hyun, Kim
    • International journal of advanced smart convergence / v.12 no.2 / pp.67-75 / 2023
  • Recently, many studies have examined ways to apply AI technology to NIDS (Network Intrusion Detection System). CNN-based NIDS in particular generally shows excellent performance. A CNN essentially exploits the correlation between pixels in an image, so the method used to generate the image is very important. This paper compares the performance of CNN-based NIDS according to the image generation method. The image generation methods used in the experiment are a direct conversion method and a one-hot encoding based method. The experiment showed that NIDS performance differs depending on the image generation method. In particular, the method proposed in this paper, which combines the direct conversion method and the one-hot encoding based method, showed the best performance.

Creating Songs Using Note Embedding and Bar Embedding and Quantitatively Evaluating Methods (음표 임베딩과 마디 임베딩을 이용한 곡의 생성 및 정량적 평가 방법)

  • Lee, Young-Bae;Jung, Sung Hoon
    • KIPS Transactions on Software and Data Engineering / v.10 no.11 / pp.483-490 / 2021
  • In order to learn an existing song and create a new song using an artificial neural network, it is necessary to convert the song into numerical data that the neural network can recognize as a preprocessing process, and one-hot encoding has been used until now. In this paper, we proposed a note embedding method using notes as a basic unit and a bar embedding method that uses the bar as the basic unit, and compared the performance with the existing one-hot encoding. The performance comparison was conducted based on quantitative evaluation to determine which method produced a song more similar to the song composed by the composer, and quantitative evaluation methods used in the field of natural language processing were used as the evaluation method. As a result of the evaluation, the song created with bar embedding was the best, followed by note embedding. This is significant in that the note embedding and bar embedding proposed in this paper create a song that is more similar to the song composed by the composer than the existing one-hot encoding.

Deep Learning Based Short-Term Electric Load Forecasting Models using One-Hot Encoding (원-핫 인코딩을 이용한 딥러닝 단기 전력수요 예측모델)

  • Kim, Kwang Ho;Chang, Byunghoon;Choi, Hwang Kyu
    • Journal of IKEEE / v.23 no.3 / pp.852-857 / 2019
  • To manage the demand resources of project participants and to provide appropriate strategies on a virtual power plant's power trading platform for consumers or operators who want to participate in the distributed resource collective trading market, it is very important to forecast the next day's demand of individual participants and of the overall system. This paper develops a next-day power demand forecasting model. The model uses the LSTM deep learning algorithm in consideration of the time series characteristics of power demand data, and introduces a new scheme that applies one-hot encoding to input/output values such as power demand. In a performance evaluation comparing a general DNN with our LSTM forecasting model, the two models showed root mean square errors of 4.50 and 1.89, respectively, so our LSTM model achieved the higher prediction accuracy.

Automatic Augmentation Technique of an Autoencoder-based Numerical Training Data (오토인코더 기반 수치형 학습데이터의 자동 증강 기법)

  • Jeong, Ju-Eun;Kim, Han-Joon;Chun, Jong-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.75-86 / 2022
  • This study aims to solve the class imbalance problem in numerical data by using a deep learning-based Variational AutoEncoder, and to improve the performance of the learning model by augmenting the training data. We propose 'D-VAE' to artificially increase the number of records in a given table of data. The proposed technique optimizes the data through discretization and feature selection in the preprocessing stage. In the discretization step, K-means is applied to group the values, which are then converted into one-hot vectors by one-hot encoding. Subsequently, for memory efficiency, sample data are generated with the Variational AutoEncoder using only the features that RFECV, a feature selection technique, identifies as helpful for prediction. We demonstrate the validity of the proposed model through experiments at different data augmentation ratios.
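
The discretization step described in this abstract (group numeric values with K-means, then one-hot encode the group index) can be sketched roughly as follows. This is a minimal illustration and not the authors' D-VAE code: the function names, the quantile-based initialization, the `ages` sample column, and the choice of k = 3 are all assumptions made for the example, and the VAE and RFECV stages are omitted.

```python
def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means. Returns a cluster index for each input value.

    ASSUMPTION: centers are initialized at evenly spaced positions of the
    sorted data so the result is deterministic; the paper does not say
    how its K-means was initialized.
    """
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]

    def nearest(v):
        # Index of the center closest to v.
        return min(range(k), key=lambda c: abs(v - centers[c]))

    for _ in range(iters):
        labels = [nearest(v) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return [nearest(v) for v in values]


def one_hot(index, size):
    """Vector of length `size` with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


# Hypothetical numeric column to discretize.
ages = [21, 23, 22, 40, 42, 41, 65, 67]
labels = kmeans_1d(ages, k=3)
encoded = [one_hot(lab, 3) for lab in labels]

for age, vec in zip(ages, encoded):
    print(age, vec)
```

Each numeric value is replaced by a vector with exactly one active position, so the downstream model sees a category per group rather than a magnitude.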

Deep Learning-Based Model for Classification of Medical Record Types in EEG Report (EEG Report의 의무기록 유형 분류를 위한 딥러닝 기반 모델)

  • Oh, Kyoungsu;Kang, Min;Kang, Seok-hwan;Lee, Young-ho
    • KIPS Transactions on Software and Data Engineering / v.11 no.5 / pp.203-210 / 2022
  • As more research groups and companies use health care data, efforts are being made worldwide to promote its use. However, the system and format used by each institution are different. Therefore, this research established a baseline model that classifies the medical record types of EEG Reports, with the aim of classifying text data from multiple institutions by type in the future. For EEG Report classification, four deep learning-based algorithms were compared. In the experiment, the ANN model trained on text vectorized with One-Hot Encoding showed the highest performance, with an accuracy of 71%.

Could Decimal-binary Vector be a Representative of DNA Sequence for Classification?

  • Sanjaya, Prima;Kang, Dae-Ki
    • International journal of advanced smart convergence / v.5 no.3 / pp.8-15 / 2016
  • In recent years, a deep learning model called the Deep Belief Network (DBN), which is formed by stacking restricted Boltzmann machines in a greedy fashion, has been widely used for classification and recognition. With its ability to extract features of high-level abstraction and to deal with higher-dimensional data structures, this model has achieved outstanding results in image and speech recognition. In this research, we assess the applicability of deep learning to DNA classification. Since the training phase of a DBN is costly, especially when it deals with DNA sequences with thousands of variables, we introduce a new encoding method that uses a decimal-binary vector to represent the sequence as input to the model, and then compare it with one-hot-vector encoding on two datasets. We evaluated our proposed model with different contrastive algorithms, which achieved a significant improvement in training speed with comparable classification results. This result shows the potential of using decimal-binary vectors with DBNs on DNA sequences to solve other sequence problems in bioinformatics.
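
To illustrate why a more compact encoding can reduce training cost, the sketch below contrasts one-hot encoding of a DNA sequence with one plausible reading of a "decimal-binary" encoding. The abstract does not define the decimal-binary scheme, so the mapping used here (each nucleotide gets a decimal code 0 to 3, written as two bits) is an assumption for illustration only, as are the function names and the sample sequence.

```python
NUCLEOTIDES = "ACGT"


def one_hot_encode(seq):
    """One-hot encoding: 4 input units per nucleotide."""
    vec = []
    for base in seq:
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def decimal_binary_encode(seq):
    """ASSUMED scheme, not taken from the paper: map each nucleotide to a
    decimal code (A=0, C=1, G=2, T=3) and write that code as 2 bits."""
    vec = []
    for base in seq:
        code = NUCLEOTIDES.index(base)
        vec.extend([code >> 1, code & 1])
    return vec


seq = "GATTACA"  # hypothetical sample sequence
print(len(one_hot_encode(seq)))         # 4 units per base
print(len(decimal_binary_encode(seq)))  # 2 units per base
```

Under this assumed scheme the input layer is half the size of the one-hot version, which is the kind of reduction that would shorten DBN training on sequences with thousands of positions.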

Variation for Mental Health of Children of Marginalized Classes through Exercise Therapy using Deep Learning (딥러닝을 이용한 소외계층 아동의 스포츠 재활치료를 통한 정신 건강에 대한 변화)

  • Kim, Myung-Mi
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.4 / pp.725-732 / 2020
  • This paper uses the following variables, recorded during physical activity in an exercise learning program for children of marginalized classes: follows well (0-9), takes a long time to make a decision (0-9), and lethargy (0-9). The data are classified by 'gender', 'physical education classroom', and 'upper, middle and lower' age group, and changes in ego-resiliency and self-control through sports rehabilitation therapy are observed to identify changes in mental health. To this end, the acquired data were merged, and the influence of large and small numeric values was removed using the Label encoder and One-hot encoding. Then, to evaluate the performance of the MLP, SVM, Decision tree, RNN, and LSTM algorithms, the data were divided into 75% training and 25% test data, each algorithm was trained on the training data, and its accuracy was measured on the test data. According to the measurements, LSTM was the most effective for gender, MLP and LSTM for physical education classroom, and SVM for age.
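
The preprocessing described in this abstract (label encoding, one-hot encoding, then a 75%/25% train/test split) can be sketched as below. This is a standard-library illustration under assumptions, not the paper's pipeline: the helper names, the `age_groups` sample column, and the fixed shuffle seed are hypothetical.

```python
import random


def label_encode(values):
    """Map each category to an integer code (alphabetical order)."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot(codes, n_classes):
    """Turn integer codes into one-hot rows, so that no category looks
    numerically larger or smaller than another."""
    return [[1 if i == code else 0 for i in range(n_classes)] for code in codes]


def split_train_test(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows and hold out `test_ratio` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical categorical column.
age_groups = ["upper", "middle", "lower", "middle",
              "upper", "lower", "middle", "upper"]
codes, classes = label_encode(age_groups)
vectors = one_hot(codes, len(classes))
train, test = split_train_test(vectors)
print(classes, codes)
print(len(train), len(test))
```

Label encoding alone would assign "upper" a larger number than "lower"; the one-hot step removes that implied ordering before the rows are split.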

Clustering Meta Information of K-Pop Girl Groups Using Term Frequency-inverse Document Frequency Vectorization (단어-역문서 빈도 벡터화를 통한 한국 걸그룹의 음반 메타 정보 군집화)

  • JoonSeo Hyeon;JaeHyuk Cho
    • Journal of Platform Technology / v.11 no.3 / pp.12-23 / 2023
  • In the 2020s, the K-Pop market has been dominated by girl groups over boy groups and by the fourth generation over the third generation. This paper presents methods and results on lyric clustering to investigate whether the generation of girl groups has started to change. We collected meta-information for 1469 songs by 47 groups released from 2013 to 2022, divided it into lyric information and non-lyric meta-information, and quantified each separately. The lyric information was preprocessed by applying term frequency-inverse document frequency (TF-IDF) vectorization based on previous studies and then selecting only the top vector values. Non-lyric meta-information was preprocessed with One-Hot Encoding to reduce the bias of using only lyric information and to obtain better clustering results. On the preprocessed data, Spherical K-Means scored 129% higher on the Silhouette Score and 45% higher on the Calinski-Harabasz Score than Hierarchical Clustering. This paper is expected to contribute to the study of Korean popular song development and to girl group lyrics analysis and clustering.

Identification of a host range determinant from Ralstonia solanacearum race 3

  • Yeonhwa Jeong;Lee, Seungdon;Ingyu Hwang
    • Proceedings of the Korean Society of Plant Pathology Conference / 2003.10a / pp.71.2-71 / 2003
  • Ralstonia solanacearum infects many solanaceous plants; however, race 3 infects only potato and tomato weakly. To identify genes responsible for race specificity of R. solanacearum, we mobilized a genomic library of LSD2029 (race 3) into LSD341 (race 1) and inoculated 1,000 transconjugants into hot pepper. One transconjugant that did not induce wilt symptoms in hot pepper was isolated. We found that a cosmid clone, pRSl, conferred avirulence to LSD341. By deletion and mutational analyses of pRSl, we found that the 0.9-kb PstI/HindIII fragment carries avirulence functions. We sequenced the fragment and identified one possible open reading frame, the rsa1 gene, possibly encoding 110 amino acids. The rsa1 gene was preceded by a plant-inducible promoter (PIP) box, indicating that the gene might be regulated by HrpB. Interestingly, the promoter region of the rsa1 homolog in strain GMI1000 (race 1) did not have the PIP box. Rsa1 did not show any significant homology with proteins in the database, indicating that the protein is different from previously reported avirulence proteins. When we mutated the rsa1 gene by marker exchange in LSD2029, the mutant was less virulent in potato.

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.526-538 / 2021
  • Machine and deep learning-based models are emerging techniques that are being used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results when compared to conventional regression-based models. The prediction of the gene sequence that leads to cancerous diseases, such as prostate cancer, is crucial. Identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide an insight into the types of mutation in the gene is of great importance as it will lead to effective drug design and the promotion of the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) and Bi-directional Long-Short Term Memory (Bi-LSTM) model using a k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that DNN model prediction offers a training accuracy of 99 percent and validation accuracy of 96 percent. The bi-LSTM model also has a training accuracy of 95 percent and validation accuracy of 91 percent.
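
The input and label encodings named in this abstract (k-mer encoding for the DNA sequence, one-hot encoding for the class label) can be sketched as follows. This is a minimal illustration under assumptions rather than the authors' preprocessing: k = 3, the integer vocabulary with 0 reserved for padding, the sample sequences, and the class names are all hypothetical.

```python
def kmers(seq, k):
    """Split a sequence into overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(token_lists):
    """Assign an integer id to each distinct k-mer in order of first
    appearance. ASSUMPTION: id 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def one_hot_label(label, classes):
    """One-hot vector for a class label, ordered as in `classes`."""
    return [1 if label == c else 0 for c in classes]


# Hypothetical toy data; real exon sequences would be far longer.
sequences = ["ATGCGA", "ATGTTT"]
labels = ["positive", "negative"]
classes = ["negative", "positive"]

tokens = [kmers(s, 3) for s in sequences]
vocab = build_vocab(tokens)
ids = [[vocab[t] for t in toks] for toks in tokens]
targets = [one_hot_label(lab, classes) for lab in labels]

print(tokens[0])
print(ids)
print(targets)
```

The integer id sequences would feed an embedding or recurrent layer, while the one-hot targets match a softmax output with one unit per class.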