• Title/Summary/Keyword: Supervised Data

Search Results: 661

Issues and Empirical Results for Improving Text Classification

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Computing Science and Engineering / v.5 no.2 / pp.150-160 / 2011
  • Automatic text classification has a long history, and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Even though much technical progress has been made, there is still room for improvement. This paper discusses the remaining issues: three improvement issues are presented, covering automatic training data generation, noisy data treatment, and term weighting and indexing, and four actual studies and their empirical results for those issues are introduced. First, a semi-supervised learning technique is applied to text classification to efficiently create training data. For effective noisy data treatment, a noisy data reduction method and a text classifier robust to noisy data are developed. Finally, the term weighting and indexing technique is revised by reflecting the importance of sentences in term weight calculation using summarization techniques.
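For illustration, the training-data generation idea can be sketched as a small self-training loop: a classifier trained on a handful of labeled documents pseudo-labels an unlabeled pool and keeps only its confident predictions. This is a minimal sketch assuming scikit-learn; the documents, class labels, and the 0.8 confidence threshold are hypothetical, not the authors' actual procedure.

```python
# Minimal self-training sketch: grow the labeled set with confident pseudo-labels.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

labeled_docs = ["stock markets fell sharply today", "the home team won the final match"]
labels = np.array([0, 1])  # 0 = business, 1 = sports (hypothetical classes)
unlabeled_docs = ["shares of the company dropped", "the striker scored twice tonight"]

vectorizer = TfidfVectorizer()
X_lab = vectorizer.fit_transform(labeled_docs)
X_unlab = vectorizer.transform(unlabeled_docs)

clf = MultinomialNB()
for _ in range(3):                            # a few self-training rounds
    clf.fit(X_lab, labels)
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) >= 0.8      # keep only confident pseudo-labels
    if not confident.any():
        break
    X_lab = vstack([X_lab, X_unlab[confident]])
    labels = np.concatenate([labels, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]

print("training set grew to", X_lab.shape[0], "documents")
```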

Abnormal Data Augmentation Method Using Perturbation Based on Hypersphere for Semi-Supervised Anomaly Detection (준 지도 이상 탐지 기법의 성능 향상을 위한 섭동을 활용한 초구 기반 비정상 데이터 증강 기법)

  • Jung, Byeonggil;Kwon, Junhyung;Min, Dongjun;Lee, Sangkyun
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.4 / pp.647-660 / 2022
  • Recent works demonstrate that semi-supervised anomaly detection methods function quite well in environments with normal data and some anomalous data. However, abnormal data shortages can occur in environments where it is difficult to secure anomalous data, such as unknown attacks in the cyber security field. In this paper, we propose ADA-PH (Abnormal Data Augmentation Method using Perturbation based on Hypersphere), a novel anomalous data augmentation method that is applicable when abnormal data is too scarce to secure the performance of semi-supervised anomaly detection. ADA-PH generates abnormal data by perturbing samples located relatively far from the center of the hypersphere. On network intrusion detection datasets where abnormal data is rare, ADA-PH shows 23.63% higher AUC than anomaly detection without data augmentation and also outperforms other augmentation methods. We further conduct quantitative and qualitative analyses of whether the generated abnormal data is actually anomalous.
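The core mechanism, perturbing samples that lie relatively far from the hypersphere center to synthesize extra abnormal data, can be sketched roughly as below. This is a simplified illustration with assumed embeddings, threshold, and noise scale, not the authors' exact ADA-PH procedure.

```python
# Sketch of hypersphere-distance-guided abnormal-data augmentation.
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 8))    # embeddings of plentiful normal samples
abnormal = rng.normal(3.0, 1.0, size=(10, 8))   # the few available abnormal samples

center = normal.mean(axis=0)                    # hypersphere center fitted on normal data
dist = np.linalg.norm(abnormal - center, axis=1)

# perturb only the abnormal samples located relatively far from the center
far = abnormal[dist >= np.median(dist)]
noise = rng.normal(scale=0.1, size=far.shape)   # small perturbation (assumed scale)
augmented = np.vstack([abnormal, far + noise])
print("abnormal samples after augmentation:", augmented.shape[0])
```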

Supervised Classification Systems for High Resolution Satellite Images (고해상도 위성영상을 위한 감독분류 시스템)

  • 전영준;김진일
    • Journal of KIISE:Computing Practices and Letters / v.9 no.3 / pp.301-310 / 2003
  • In this paper, we design and implement supervised classification systems for high resolution satellite images. The systems support various interfaces and statistical data of training samples so that the most effective training data can be selected. In addition, new classification algorithms and satellite image formats can be added easily through the modularized systems. The classifiers take into account the characteristics of the spectral bands of the selected training data. They provide various supervised classification algorithms, including Parallelepiped, Minimum distance, Mahalanobis distance, Maximum likelihood, and Fuzzy theory. We used IKONOS images as input and verified the systems for the classification of high resolution satellite images.
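Two of the listed algorithms, minimum distance and Mahalanobis distance, reduce to nearest-class-mean decisions over per-pixel spectral vectors. The sketch below uses hypothetical training pixels rather than IKONOS data.

```python
# Per-pixel minimum-distance / Mahalanobis-distance classification sketch.
import numpy as np

rng = np.random.default_rng(1)
train = {  # class name -> training pixels (n_samples x n_bands), synthetic
    "water":  rng.normal([0.1, 0.2, 0.3, 0.1], 0.02, size=(50, 4)),
    "forest": rng.normal([0.2, 0.4, 0.3, 0.6], 0.03, size=(50, 4)),
}
stats = {c: (s.mean(axis=0), np.linalg.inv(np.cov(s.T))) for c, s in train.items()}

def classify(pixel, mahalanobis=True):
    """Assign the class whose mean is nearest to the pixel."""
    best, best_d = None, np.inf
    for name, (mean, inv_cov) in stats.items():
        diff = pixel - mean
        d = diff @ inv_cov @ diff if mahalanobis else diff @ diff
        if d < best_d:
            best, best_d = name, d
    return best

print(classify(np.array([0.21, 0.39, 0.31, 0.58])))   # expected: "forest"
```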

A supervised-learning-based spatial performance prediction framework for heterogeneous communication networks

  • Mukherjee, Shubhabrata;Choi, Taesang;Islam, Md Tajul;Choi, Baek-Young;Beard, Cory;Won, Seuck Ho;Song, Sejun
    • ETRI Journal / v.42 no.5 / pp.686-699 / 2020
  • In this paper, we propose a supervised-learning-based spatial performance prediction (SLPP) framework for next-generation heterogeneous communication networks (HCNs). Adaptive asset placement, dynamic resource allocation, and load balancing are critical network functions in an HCN to ensure seamless network management and enhance service quality. Although many existing systems use measurement data to react to network performance changes, it is highly beneficial to perform accurate performance prediction for different systems to support various network functions. Recent advancements in complex statistical algorithms and computational efficiency have made machine learning ubiquitous for accurate data-based prediction. This paper proposes a robust network performance prediction framework that optimizes performance and resource utilization through a linear discriminant analysis based prediction approach. Comparison results with different machine learning techniques on real-world data demonstrate that SLPP provides superior accuracy and computational efficiency for both stationary and mobile user conditions.
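The central component, a linear discriminant analysis (LDA) based predictor, can be sketched with scikit-learn; the synthetic features and labels below are stand-ins, not the SLPP measurement set.

```python
# LDA classification sketch for a coarse spatial-performance class.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))                    # e.g. signal strength, distance, load
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)    # 1 = "good throughput", 0 = "poor"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```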

EER-ASSL: Combining Rollback Learning and Deep Learning for Rapid Adaptive Object Detection

  • Ahmed, Minhaz Uddin;Kim, Yeong Hyeon;Rhee, Phill Kyu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4776-4794 / 2020
  • We propose a rapid adaptive learning framework for streaming object detection, called EER-ASSL. The method combines expected error reduction (EER) dependent rollback learning and active semi-supervised learning (ASSL) for a rapid adaptive CNN detector. Most CNN object detectors are built on the assumption of a static data distribution. However, images are often noisy and biased, and the data distribution is imbalanced in real-world environments. The proposed method consists of collaborative sampling and EER-ASSL. EER-ASSL utilizes active learning (AL) and rollback-based semi-supervised learning (SSL). The AL allows us to select more informative and representative samples by measuring uncertainty and diversity. The SSL divides the selected streaming image samples into bins, and each bin repeatedly transfers the discriminative knowledge of the EER and CNN models to the next bin until convergence and incorporation with the EER rollback learning algorithm is achieved. The EER models provide a rapid short-term myopic adaptation and the CNN models an incremental long-term performance improvement. EER-ASSL can overcome noisy and biased labels in a varying data distribution. Extensive experiments show that EER-ASSL obtained 70.9 mAP compared to state-of-the-art detectors such as Faster RCNN, SSD300, and YOLOv2.
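The uncertainty-and-diversity selection step of active learning can be sketched generically: rank unlabeled samples by predictive entropy, then spread the batch across feature-space clusters. This is only a rough illustration of that step, not the EER rollback procedure; all data and parameters below are assumed.

```python
# Generic active-learning batch selection: uncertainty (entropy) + diversity (clusters).
import numpy as np
from sklearn.cluster import KMeans

def select_batch(probs, feats, k=8):
    """Pick up to k unlabeled samples: most uncertain first, spread across clusters."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # predictive uncertainty
    candidates = np.argsort(entropy)[-4 * k:]                # pool of most uncertain samples
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats[candidates])
    # one representative per cluster keeps the batch diverse
    return [int(candidates[np.where(clusters == c)[0][0]]) for c in np.unique(clusters)]

rng = np.random.default_rng(3)
probs = rng.dirichlet(np.ones(5), size=200)   # stand-in for detector softmax outputs
feats = rng.normal(size=(200, 16))            # stand-in for feature embeddings
print(select_batch(probs, feats))
```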

Data Security on Cloud by Cryptographic Methods Using Machine Learning Techniques

  • Gadde, Swetha;Amutharaj, J.;Usha, S.
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.342-347 / 2022
  • On the cloud, important user data that is protected on remote servers can be accessed via the internet. Due to the rapid shift in technology, there has been a swift increase in confidential and pivotal data, which raises the requirement for securing users' data. Data is of different types, and each type needs a discrete degree of protection. Applying data science to data security allows the computing procedure to be more applicable and intelligent than conventional approaches. Our focus in this paper is to enhance the safety of data on the cloud and to address the problems associated with data security. In our suggested plan, basic security solutions such as cryptographic techniques and authentication are applied in the cloud computing world. This paper also discusses how machine learning techniques are used in data security in both offensive and defensive ventures, including an analysis of cyber-attacks aimed at machine learning techniques. The machine learning techniques considered are based on supervised, unsupervised, semi-supervised, and reinforcement learning. Although numerous studies have been conducted on this topic, considerably more investigation is required to determine how data can be secured more firmly on the cloud using machine learning techniques and cryptographic methods.

ACCOUNTING FOR IMPORTANCE OF VARIABLES IN MULTI-SENSOR DATA FUSION USING RANDOM FORESTS

  • Park No-Wook;Chi Kwang-Hoon
    • Proceedings of the KSRS Conference / 2005.10a / pp.283-285 / 2005
  • To account for the importance of variables in multi-sensor data fusion, random forests are applied to supervised land-cover classification. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. Its distinguishing feature is that the importance of each variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Supervised classification with a multi-sensor remote sensing data set including optical and polarimetric SAR data was carried out to illustrate the applicability of random forests. From the experimental results, the random forests approach could extract important variables or bands for land-cover discrimination and showed good performance compared with other non-parametric data fusion algorithms.
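The permutation-based importance estimate described above can be reproduced in spirit with scikit-learn; note that permutation_importance permutes each feature on a held-out split rather than on the out-of-bag samples, and the bands and labels below are synthetic.

```python
# Random forest + permutation importance sketch for ranking input bands.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 5))                 # e.g. optical bands plus SAR channels
y = (X[:, 0] + 2 * X[:, 2] > 0).astype(int)   # land-cover label driven by bands 0 and 2

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
for band, score in enumerate(result.importances_mean):
    print(f"band {band}: importance {score:.3f}")
```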

Application of Random Forests to Assessment of Importance of Variables in Multi-sensor Data Fusion for Land-cover Classification

  • Park No-Wook;Chi Kwang-Hoon
    • Korean Journal of Remote Sensing / v.22 no.3 / pp.211-219 / 2006
  • A random forests classifier is applied to multi-sensor data fusion for supervised land-cover classification in order to account for the importance of variables. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. Its distinguishing feature is that the importance of each variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Two different multi-sensor data sets for supervised classification were used to illustrate the applicability of random forests: one with optical and polarimetric SAR data and the other with multi-temporal Radarsat-1 and ENVISAT ASAR data sets. From the experimental results, the random forests approach could extract important variables or bands for land-cover discrimination and showed reasonably good performance in terms of classification accuracy.

Anomaly-based Alzheimer's disease detection using entropy-based probability Positron Emission Tomography images

  • Husnu Baris Baydargil;Jangsik Park;Ibrahim Furkan Ince
    • ETRI Journal / v.46 no.3 / pp.513-525 / 2024
  • Deep neural networks trained on labeled medical data face major challenges owing to the economic costs of data acquisition through expensive medical imaging devices, expert labor for data annotation, and large datasets to achieve optimal model performance. The heterogeneity of diseases, such as Alzheimer's disease, further complicates deep learning because the test cases may substantially differ from the training data, possibly increasing the rate of false positives. We propose a reconstruction-based self-supervised anomaly detection model to overcome these challenges. It has a dual-subnetwork encoder that enhances feature encoding augmented by skip connections to the decoder for improving the gradient flow. The novel encoder captures local and global features to improve image reconstruction. In addition, we introduce an entropy-based image conversion method. Extensive evaluations show that the proposed model outperforms benchmark models in anomaly detection and classification using an encoder. The supervised and unsupervised models show improved performances when trained with data preprocessed using the proposed image conversion method.
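The reconstruction-based anomaly-detection principle, training only on normal data and flagging samples that reconstruct poorly, can be sketched with a simple linear model. PCA stands in here for the paper's deep encoder-decoder, and the data and threshold are assumptions.

```python
# Reconstruction-error anomaly scoring sketch (PCA as a stand-in reconstructor).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
normal_train = rng.normal(0.0, 1.0, size=(500, 64))      # e.g. flattened image patches
test = np.vstack([rng.normal(0.0, 1.0, size=(5, 64)),    # normal-like samples
                  rng.normal(4.0, 1.0, size=(5, 64))])   # anomalous-like samples

model = PCA(n_components=8).fit(normal_train)

def recon_error(x):
    """Per-sample mean squared reconstruction error."""
    recon = model.inverse_transform(model.transform(x))
    return np.mean((x - recon) ** 2, axis=1)

threshold = np.percentile(recon_error(normal_train), 99)  # crude threshold from normal data
print("anomaly flags:", recon_error(test) > threshold)
```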

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.33 no.2 / pp.163-174 / 2006
  • In a distributed wireless sensor network, it is difficult to transmit and analyze the entire stream data because of limited network bandwidth, power, and processing capacity. Therefore it is more suitable to process the continuous stream data after first classifying it. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes as input a sliding window of multivariate stream data and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For the supervised case, we tested a Bayesian classifier and SVM; for the unsupervised case, we tested Jaccard, TF-IDF, Jaro, and Jaro-Winkler. In our experiments, SVM and TF-IDF outperformed the other classification methods. In particular, we observed that classification accuracy is improved when the correlation of attributes is also considered along with the n-gram tokens of symbols.
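The two-step idea, discretizing each sliding window into a symbol string and then applying a standard text classifier, can be sketched as follows; the symbol alphabet, window length, and classifier (TF-IDF character n-grams with a linear SVM) are assumptions for illustration.

```python
# Sliding-window discretization + text-classification sketch.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def window_to_symbols(window):
    """Encode successive changes in the window as u(p), d(own) or s(teady)."""
    return "".join("u" if d > 0.05 else "d" if d < -0.05 else "s"
                   for d in np.diff(window))

rng = np.random.default_rng(6)
rising = [np.cumsum(rng.normal(0.2, 0.1, size=20)) for _ in range(30)]   # trending windows
noisy = [rng.normal(0.0, 1.0, size=20) for _ in range(30)]               # noise-like windows
docs = [window_to_symbols(w) for w in rising + noisy]
labels = [1] * 30 + [0] * 30

vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))   # n-gram tokens of symbols
clf = LinearSVC().fit(vec.fit_transform(docs), labels)

new_window = np.cumsum(rng.normal(0.2, 0.1, size=20))
print(clf.predict(vec.transform([window_to_symbols(new_window)])))       # expected: [1]
```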