• Title/Summary/Keyword: datasets

Search Result 1,986, Processing Time 0.023 seconds

Cascaded-Hop For DeepFake Videos Detection

  • Zhang, Dengyong;Wu, Pengjie;Li, Feng;Zhu, Wenjie;Sheng, Victor S.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.5
    • /
    • pp.1671-1686
    • /
    • 2022
  • Face manipulation tools represented by Deepfake have threatened the security of people's biological identity information. Particularly, manipulation tools with deep learning technology have brought great challenges to Deepfake detection. There are many solutions for Deepfake detection based on traditional machine learning and advanced deep learning. However, those solutions of detectors almost have problems of poor performance when evaluated on different quality datasets. In this paper, for the sake of making high-quality Deepfake datasets, we provide a preprocessing method based on the image pixel matrix feature to eliminate similar images and the residual channel attention network (RCAN) to resize the scale of images. Significantly, we also describe a Deepfake detector named Cascaded-Hop which is based on the PixelHop++ system and the successive subspace learning (SSL) model. By feeding the preprocessed datasets, Cascaded-Hop achieves a good classification result on different manipulation types and multiple quality datasets. According to the experiment on FaceForensics++ and Celeb-DF, the AUC (area under curve) results of our proposed methods are comparable to the state-of-the-art models.

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems
    • /
    • v.20 no.2
    • /
    • pp.215-225
    • /
    • 2024
  • With the rapid development of Internet of Things (IoT) and big data technology, a large amount of data will be generated during the operation of related industries. How to classify the generated data accurately has become the core of research on data mining and processing in IoT industry chain. This study constructs a classification model of IoT industry chain based on improved random forest algorithm and text analysis, aiming to achieve efficient and accurate classification of IoT industry chain big data by improving traditional algorithms. The accuracy, precision, recall, and AUC value size of the traditional Random Forest algorithm and the algorithm used in the paper are compared on different datasets. The experimental results show that the algorithm model used in this paper has better performance on different datasets, and the accuracy and recall performance on four datasets are better than the traditional algorithm, and the accuracy performance on two datasets, P-I Diabetes and Loan Default, is better than the random forest model, and its final data classification results are better. Through the construction of this model, we can accurately classify the massive data generated in the IoT industry chain, thus providing more research value for the data mining and processing technology of the IoT industry chain.

Design of Optimized Radial Basis Function Neural Networks Classifier Using EMC Sensor for Partial Discharge Pattern Recognition (부분방전 패턴인식을 위해 EMC센서를 이용한 최적화된 RBFNNs 분류기 설계)

  • Jeong, Byeong-Jin;Lee, Seung-Cheol;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.9
    • /
    • pp.1392-1401
    • /
    • 2017
  • In this study, the design methodology of pattern classification is introduced for avoiding faults through partial discharge occurring in the power facilities and local sites. In order to classify some partial discharge types according to the characteristics of each feature, the model is constructed by using the Radial Basis Function Neural Networks(RBFNNs) and Particle Swarm Optimization(PSO). In the input layer of the RBFNNs, the feature vector is searched and the dimension is reduced through Principal Component Analysis(PCA) and PSO. In the hidden layer, the fuzzy coefficients of the fuzzy clustering method(FCM) are tuned using PSO. Raw datasets for partial discharge are obtained through the Motor Insulation Monitoring System(MIMS) instrument using an Epoxy Mica Coupling(EMC) sensor. The preprocessed datasets for partial discharge are acquired through the Phase Resolved Partial Discharge Analysis(PRPDA) preprocessing algorithm to obtain partial discharge types such as void, corona, surface, and slot discharges. Also, when the amplitude size is considered as two types of both the maximum value and the average value in the process for extracting the preprocessed datasets, two different kinds of feature datasets are produced. In this study, the classification ratio between the proposed RBFNNs model and other classifiers is shown by using the two different kinds of feature datasets, and also we demonstrate the proposed model shows superiority from the viewpoint of classification performance.

Building a Classifier for Integrated Microarray Datasets through Two-Stage Approach (2 단계 접근법을 통한 통합 마이크로어레이 데이타의 분류기 생성)

  • Yoon, Young-Mi;Lee, Jong-Chan;Park, Sang-Hyun
    • Journal of KIISE:Databases
    • /
    • v.34 no.1
    • /
    • pp.46-58
    • /
    • 2007
  • Since microarray data acquire tens of thousands of gene expression values simultaneously, they could be very useful in identifying the phenotypes of diseases. However, the results of analyzing several microarray datasets which were independently carried out with the same biological objectives, could turn out to be different. One of the main reasons is attributable to the limited number of samples involved in one microarry experiment. In order to increase the classification accuracy, it is desirable to augment the sample size by integrating and maximizing the use of independently-conducted microarray datasets. In this paper, we propose a novel two-stage approach which firstly integrates individual microarray datasets to overcome the problem caused by limited number of samples, and identifies informative genes, secondly builds a classifier using only the informative genes. The classifier from large samples by integrating independent microarray datasets achieves high accuracy up to 24.19% increase as against other comparison methods, sensitivity, and specificity on independent test sample dataset.

Development of Dataset Evaluation Criteria for Learning Deepfake Video (딥페이크 영상 학습을 위한 데이터셋 평가기준 개발)

  • Kim, Rayng-Hyung;Kim, Tae-Gu
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.44 no.4
    • /
    • pp.193-207
    • /
    • 2021
  • As Deepfakes phenomenon is spreading worldwide mainly through videos in web platforms and it is urgent to address the issue on time. More recently, researchers have extensively discussed deepfake video datasets. However, it has been pointed out that the existing Deepfake datasets do not properly reflect the potential threat and realism due to various limitations. Although there is a need for research that establishes an agreed-upon concept for high-quality datasets or suggests evaluation criterion, there are still handful studies which examined it to-date. Therefore, this study focused on the development of the evaluation criterion for the Deepfake video dataset. In this study, the fitness of the Deepfake dataset was presented and evaluation criterions were derived through the review of previous studies. AHP structuralization and analysis were performed to advance the evaluation criterion. The results showed that Facial Expression, Validation, and Data Characteristics are important determinants of data quality. This is interpreted as a result that reflects the importance of minimizing defects and presenting results based on scientific methods when evaluating quality. This study has implications in that it suggests the fitness and evaluation criterion of the Deepfake dataset. Since the evaluation criterion presented in this study was derived based on the items considered in previous studies, it is thought that all evaluation criterions will be effective for quality improvement. It is also expected to be used as criteria for selecting an appropriate deefake dataset or as a reference for designing a Deepfake data benchmark. This study could not apply the presented evaluation criterion to existing Deepfake datasets. In future research, the proposed evaluation criterion will be applied to existing datasets to evaluate the strengths and weaknesses of each dataset, and to consider what implications there will be when used in Deepfake research.

A Study on the Land Cover Classification and Cross Validation of AI-based Aerial Photograph

  • Lee, Seong-Hyeok;Myeong, Soojeong;Yoon, Donghyeon;Lee, Moung-Jin
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.4
    • /
    • pp.395-409
    • /
    • 2022
  • The purpose of this study is to evaluate the classification performance and applicability when land cover datasets constructed for AI training are cross validation to other areas. For study areas, Gyeongsang-do and Jeolla-do in South Korea were selected as cross validation areas, and training datasets were obtained from AI-Hub. The obtained datasets were applied to the U-Net algorithm, a semantic segmentation algorithm, for each region, and the accuracy was evaluated by applying them to the same and other test areas. There was a difference of about 13-15% in overall classification accuracy between the same and other areas. For rice field, fields and buildings, higher accuracy was shown in the Jeolla-do test areas. For roads, higher accuracy was shown in the Gyeongsang-do test areas. In terms of the difference in accuracy by weight, the result of applying the weights of Gyeongsang-do showed high accuracy for forests, while that of applying the weights of Jeolla-do showed high accuracy for dry fields. The result of land cover classification, it was found that there is a difference in classification performance of existing datasets depending on area. When constructing land cover map for AI training, it is expected that higher quality datasets can be constructed by reflecting the characteristics of various areas. This study is highly scalable from two perspectives. First, it is to apply satellite images to AI study and to the field of land cover. Second, it is expanded based on satellite images and it is possible to use a large scale area and difficult to access.

Research Trends and Datasets Review using Satellite Image (위성영상 이미지를 활용한 연구 동향 및 데이터셋 리뷰)

  • Kim, Se Hyoung;Chae, Jung Woo;Kang, Ju Young
    • Smart Media Journal
    • /
    • v.11 no.1
    • /
    • pp.17-30
    • /
    • 2022
  • Like other computer vision research trends, research using satellite images was able to achieve rapid growth with the development of GPU-based computer computing capabilities and deep learning methodologies related to image processing. As a result, satellite images are being used in various fields, and the number of studies on how to use satellite images is increasing. Therefore, in this paper, we will introduce the field of research and utilization of satellite images and datasets that can be used for research using satellite images. First, studies using satellite images were collected and classified according to the research method. It was largely classified into a Regression-based Approach and a Classification-based Approach, and the papers used by other methods were summarized. Next, the datasets used in studies using satellite images were summarized. This study proposes information on datasets and methods of use in research. In addition, it introduces how to organize and utilize domestic satellite image datasets that were recently opened by AI hub. In addition, I would like to briefly examine the limitations of satellite image-related research and future trends.

A Study on the Description of Archival Datasets (데이터세트 기록물의 기술요소에 관한 연구)

  • Kim, Po-Ok;Yun, Soo-Young
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.18 no.2
    • /
    • pp.39-59
    • /
    • 2007
  • With the rapid spread of the practice of collecting and treating data by using a data base system, it's increasingly more critical to approach data sets in the same manner as general records in the collection, evaluation, preservation, and utilization process. Despite the importance, however, the interest level in data sets in Korea's records management is very low. In order to suggest basic items to regard data sets as records and manage them systematically, this study examined the descriptive elements of data set records. descriptive elements of data set records were suggested by comparing and analyzing those ones adopted by the agencies that regarded data sets as records and provided the concerned service as well as the descriptive rules of electronic records set by the advanced nations in records management based on the descriptive areas of ISAD(G).

A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

  • Hyeonwoo Kim;Jiwon Kim;Ji Won Cho;Kwang-Sung Ahn;Dong-Il Park;Sangsoo Kim
    • Genomics & Informatics
    • /
    • v.21 no.3
    • /
    • pp.40.1-40.11
    • /
    • 2023
  • Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large number samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic units (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline's performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.

Semantic Segmentation of Drone Images Based on Combined Segmentation Network Using Multiple Open Datasets (개방형 다중 데이터셋을 활용한 Combined Segmentation Network 기반 드론 영상의 의미론적 분할)

  • Ahram Song
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.967-978
    • /
    • 2023
  • This study proposed and validated a combined segmentation network (CSN) designed to effectively train on multiple drone image datasets and enhance the accuracy of semantic segmentation. CSN shares the entire encoding domain to accommodate the diversity of three drone datasets, while the decoding domains are trained independently. During training, the segmentation accuracy of CSN was lower compared to U-Net and the pyramid scene parsing network (PSPNet) on single datasets because it considers loss values for all dataset simultaneously. However, when applied to domestic autonomous drone images, CSN demonstrated the ability to classify pixels into appropriate classes without requiring additional training, outperforming PSPNet. This research suggests that CSN can serve as a valuable tool for effectively training on diverse drone image datasets and improving object recognition accuracy in new regions.