• 제목/요약/키워드: datasets

검색결과 2,050건 처리시간 0.026초

딥러닝 기반 민화 장르 분류 모델 연구 (A Study on the Classification Model of Minhwa Genre Based on Deep Learning)

  • 윤수림;이영숙
    • 한국멀티미디어학회논문지
    • /
    • 제25권10호
    • /
    • pp.1524-1534
    • /
    • 2022
  • This study proposes the classification model of Minhwa genre based on object detection of deep learning. To detect unique Korean traditional objects in Minhwa, we construct custom datasets by labeling images using object keywords in Minhwa DB. We train YOLOv5 models with custom datasets, and classify images using predicted object labels result, the output of model training. The algorithm consists of two classification steps: 1) according to the painting technique and 2) genre of Minhwa. Through classifying paintings using this algorithm on the Internet, it is expected that the correct information of Minhwa can be built and provided to users forward.

Integrative Multi-Omics Approaches in Cancer Research: From Biological Networks to Clinical Subtypes

  • Heo, Yong Jin;Hwa, Chanwoong;Lee, Gang-Hee;Park, Jae-Min;An, Joon-Yong
    • Molecules and Cells
    • /
    • 제44권7호
    • /
    • pp.433-443
    • /
    • 2021
  • Multi-omics approaches are novel frameworks that integrate multiple omics datasets generated from the same patients to better understand the molecular and clinical features of cancers. A wide range of emerging omics and multi-view clustering algorithms now provide unprecedented opportunities to further classify cancers into subtypes, improve the survival prediction and therapeutic outcome of these subtypes, and understand key pathophysiological processes through different molecular layers. In this review, we overview the concept and rationale of multi-omics approaches in cancer research. We also introduce recent advances in the development of multi-omics algorithms and integration methods for multiple-layered datasets from cancer patients. Finally, we summarize the latest findings from large-scale multi-omics studies of various cancers and their implications for patient subtyping and drug development.

비정형 야지환경 주행상황에서의 실시간 의미론적 영상 분할 알고리즘 성능 향상에 관한 연구 (A Study of Real-time Semantic Segmentation Performance Improvement in Unstructured Outdoor Environment)

  • 김대영;안승욱;서승우
    • 한국군사과학기술학회지
    • /
    • 제25권6호
    • /
    • pp.606-616
    • /
    • 2022
  • Semantic segmentation in autonomous driving for unstructured environments is challenging due to the presence of uneven terrains, unstructured class boundaries, irregular features and strong textures. Current off-road datasets exhibit difficulties like class imbalance and understanding of varying environmental topography. To overcome these issues, we propose a deep learning framework for semantic segmentation that involves a pooled class semantic segmentation with five classes. The evaluation of the framework is carried out on two off-road driving datasets, RUGD and TAS500. The results show that our proposed method achieves high accuracy and real-time performance.

Density-based Outlier Detection in Multi-dimensional Datasets

  • Wang, Xite;Cao, Zhixin;Zhan, Rongjuan;Bai, Mei;Ma, Qian;Li, Guanyu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제16권12호
    • /
    • pp.3815-3835
    • /
    • 2022
  • Density-based outlier detection is one of the hot issues in data mining. A point is determined as outlier on basis of the density of points near them. The existing density-based detection algorithms have high time complexity, in order to reduce the time complexity, a new outlier detection algorithm DODMD (Density-based Outlier Detection in Multidimensional Datasets) is proposed. Firstly, on the basis of ZH-tree, the concept of micro-cluster is introduced. Each leaf node is regarded as a micro-cluster, and the micro-cluster is calculated to achieve the purpose of batch filtering. In order to obtain n sets of approximate outliers quickly, a greedy method is used to calculate the boundary of LOF and mark the minimum value as LOFmin. Secondly, the outliers can filtered out by LOFmin, the real outliers are calculated, and then the result set is updated to make the boundary closer. Finally, the accuracy and efficiency of DODMD algorithm are verified on real dataset and synthetic dataset respectively.

Performance Improvement of Fuzzy C-Means Clustering Algorithm by Optimized Early Stopping for Inhomogeneous Datasets

  • Chae-Rim Han;Sun-Jin Lee;Il-Gu Lee
    • Journal of information and communication convergence engineering
    • /
    • 제21권3호
    • /
    • pp.198-207
    • /
    • 2023
  • Responding to changes in artificial intelligence models and the data environment is crucial for increasing data-learning accuracy and inference stability of industrial applications. A learning model that is overfitted to specific training data leads to poor learning performance and a deterioration in flexibility. Therefore, an early stopping technique is used to stop learning at an appropriate time. However, this technique does not consider the homogeneity and independence of the data collected by heterogeneous nodes in a differential network environment, thus resulting in low learning accuracy and degradation of system performance. In this study, the generalization performance of neural networks is maximized, whereas the effect of the homogeneity of datasets is minimized by achieving an accuracy of 99.7%. This corresponds to a decrease in delay time by a factor of 2.33 and improvement in performance by a factor of 2.5 compared with the conventional method.

A surrogate model for the helium production rate in fast reactor MOX fuels

  • D. Pizzocri;M.G. Katsampiris;L. Luzzi;A. Magni;G. Zullo
    • Nuclear Engineering and Technology
    • /
    • 제55권8호
    • /
    • pp.3071-3079
    • /
    • 2023
  • Helium production in the nuclear fuel matrix during irradiation plays a critical role in the design and performance of Gen-IV reactor fuel, as it represents a life-limiting factor for the operation of fuel pins. In this work, a surrogate model for the helium production rate in fast reactor MOX fuels is developed, targeting its inclusion in engineering tools such as fuel performance codes. This surrogate model is based on synthetic datasets obtained via the SCIANTIX burnup module. Such datasets are generated using Latin hypercube sampling to cover the range of input parameters (e.g., fuel initial composition, fission rate density, and irradiation time) and exploiting the low computation requirement of the burnup module itself. The surrogate model is verified against the SCIANTIX burnup module results for helium production with satisfactory performance.

Saliency-Assisted Collaborative Learning Network for Road Scene Semantic Segmentation

  • Haifeng Sima;Yushuang Xu;Minmin Du;Meng Gao;Jing Wang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제17권3호
    • /
    • pp.861-880
    • /
    • 2023
  • Semantic segmentation of road scene is the key technology of autonomous driving, and the improvement of convolutional neural network architecture promotes the improvement of model segmentation performance. The existing convolutional neural network has the simplification of learning knowledge and the complexity of the model. To address this issue, we proposed a road scene semantic segmentation algorithm based on multi-task collaborative learning. Firstly, a depthwise separable convolution atrous spatial pyramid pooling is proposed to reduce model complexity. Secondly, a collaborative learning framework is proposed involved with saliency detection, and the joint loss function is defined using homoscedastic uncertainty to meet the new learning model. Experiments are conducted on the road and nature scenes datasets. The proposed method achieves 70.94% and 64.90% mIoU on Cityscapes and PASCAL VOC 2012 datasets, respectively. Qualitatively, Compared to methods with excellent performance, the method proposed in this paper has significant advantages in the segmentation of fine targets and boundaries.

대형 이미지 데이터셋 구축을 위한 이미지 이진화 기반 데이터 증강 기법 (Data augmentation technique based on image binarization for constructing large-scale datasets)

  • 이주혁;김미희
    • 전기전자학회논문지
    • /
    • 제27권1호
    • /
    • pp.59-64
    • /
    • 2023
  • 딥러닝은 다양한 컴퓨터 비전 문제를 해결할 수 있지만, 대량의 데이터셋이 필요하다. 본 논문에서는 대형 이미지 데이터셋을 구축하기 위해 이미지 이진화 기반 데이터 증강 기법을 제안한다. 이미지 이진화를 사용하여 특성을 추출하고 추출된 나머지 픽셀을 랜덤하게 배치하여 새로운 이미지를 생성한다. 생성된 이미지는 원본 이미지와 유사한 품질을 보여주며, 딥러닝 모델에서도 뛰어난 성능을 보였다.

Recent deep learning methods for tabular data

  • Yejin Hwang;Jongwoo Song
    • Communications for Statistical Applications and Methods
    • /
    • 제30권2호
    • /
    • pp.215-226
    • /
    • 2023
  • Deep learning has made great strides in the field of unstructured data such as text, images, and audio. However, in the case of tabular data analysis, machine learning algorithms such as ensemble methods are still better than deep learning. To keep up with the performance of machine learning algorithms with good predictive power, several deep learning methods for tabular data have been proposed recently. In this paper, we review the latest deep learning models for tabular data and compare the performances of these models using several datasets. In addition, we also compare the latest boosting methods to these deep learning methods and suggest the guidelines to the users, who analyze tabular datasets. In regression, machine learning methods are better than deep learning methods. But for the classification problems, deep learning methods perform better than the machine learning methods in some cases.

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods
    • /
    • 제30권3호
    • /
    • pp.331-341
    • /
    • 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputations. Recent methods include missForest, missRanger, and mixgb ,all which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed datatypes in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.