• Title/Summary/Keyword: Dataset Quality

A Measure for Improvement in Quality of Association Rules in the Item Response Dataset (문항 응답 데이터에서 문항간 연관규칙의 질적 향상을 위한 도구 개발)

  • Kwak, Eun-Young;Kim, Hyeoncheol
    • The Journal of Korean Association of Computer Education / v.10 no.3 / pp.1-8 / 2007
  • In this paper, we introduce a new measure called surprisal that estimates the informativeness of transactional instances and attributes in an item response dataset and improves the quality of association rules. To do this, we construct an artificial dataset, first eliminate noisy and uninformative data using surprisal, and then generate association rules between items. We compare the association rules obtained after surprisal-based pruning with those obtained after support-based pruning and from the original unpruned dataset. Experimental results show that surprisal-based pruning significantly improves the quality of association rules in question item response datasets.
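
The abstract does not give the paper's surprisal formula, so the following is only a minimal sketch of informativeness-based pruning. It assumes binary (correct/incorrect) responses and scores each item by its expected Shannon surprisal (entropy). The function names, the threshold, and the toy data are hypothetical.

```python
import math

def item_entropy(responses, item):
    """Expected surprisal (Shannon entropy, in bits) of one item's 0/1 responses."""
    n = len(responses)
    p = sum(row[item] for row in responses) / n
    # Terms with zero probability contribute nothing to the entropy.
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

def prune_items(responses, threshold):
    """Keep only items whose entropy reaches the (hypothetical) threshold."""
    n_items = len(responses[0])
    keep = [i for i in range(n_items) if item_entropy(responses, i) >= threshold]
    return keep, [[row[i] for i in keep] for row in responses]

# Toy response matrix: rows are students, columns are items (1 = correct).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
]
# Item 0 is answered correctly by everyone, so it carries no information.
keep, pruned = prune_items(responses, threshold=0.5)
```

Association rules would then be mined from `pruned` rather than from the full matrix. An instance-level variant could score each row by the summed surprisal of its responses in the same way.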

A Multi-category Task for Bitrate Interval Prediction with the Target Perceptual Quality

  • Yang, Zhenwei;Shen, Liquan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.12 / pp.4476-4491 / 2021
  • Video service providers often face user network problems when transmitting video streams, and they strive to provide users with superior video quality in a limited-bitrate environment. It is therefore necessary to accurately determine the target bitrate range of a video under different quality requirements. Several schemes have recently been proposed to meet this requirement, but they do not take visual factors into account. In this paper, we propose a new multi-category model that uses machine learning to accurately predict the target bitrate range for a target visual quality. First, a dataset is constructed for training the multi-category models; the quality score ladders and the corresponding bitrate-interval categories are defined in the dataset. Second, several types of spatial-temporal features related to the VMAF evaluation metric and to visual factors are extracted and processed statistically for classification. Finally, bitrate prediction models trained on the dataset with a RandomForest classifier can be used to predict the target bitrate of input videos for a target video quality. The classification accuracy of the model reaches 0.705, and video encoded at the bitrate predicted by the model can achieve the target perceptual quality.
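
To illustrate what a bitrate-interval category means as a classification label, here is a small sketch that maps a bitrate to the index of the interval containing it. The interval boundaries are hypothetical, since the abstract does not list the paper's actual intervals.

```python
from bisect import bisect_right

# Hypothetical interval boundaries in kbps (not taken from the paper).
BITRATE_EDGES_KBPS = [500, 1000, 2000, 4000]

def bitrate_category(bitrate_kbps):
    """Return the index of the half-open interval [low, high) containing the bitrate."""
    return bisect_right(BITRATE_EDGES_KBPS, bitrate_kbps)

def category_interval(category):
    """Return the (low, high) bounds in kbps for a category index."""
    low = 0 if category == 0 else BITRATE_EDGES_KBPS[category - 1]
    if category == len(BITRATE_EDGES_KBPS):
        high = float("inf")
    else:
        high = BITRATE_EDGES_KBPS[category]
    return low, high

label = bitrate_category(1500)       # the class a classifier would be trained to predict
low, high = category_interval(label)
```

A classifier trained on such labels predicts a category, and `category_interval` turns the predicted category back into a bitrate range.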

Multi Modal Sensor Training Dataset for the Robust Object Detection and Tracking in Outdoor Surveillance (MMO (Multi Modal Outdoor) Dataset) (실외 경비 환경에서 강인한 객체 검출 및 추적을 위한 실외 멀티 모달 센서 기반 학습용 데이터베이스 구축)

  • Noh, DongKi;Yang, Wonkeun;Uhm, Teayoung;Lee, Jaekwang;Kim, Hyoung-Rock;Baek, SeungMin
    • Journal of Korea Multimedia Society / v.23 no.8 / pp.1006-1018 / 2020
  • Datasets are becoming more important for developing learning-based algorithms, and the quality of an algorithm depends heavily on its dataset. We therefore introduce a new dataset of over 200 thousand images, which are fully labeled multi-modal sensor data. The proposed dataset was designed and constructed for researchers who want to develop detection, tracking, and action classification in outdoor environments for surveillance scenarios. It includes various images and multi-modal sensor data captured under different weather and lighting conditions. Therefore, we hope it will be helpful for developing more robust algorithms for systems equipped with different kinds of sensors in outdoor applications. Case studies with the proposed dataset are also discussed in this paper.

Comparison of image quality according to activation function during Super Resolution using ESCPN (ESCPN을 이용한 초해상화 시 활성화 함수에 따른 이미지 품질의 비교)

  • Song, Moon-Hyuk;Song, Ju-Myung;Hong, Yeon-Jo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.129-132 / 2022
  • Super-resolution is the process of converting a low-quality image into a high-quality image. This study was conducted using ESPCN. In a super-resolution deep neural network, images of different quality can be output for the same input data depending on the activation function applied at each node. Therefore, the purpose of this study is to find the most suitable activation function for super-resolution by applying the activation functions ReLU, ELU, and Swish and comparing the quality of the output images for the same input images. The CelebA dataset was used. During preprocessing, images were cropped to a square and their quality was then lowered. The degraded image was used as the input image and the original image was used for evaluation. As a result, ELU and Swish took longer to train than ReLU, which is commonly used in machine learning, but showed better performance.
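
For reference, the three activation functions compared in this abstract can be written as scalar functions using their standard definitions. This is a plain-Python sketch and not the study's training code. The `alpha` default of 1.0 for ELU is a common convention and is assumed here.

```python
import math

def relu(x):
    """ReLU: passes positive inputs through, zeroes out the rest."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, smooth exponential curve below zero."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def swish(x):
    """Swish: x * sigmoid(x), written as x / (1 + e^-x)."""
    return x / (1.0 + math.exp(-x))

for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), elu(value), swish(value))
```

ELU and Swish each evaluate an exponential, which makes them more expensive per call than ReLU. That is consistent with the longer training time reported above, although the abstract itself does not state the cause.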

A Construction of Geographical Distance-based Air Quality Dataset Using Hospital Location Information (병원위치정보를 이용한 지리적 거리기반의 대기환경 데이터셋 구축)

  • Kim, Hyeongsoo;Ryu, Keun Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.34 no.3 / pp.231-242 / 2016
  • Recently, air quality information has been actively gathered and investigated in order to find possible environmental risk factors that may affect the onset of cardiovascular disease. Nevertheless, existing studies are limited in detailed analysis because they rely on macro-level air quality statistics aggregated by administrative district. This paper proposes the construction of a distance-based air quality dataset using domestic hospitals' geographical location information as a reliable data gathering step for a more detailed analysis of environmental risk factors. To construct the dataset, air quality information was obtained by taking the geographical location of a hospital to which a patient with cardiovascular disease had been admitted and then matching the hospital with a meteorological and air pollution station in its vicinity. An air quality acquisition system based on GMap.net was devised for data gathering and visualization. The reliability of the experiment was confirmed by evaluating the matching rate and the error in air quality values between the acquired dataset and existing area-based air quality datasets at the matched distances. Therefore, this dataset, which considers geographical information, can be utilized in multidisciplinary research for the discovery of environmental risk factors that can affect not only cardiovascular diseases but also potentially other epidemic diseases.
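
The hospital-to-station matching step described above can be sketched as a nearest-neighbor search over great-circle distances. This is an illustration only, because the abstract does not specify the distance formula the authors used. The haversine formula, the station names, and all coordinates below are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_station(hospital, stations):
    """Return (station_name, distance_km) for the station closest to the hospital."""
    distances = (
        (name, haversine_km(hospital[0], hospital[1], lat, lon))
        for name, (lat, lon) in stations.items()
    )
    return min(distances, key=lambda pair: pair[1])

# Hypothetical coordinates, for illustration only.
hospital = (37.58, 127.00)
stations = {"station_A": (37.57, 126.98), "station_B": (37.50, 127.03)}
name, distance_km = nearest_station(hospital, stations)
```

The matched distance returned alongside the station name is what would allow an evaluation of error as a function of distance, as the abstract describes.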

An Efficient One Class Classifier Using Gaussian-based Hyper-Rectangle Generation (가우시안 기반 Hyper-Rectangle 생성을 이용한 효율적 단일 분류기)

  • Kim, Do Gyun;Choi, Jin Young;Ko, Jeonghan
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.2 / pp.56-64 / 2018
  • In recent years, imbalanced data has become one of the most important and frequent issues for quality control in industry. For example, defect rates have been drastically reduced thanks to highly developed technology and quality management, so only a few defective samples can be obtained from a production process. Quality classification must therefore be performed under the condition that one class (the defective dataset) is much smaller than the other (the good dataset). However, traditional multi-class classification methods are not appropriate for such imbalanced datasets, since they classify data based on differences between one class and the others, which can hardly be found in imbalanced datasets. One-class classification, which thoroughly learns the patterns of a target class, is more suitable because it focuses only on data in the target class. So far, several one-class classification methods have been suggested, such as the one-class support vector machine, neural networks, and decision trees. The one-class support vector machine and neural networks can guarantee a good classification rate, and a decision tree can provide a set of rules that can be clearly interpreted. However, the classifiers obtained from the former two methods consist of complex mathematical functions and cannot be easily understood by users, and in the case of the decision tree, the criterion for rule generation is ambiguous. As an alternative, a one-class classifier using hyper-rectangles was previously proposed, which classifies more precisely than the other methods and also generates rules that users can clearly understand. In this paper, we suggest an approach that addresses the limitations of these previous one-class classification algorithms. Specifically, the suggested approach produces an improved one-class classifier using hyper-rectangles generated with a Gaussian function. The performance of the suggested algorithm is verified by a numerical experiment that uses several datasets from the UCI machine learning repository.
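
The abstract does not describe how the Gaussian function generates the hyper-rectangles, so the sketch below shows only one simple reading of the idea. It assumes a single hyper-rectangle whose bounds on each feature are the target-class mean plus or minus `k` standard deviations. The parameter `k` and the sample data are hypothetical.

```python
import statistics

def fit_hyper_rectangle(samples, k=3.0):
    """Fit per-feature (low, high) bounds as mean +/- k * std of the target class."""
    bounds = []
    for column in zip(*samples):
        mu = statistics.mean(column)
        sigma = statistics.pstdev(column)
        bounds.append((mu - k * sigma, mu + k * sigma))
    return bounds

def is_target(point, bounds):
    """A point belongs to the target class if every feature lies inside its interval."""
    return all(low <= x <= high for x, (low, high) in zip(point, bounds))

# Hypothetical measurements of good products (two features each).
good = [(10.0, 5.0), (10.2, 5.1), (9.8, 4.9), (10.1, 5.0), (9.9, 5.0)]
bounds = fit_hyper_rectangle(good)

inside = is_target((10.05, 5.02), bounds)   # close to the good samples
outside = is_target((12.0, 5.0), bounds)    # first feature far out of range
```

Each pair in `bounds` reads directly as a rule of the form "low <= feature <= high", which is the interpretability advantage the abstract attributes to hyper-rectangle classifiers.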

A Study on Record Selection Strategy and Procedure in Dataset for Administrative Information (행정정보 데이터세트 기록의 선별 기준 및 절차 연구)

  • Cho, Eun-Hee;Yim, Jin-Hee
    • The Korean Journal of Archival Studies / no.19 / pp.251-291 / 2009
  • Due to the computerization of business services in the public sector and the push for e-government, both the volume and the variety of records produced in electronic systems are increasing. Among these record types, datasets are attracting attention because they are proliferating rapidly. Although the number of administrative information systems stipulated as electronic record production systems is increasing, they remain in a blind spot of records management, so a system can become obsolete or its records can be lost when a new system is developed. In addition, because these systems were designed without considering records management, they are managed unsatisfactorily and do not meet the feature and quality requirements of a records management system. Advanced countries have recognized the importance of datasets, managed archives for them, and carried out projects on management systems and preservation formats for keeping data. Korea is also conducting research on datasets and individual administrative information systems, but an official scheme has not yet been established. This study presents the archival management items that should be reflected when an administrative information system is designed, in two respects: an identification method and quality requirements. The major directions for such a system are as follows. First, because a dataset is a kind of electronic record, this must be reflected from the design step, prior to production. Second, the system should be established by integrating the records management strategy into the information strategy of the whole organization. Based on these two directions, this study suggests strategies for establishing dataset identification within the framework of e-government. Issues in the archiving steps, including preservation formats and management procedures in a dataset archive, are not covered in this study. More research on these topics, as well as a variety of research on datasets, is expected to be conducted actively.

No-Reference Image Quality Assessment based on Quality Awareness Feature and Multi-task Training

  • Lai, Lijing;Chu, Jun;Leng, Lu
    • Journal of Multimedia Information System / v.9 no.2 / pp.75-86 / 2022
  • Existing image quality assessment (IQA) datasets have a small number of samples, and some methods based on transfer learning or data augmentation cannot make good use of image quality-related features. A No-Reference (NR)-IQA method based on multi-task training and quality awareness is proposed. First, single or multiple distortion types and levels are imposed on the original image, and different strategies are used to augment different types of distortion datasets. Following the idea of weak supervision, we use Full-Reference (FR)-IQA methods to obtain pseudo-score labels for the generated images. Then, we combine the classification information of the distortion type and level with the image quality score. The ResNet50 network is trained in the pre-training stage on the augmented dataset to obtain more quality-aware pre-trained weights. Finally, fine-tuning is performed on the target IQA dataset using the quality-aware weights to predict the final quality score. Various experiments on synthetic-distortion and authentic-distortion datasets (LIVE, CSIQ, TID2013, LIVEC, KonIQ-10K) show that the proposed method utilizes image quality-related features better than a method using only single-task training. The extracted quality-aware features improve the accuracy of the model.
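
The weak-supervision step can be illustrated with a toy example that distorts a reference signal at several levels and labels each result with a full-reference score. PSNR stands in here for the FR-IQA methods, which the abstract does not name. The Gaussian-noise distortion, the noise levels, and the flat pixel list are likewise assumptions made for illustration.

```python
import math
import random

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)

def add_noise(pixels, std, rng):
    """Add Gaussian noise with the given standard deviation, clipped to [0, 255]."""
    return [min(255.0, max(0.0, p + rng.gauss(0.0, std))) for p in pixels]

def make_pseudo_labeled_samples(reference, noise_levels, seed=0):
    """Build multi-task labels: distortion type, level index, and FR pseudo-score."""
    rng = random.Random(seed)
    samples = []
    for level_index, std in enumerate(noise_levels):
        distorted = add_noise(reference, std, rng)
        samples.append({
            "pixels": distorted,
            "distortion_type": "gaussian_noise",
            "level": level_index,
            "pseudo_score": psnr(reference, distorted),
        })
    return samples

# Toy 64-pixel "image" with values between 64 and 176.
reference = [float(64 + (i % 8) * 16) for i in range(64)]
samples = make_pseudo_labeled_samples(reference, noise_levels=[5.0, 20.0, 60.0])
scores = [s["pseudo_score"] for s in samples]
```

Each sample carries both classification targets (type and level) and a regression target (the pseudo-score). Having all three is what makes multi-task pre-training possible without human opinion scores.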

Exploiting Neural Network for Temporal Multi-variate Air Quality and Pollutant Prediction

  • Khan, Muneeb A.;Kim, Hyun-chul;Park, Heemin
    • Journal of Korea Multimedia Society / v.25 no.2 / pp.440-449 / 2022
  • In recent years, air pollution and the Air Quality Index (AQI) have been a focal point for researchers due to their effect on human health. Various studies have addressed AQI prediction, but most of them either lack dense temporal data or cover only one or two air pollutants. In this paper, a hybrid convolutional neural network integrated with a recurrent neural network architecture (CNN-LSTM) is presented to infer air pollution using a multivariate air pollutant dataset. The aim of this research is to design a robust, real-time air pollutant forecasting system by exploiting a neural network. The proposed approach is implemented on a 24-month dataset from Seoul, Republic of Korea. The predicted results are validated against the real dataset and compared with state-of-the-art techniques to evaluate robustness and performance. The proposed model outperforms the SVM, SVM-Polynomial, ANN, and RF models by 60.17%, 68.99%, 14.6%, and 6.29%, respectively. In predicting O3, the model outperforms SVM and SVM-Polynomial by 78.04% and 83.79%, respectively. Overall performance of the model is measured in terms of Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE).
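
The three error metrics named at the end of this abstract have standard definitions, sketched below in plain Python. The sample values are made up for illustration and are not results from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Made-up observed and predicted pollutant concentrations.
observed = [50.0, 100.0, 200.0]
predicted = [45.0, 110.0, 200.0]
print(mae(observed, predicted), mape(observed, predicted), rmse(observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses error relative to the observed value, so the three are usually reported together.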

Classification Accuracy Improvement for Decision Tree (의사결정트리의 분류 정확도 향상)

  • Rezene, Mehari Marta;Park, Sanghyun
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.787-790 / 2017
  • Data quality is a major issue in classification problems; in general, the presence of noisy instances in the training dataset prevents robust classification performance. Such instances may cause the generated decision tree to suffer from over-fitting, and its accuracy may decrease. Decision trees are useful, efficient, and commonly used for solving various real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm. In the proposed preprocessing method, we apply the naive Bayes classifier to remove noisy instances from the training dataset. We applied the proposed method to a real e-commerce sales dataset to test its performance against the existing C4.5 decision tree classifier. In the experiments, the proposed method improved the classification accuracy by 8.5% and 14.32% using the training dataset and 10-fold cross-validation, respectively.
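
The preprocessing idea can be sketched with a small categorical naive Bayes filter. The abstract does not state the exact removal criterion, so this sketch assumes that an instance counts as noisy when a naive Bayes model trained on the full training set disagrees with its label. The Laplace smoothing and the toy sales data are also assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict(model, row):
    """Return the most probable label, using Laplace-smoothed log-probabilities."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def remove_noisy_instances(rows, labels):
    """Drop instances whose label disagrees with the naive Bayes prediction."""
    model = train_naive_bayes(rows, labels)
    kept = [(r, y) for r, y in zip(rows, labels) if predict(model, r) == y]
    return [r for r, _ in kept], [y for _, y in kept]

# Toy sales data: (price level, promotion). The last instance is mislabeled on purpose.
rows = [("low", "yes")] * 3 + [("high", "no")] * 3 + [("low", "yes")]
labels = ["buy"] * 3 + ["skip"] * 3 + ["skip"]
clean_rows, clean_labels = remove_noisy_instances(rows, labels)
```

The filtered `clean_rows` and `clean_labels` would then be passed to a decision tree learner such as C4.5, which is not implemented here.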