• Title/Summary/Keyword: datasets

Search results: 1,986 items (processing time: 0.18 s)

Evaluation of the Troposphere Ozone in the Reanalysis Datasets: Comparison with Pohang Ozonesonde Observation (대류권 오존 재분석 자료의 품질 검증: 포항 오존존데와 비교 검증)

  • Park, Jinkyung;Kim, Seo-Yeon;Son, Seok-Woo
    • Atmosphere / Vol. 29, No. 1 / pp. 53-59 / 2019
  • The quality of tropospheric ozone in three reanalysis datasets is evaluated against long-term ozonesonde measurements at Pohang, South Korea. The Monitoring Atmospheric Composition and Climate (MACC), European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERAI) and Modern Era Retrospective-Analysis for Research and Applications version 2 (MERRA2) datasets are examined in terms of the vertical ozone structure, seasonality and long-term trend in the lower troposphere. MACC shows the smallest biases in the ozone profile and a realistic seasonality of lower-tropospheric ozone concentration, with a maximum ozone mixing ratio in spring and early summer and a minimum in winter. MERRA2 also shows reasonably small biases. However, ERAI exhibits significant biases, with a substantially lower ozone mixing ratio than the observation in most seasons except midsummer. It even fails to reproduce the seasonal cycle of lower-tropospheric ozone concentration. This result suggests that great caution is needed when analyzing tropospheric ozone using ERAI data. It is further found that, although not statistically significant, all datasets consistently show a decreasing trend of 850-hPa ozone concentration since 2003, as in the observation.

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 771-789 / 2019
  • Text detection has been a popular research topic in the field of computer vision. It is difficult for prevalent text detection algorithms to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, a text candidate in a novel superpixel form is proposed to improve the text recall rate through image segmentation. Second, we propose a unique text sample selection model (TSSM) to extract text samples from the current image and eliminate database dependency. Specifically, to improve the precision of the samples, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double threshold scheme. Finally, a multiple kernel boosting method is developed to generate a strong text classifier by combining multiple single-kernel SVMs based on the samples selected by TSSM. Experimental results demonstrate that our text detection method is robust to complex backgrounds and multilingual text and shows stable performance across different standard datasets.

A Design of Small Scale Deep CNN Model for Facial Expression Recognition using the Low Resolution Image Datasets (저해상도 영상 자료를 사용하는 얼굴 표정 인식을 위한 소규모 심층 합성곱 신경망 모델 설계)

  • Salimov, Sirojiddin;Yoo, Jae Hung
    • The Journal of the Korea institute of electronic communication sciences / Vol. 16, No. 1 / pp. 75-80 / 2021
  • Artificial intelligence is becoming an important part of our lives, providing incredible benefits. In this respect, facial expression recognition has been one of the hot topics among computer vision researchers in recent decades. Classifying a small dataset of low-resolution images requires the development of a new small-scale deep CNN model. To this end, we propose a method suitable for small datasets. Compared to traditional deep CNN models, this model uses only a fraction of the memory in terms of total learnable weights, but it shows very similar results on the FER2013 and FERPlus datasets.

No-Reference Image Quality Assessment based on Quality Awareness Feature and Multi-task Training

  • Lai, Lijing;Chu, Jun;Leng, Lu
    • Journal of Multimedia Information System / Vol. 9, No. 2 / pp. 75-86 / 2022
  • Existing image quality assessment (IQA) datasets have a small number of samples. Some methods based on transfer learning or data augmentation cannot make good use of image quality-related features. A No-Reference (NR)-IQA method based on multi-task training and quality awareness is proposed. First, single or multiple distortion types and levels are imposed on the original image, and different strategies are used to augment different types of distortion datasets. Following the idea of weak supervision, we use Full-Reference (FR)-IQA methods to obtain the pseudo-score label of each generated image. Then, we combine the classification information of the distortion type and level with the information of the image quality score. The ResNet50 network is trained in the pre-training stage on the augmented dataset to obtain more quality-aware pre-training weights. Finally, fine-tuning is performed on the target IQA dataset using the quality-aware weights to predict the final quality score. Various experiments on synthetic-distortion and authentic-distortion datasets (LIVE, CSIQ, TID2013, LIVEC, KonIQ-10K) show that the proposed method can utilize image quality-related features better than a method using only single-task training. The extracted quality-aware features improve the accuracy of the model.
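The weak-supervision step described above (distort a reference image, then score the result with a full-reference metric to obtain a pseudo-label) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the FR-IQA methods or the distortions used, so PSNR and a simple brightness shift are stand-ins, and the names `psnr`, `brightness_shift`, and `make_pseudo_labelled_sample` are hypothetical.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel lists.

    PSNR is used here only as a stand-in full-reference metric (assumption).
    """
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def brightness_shift(pixels, level):
    """Hypothetical distortion: add `level` to every pixel, clamped to [0, 255]."""
    return [min(255, max(0, p + level)) for p in pixels]


def make_pseudo_labelled_sample(reference, level):
    """Build one augmented sample with distortion labels and a pseudo-score."""
    distorted = brightness_shift(reference, level)
    return {
        "pixels": distorted,
        "distortion_type": "brightness_shift",  # label for a classification task
        "distortion_level": level,              # label for a level task
        "pseudo_score": psnr(reference, distorted),  # label for score regression
    }


sample = make_pseudo_labelled_sample([100, 120, 140], level=10)
```

Each sample carries three labels (type, level, pseudo-score), which is what would let a network be pre-trained on several tasks at once before fine-tuning on a target IQA dataset.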

A biomedically oriented automatically annotated Twitter COVID-19 dataset

  • Hernandez, Luis Alberto Robles;Callahan, Tiffany J.;Banda, Juan M.
    • Genomics & Informatics / Vol. 19, No. 3 / pp. 21.1-21.5 / 2021
  • The use of social media data, such as Twitter data, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time and to study the societal implications of interventions as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by because of the high cost of manual annotation and the effort needed to identify the correct texts. When datasets are available, they are usually very small and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. After selecting the best method for automatic annotation, we annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.

A Novel Framework Based on CNN-LSTM Neural Network for Prediction of Missing Values in Electricity Consumption Time-Series Datasets

  • Hussain, Syed Nazir;Aziz, Azlan Abd;Hossen, Md. Jakir;Aziz, Nor Azlina Ab;Murthy, G. Ramana;Mustakim, Fajaruddin Bin
    • Journal of Information Processing Systems / Vol. 18, No. 1 / pp. 115-129 / 2022
  • Adopting Internet of Things (IoT)-based technologies in smart homes helps users analyze the electricity consumption of home appliances for better overall cost monitoring. IoT applications such as smart home systems (SHS) can suffer from large gaps of missing values due to factors such as security attacks, sensor faults, or connection errors. In this paper, a novel framework is proposed to predict large gaps of missing values in SHS home appliance electricity consumption time-series datasets. The framework follows a series of steps to detect, predict and reconstruct the missing values of the input time-series datasets. A hybrid convolutional neural network-long short-term memory (CNN-LSTM) neural network is used to forecast large gaps of missing values. A comparative experiment was conducted to evaluate the performance of the hybrid CNN-LSTM against the single CNN and LSTM models in forecasting missing values. The experimental results indicate that the CNN-LSTM model outperforms the single CNN and LSTM neural networks.
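The first step of such a framework, detecting gaps of missing values, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes missing readings are encoded as `None` or NaN, the function name `find_missing_gaps` and the `min_length` parameter are hypothetical, and the prediction step (the CNN-LSTM model) is not shown.

```python
import math


def find_missing_gaps(series, min_length=1):
    """Return (start, end) index pairs, inclusive, for runs of missing values.

    A value counts as missing if it is None or a float NaN (assumed encoding).
    Runs shorter than `min_length` are ignored.
    """
    gaps = []
    start = None  # index where the current run of missing values began
    for i, value in enumerate(series):
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing and start is None:
            start = i
        elif not missing and start is not None:
            if i - start >= min_length:
                gaps.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(series) - start >= min_length:
        gaps.append((start, len(series) - 1))
    return gaps


# Illustrative readings (made-up values, not from the paper).
readings = [1.0, None, None, 2.0, float("nan"), 3.0, None, None, None]
all_gaps = find_missing_gaps(readings)
large_gaps = find_missing_gaps(readings, min_length=2)
```

Raising `min_length` separates the large gaps, which a forecasting model would need to fill, from isolated missing points that simple interpolation could handle.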

HiGANCNN: A Hybrid Generative Adversarial Network and Convolutional Neural Network for Glaucoma Detection

  • Alsulami, Fairouz;Alseleahbi, Hind;Alsaedi, Rawan;Almaghdawi, Rasha;Alafif, Tarik;Ikram, Mohammad;Zong, Weiwei;Alzahrani, Yahya;Bawazeer, Ahmed
    • International Journal of Computer Science & Network Security / Vol. 22, No. 9 / pp. 23-30 / 2022
  • Glaucoma is a chronic neuropathy that affects the optic nerve and can lead to blindness. The detection and prediction of glaucoma have become possible using deep neural networks. However, the detection performance relies on the availability of a large amount of data. Therefore, we propose different frameworks, including a hybrid of a generative adversarial network and a convolutional neural network, to automate and improve the performance of glaucoma detection. The proposed frameworks are evaluated using five public glaucoma datasets. The framework that uses a Deep Convolutional Generative Adversarial Network (DCGAN) and a pre-trained DenseNet model achieves classification accuracies of 99.6%, 99.08%, 99.4%, 98.69%, and 92.95% on the RIMONE, Drishti-GS, ACRIMA, ORIGA-light, and HRF datasets, respectively. Based on the experimental results and evaluation, the proposed framework closely competes with state-of-the-art methods on the five public glaucoma datasets without requiring any manual preprocessing step.

CDOWatcher: Systematic, Data-driven Platform for Early Detection of Contagious Diseases Outbreaks

  • Albarrak, Abdullah M.
    • International Journal of Computer Science & Network Security / Vol. 22, No. 11 / pp. 77-86 / 2022
  • The destructive impact of contagious disease outbreaks on all facets of life necessitates developing effective solutions to control these outbreaks. This research proposes an end-to-end, data-driven platform consisting of multiple modules that work in harmony to achieve a concrete goal: early detection of contagious disease outbreaks (i.e., epidemic disease detection). Achieving that goal enables decision makers and people in power to act promptly, resulting in robust prevention management of contagious diseases. The goal of the proposed platform is not to predict or forecast the spread of contagious diseases; rather, its goal is to promptly detect contagious disease outbreaks as they happen. The front end of the proposed platform is a web-based dashboard that visualizes disease outbreaks in real time on a real map. These outbreaks are detected by another component of the platform, which applies data mining techniques and algorithms to gathered datasets. Those gathered datasets are managed by yet another component. Specifically, a mobile application will be the main source of data for the platform. As a vital component of the platform, the datasets are managed by a DBMS that is specifically tailored for this platform. Preliminary results are presented to showcase the performance of a prototype of the proposed platform.

Identification of Combined Biomarker for Predicting Alzheimer's Disease Using Machine Learning

  • Ki-Yeol Kim
    • Korean Journal of Biological Psychiatry / Vol. 30, No. 1 / pp. 24-30 / 2023
  • Objectives: Alzheimer's disease (AD) is the most common form of dementia in older adults, damaging the brain and resulting in impaired memory, thinking, and behavior. The identification of differentially expressed genes and related pathways among affected brain regions can provide more information on the mechanisms of AD. The aim of our study was to identify differentially expressed genes associated with AD and combined biomarkers among them to improve AD risk prediction accuracy. Methods: Machine learning methods were used to compare the performance of the identified combined biomarkers. In this study, three publicly available gene expression datasets from the hippocampal brain region were used. Results: We detected 31 significant common genes from two different microarray datasets using the limma package. Some of them belonged to 11 biological pathways. Combined biomarkers were identified in two microarray datasets and were evaluated in a different dataset. The performance of the predictive models using the combined biomarkers was superior to that of models using a single gene. When two genes were combined, the most predictive gene set in the evaluation dataset was ATR and PRKCB when linear discriminant analysis was applied. Conclusions: Combined biomarkers showed good performance in predicting the risk of AD. The constructed predictive nomogram using combined biomarkers could easily be used by clinicians to identify high-risk individuals so that more efficient trials could be designed to reduce the incidence of AD.

Artificial Neural Network Supported Prediction of Magnetic Properties of Bulk Metallic Glasses (인공신경망을 이용한 벌크 비정질 합금 소재의 포화자속밀도 예측 성능평가)

  • Chunghee Nam
    • Korean Journal of Materials Research / Vol. 33, No. 7 / pp. 273-278 / 2023
  • In this study, regression models based on artificial neural networks (ANN) were applied to predict the saturation magnetic flux density (Bs) from the experimental Bs values of 622 Fe-based bulk metallic glasses (BMGs), and their prediction performance was evaluated. Model performance was evaluated using the F1 score together with the coefficient of determination (R2 score), which is the metric mainly used for regression models. The coefficient of determination can be used as a performance indicator, since it reflects the predicted results of the saturation magnetic flux density across the full material datasets in a balanced way. However, a BMG alloy containing iron requires a high saturation magnetic flux density to have excellent applicability as a soft magnetic material, so in this study the F1 score was used as a performance indicator to better predict Bs above the threshold value (1.4 T). After obtaining two ANN models optimized for the R2 and F1 score conditions, respectively, their prediction performance was compared on the test data. As a case study to evaluate the prediction performance, new Fe-based BMG datasets that were not included in the training and test datasets were predicted using the two ANN models. The results showed that the model with an excellent F1 score achieved a more accurate prediction for a material with a high saturation magnetic flux density.
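The two performance indicators mentioned above can be sketched as follows: the R2 score is computed over all predictions, while the F1 score is obtained by treating "Bs above 1.4 T" as a binary class. This is a minimal illustration assuming a strict `>` comparison at the threshold, which the abstract does not specify; the function names and the sample values are hypothetical and are not data from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1_above_threshold(y_true, y_pred, threshold=1.4):
    """F1 score for the binary question 'is Bs above the threshold?'.

    Regression outputs are binarized with a strict '>' (assumption).
    """
    pairs = [(t > threshold, p > threshold) for t, p in zip(y_true, y_pred)]
    true_pos = sum(1 for t, p in pairs if t and p)
    false_pos = sum(1 for t, p in pairs if not t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator else 0.0


# Made-up Bs values in tesla, for illustration only.
bs_measured = [1.20, 1.50, 1.60, 1.30, 1.45]
bs_predicted = [1.25, 1.42, 1.38, 1.41, 1.50]

r2 = r2_score(bs_measured, bs_predicted)
f1 = f1_above_threshold(bs_measured, bs_predicted)
```

A model can have a good R2 score overall and still misclassify alloys near the 1.4 T boundary, which is why the two indicators can favor different models.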