• Title/Abstract/Keywords: Datasets

Search results: 2,085 (processing time: 0.024 s)

Identification of Combined Biomarker for Predicting Alzheimer's Disease Using Machine Learning

  • Ki-Yeol Kim
    • 생물정신의학 (Korean Journal of Biological Psychiatry) / Vol. 30, No. 1 / pp. 24-30 / 2023
  • Objectives: Alzheimer's disease (AD) is the most common form of dementia in older adults; it damages the brain and impairs memory, thinking, and behavior. Identifying differentially expressed genes and related pathways in affected brain regions can provide more information on the mechanisms of AD. The aim of our study was to identify differentially expressed genes associated with AD, and combined biomarkers among them, to improve the accuracy of AD risk prediction. Methods: Three publicly available gene expression datasets from the hippocampal brain region were used, and machine learning methods were used to compare the performance of the identified combined biomarkers. Results: Using the limma package, we detected 31 significant genes common to two different microarray datasets; some of them belonged to 11 biological pathways. Combined biomarkers were identified in the two microarray datasets and evaluated in a different dataset. Predictive models using the combined biomarkers outperformed models using a single gene. Among two-gene combinations, the most predictive gene set in the evaluation dataset was ATR and PRKCB when linear discriminant analysis was applied. Conclusions: Combined biomarkers showed good performance in predicting the risk of AD. The predictive nomogram constructed from the combined biomarkers could easily be used by clinicians to identify high-risk individuals, so that more efficient trials could be designed to reduce the incidence of AD.
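The abstract reports that combining two genes (ATR and PRKCB) under linear discriminant analysis gave the best two-gene predictor. As a rough illustration of what a two-gene LDA classifier computes, the following is a minimal two-class Fisher discriminant sketch in plain Python. The toy expression values, the equal-prior midpoint threshold, and all function names are assumptions for illustration only; they are not the study's data, code, or fitted model.

```python
from typing import List, Tuple

# Hypothetical two-gene sample: (ATR expression, PRKCB expression).
Sample = Tuple[float, float]


def class_mean(rows: List[Sample]) -> Tuple[float, float]:
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def add_scatter(acc: List[List[float]], rows: List[Sample], m: Tuple[float, float]) -> None:
    """Accumulate the within-class scatter of one class into the 2x2 matrix acc."""
    for x in rows:
        d = (x[0] - m[0], x[1] - m[1])
        for i in range(2):
            for j in range(2):
                acc[i][j] += d[i] * d[j]


def fit_lda(controls: List[Sample], cases: List[Sample]) -> Tuple[Tuple[float, float], float]:
    """Fisher discriminant: w = Sw^-1 (m_case - m_control), cut at the midpoint."""
    m0, m1 = class_mean(controls), class_mean(cases)
    sw = [[0.0, 0.0], [0.0, 0.0]]
    add_scatter(sw, controls, m0)
    add_scatter(sw, cases, m1)
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form 2x2 inverse applied to the mean difference.
    w = ((sw[1][1] * dx - sw[0][1] * dy) / det,
         (-sw[1][0] * dx + sw[0][0] * dy) / det)
    # Equal class priors assumed: threshold halfway between the projected means.
    threshold = 0.5 * ((w[0] * m0[0] + w[1] * m0[1]) + (w[0] * m1[0] + w[1] * m1[1]))
    return w, threshold


def predict(model: Tuple[Tuple[float, float], float], x: Sample) -> int:
    """Return 1 (AD-like) if the projected score exceeds the threshold, else 0."""
    w, threshold = model
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Made-up toy values, not data from the study.
toy_controls = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2)]
toy_cases = [(2.0, 1.0), (2.2, 1.1), (1.9, 0.8), (2.1, 1.2)]
model = fit_lda(toy_controls, toy_cases)
print(predict(model, (1.0, 2.0)), predict(model, (2.0, 1.0)))  # prints "0 1"
```

The closed-form 2x2 inverse keeps the sketch dependency-free; with more genes one would normally reach for a library implementation such as scikit-learn's `LinearDiscriminantAnalysis`.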

인공신경망을 이용한 벌크 비정질 합금 소재의 포화자속밀도 예측 성능평가 (Artificial Neural Network Supported Prediction of Magnetic Properties of Bulk Metallic Glasses)

  • 남충희
    • 한국재료학회지 (Korean Journal of Materials Research) / Vol. 33, No. 7 / pp. 273-278 / 2023
  • In this study, regression models based on artificial neural networks (ANNs) were built to predict the saturation magnetic flux density (Bs) from experimental values for 622 Fe-based bulk metallic glasses (BMGs), and their prediction performance was evaluated. Model performance was evaluated using the F1 score together with the coefficient of determination (R2 score), the metric most commonly used for regression models. The coefficient of determination is a useful performance indicator because it reflects the predicted saturation magnetic flux density across the full material dataset in a balanced way. However, Fe-based BMGs need a high saturation magnetic flux density to be useful as soft magnetic materials, so this study also used the F1 score as a performance indicator to better predict Bs above a threshold of 1.4 T. After two ANN models optimized for the R2 and F1 score, respectively, were obtained, their prediction performance on the test data was compared. As a case study, new Fe-based BMG datasets that were not included in the training and test datasets were predicted with the two ANN models. The results showed that the model with the higher F1 score predicted materials with a high saturation magnetic flux density more accurately.
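The two scoring views described above can be sketched as follows: R2 scores every predicted Bs value, while the F1 score only asks whether each alloy lands on the correct side of the 1.4 T threshold. This is a minimal plain-Python sketch; treating values at exactly 1.4 T as positive, and the toy numbers used below, are assumptions rather than details taken from the study.

```python
from typing import Sequence

BS_THRESHOLD_T = 1.4  # threshold quoted in the abstract, in tesla


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination over every sample."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1_above_threshold(y_true: Sequence[float], y_pred: Sequence[float],
                       threshold: float = BS_THRESHOLD_T) -> float:
    """F1 score after turning Bs values into 'high-Bs' (>= threshold) labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        actual_high, predicted_high = t >= threshold, p >= threshold
        if actual_high and predicted_high:
            tp += 1
        elif predicted_high:
            fp += 1
        elif actual_high:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up measured and predicted Bs values in tesla (not from the study).
measured = [1.2, 1.5, 1.6, 1.3]
predicted = [1.3, 1.45, 1.35, 1.25]
print(r2_score(measured, predicted), f1_above_threshold(measured, predicted))
```

With these toy values R2 is only about 0.225, yet one of the two high-Bs alloys is still classified correctly (F1 of about 0.67), which illustrates why the two metrics can favor different models.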

이미지 캡셔닝 기반의 새로운 위험도 측정 모델 (A Novel Image Captioning based Risk Assessment Model)

  • 전민성;고재필;최경주
    • 한국정보시스템학회지:정보시스템연구 (The Journal of Information Systems) / Vol. 32, No. 4 / pp. 119-136 / 2023
  • Purpose: We introduce a surveillance system designed to overcome the limitations of conventional surveillance systems, which focus primarily on object-centric behavior analysis. Design/methodology/approach: The study takes a new approach to risk assessment in surveillance, using image captioning to generate descriptive captions that capture the interactions among objects, actions, and spatial elements in observed scenes. To support this methodology, we built a training dataset of [image-caption-danger score] triples. We fine-tuned the BLIP-2 model on this dataset and used BERT to interpret the semantic content of the generated captions in order to assess risk levels. Findings: In a series of experiments on our self-constructed datasets, we show that these datasets provide rich information for risk assessment and yield strong performance on this task. Compared with models pre-trained on established datasets, our generated captions more thoroughly cover the object attributes, behaviors, and spatial context that the surveillance system needs. They also adapt to novel sentence structures, which makes them versatile across a range of contexts.

Slime mold and four other nature-inspired optimization algorithms in analyzing the concrete compressive strength

  • Yinghao Zhao;Hossein Moayedi;Loke Kok Foong;Quynh T. Thi
    • Smart Structures and Systems / Vol. 33, No. 1 / pp. 65-91 / 2024
  • This work examines five optimization techniques for identifying the best-fit model to predict the compressive strength of concrete mixtures: the Slime Mold Algorithm (SMA), Black Hole Algorithm (BHA), Multi-Verse Optimizer (MVO), Vortex Search (VS), and Whale Optimization Algorithm (WOA). An artificial neural network is trained in MATLAB with a hybrid learning strategy that combines least-squares estimation with backpropagation. Of the 103 samples, 72 are used for training and 31 for testing. The multi-layer perceptron (MLP) is used to analyze all data, and the results are verified by comparison. For the best-fit SMA-MLP, BHA-MLP, MVO-MLP, VS-MLP, and WOA-MLP models, the coefficient of determination (R2) is 0.9603, 0.9679, 0.9827, 0.9841, and 0.9770 in the training phase and 0.9567, 0.9552, 0.9594, 0.9888, and 0.9695 in the testing phase, respectively. In addition, the best-fit training structures for SMA, BHA, MVO, VS, and WOA (each combined with the MLP) are obtained with population sizes of 450, 500, 250, 150, and 500, respectively. Among all the options considered, VS offered the strongest prediction network for training the MLP.

Handwritten Indic Digit Recognition using Deep Hybrid Capsule Network

  • Mohammad Reduanul Haque;Rubaiya Hafiz;Mohammad Zahidul Islam;Mohammad Shorif Uddin
    • International Journal of Computer Science & Network Security / Vol. 24, No. 2 / pp. 89-94 / 2024
  • The Indian subcontinent is home to multilingual populations, and documents such as job application forms, passports, and number plates are composed of text written in different languages and scripts. A single document page may therefore contain numerals from several different Indic scripts, so a generic recognizer capable of recognizing handwritten Indic digits written by diverse writers is needed. Much work has been done on non-Indic numerals, particularly Roman numerals, but research on Indic digits is limited. Moreover, most research focuses only on MNIST or on a single dataset, either because of time constraints or because the model is tailored to a specific task. In this work, a hybrid model is proposed to recognize all available Indic handwritten digit images using the existing benchmark datasets. The proposed method combines the automatically learned features of a Capsule Network with a hand-crafted Bag of Features (BoF) extraction method. Along the way, we (1) analyze the method's successes and (2) explore whether it performs well under more difficult conditions, i.e., noise, color, affine transformations, intra-class variation, and natural scenes. Experimental results show that the hybrid method achieves higher accuracy than the Capsule Network.
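The hand-crafted Bag of Features (BoF) component mentioned above typically encodes an image by assigning each local descriptor to its nearest codeword and counting the assignments. The sketch below shows only that encoding step in plain Python; the codebook (usually learned beforehand, for example with k-means), the two-dimensional toy descriptors, and the function names are hypothetical, and the abstract does not specify how the paper fuses BoF vectors with Capsule Network features.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(desc: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])))


def bof_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Normalized histogram of codeword assignments (the BoF vector)."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    return [c / len(descriptors) for c in counts]


# Hypothetical 2-D codebook and local descriptors from one digit image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]
print(bof_histogram(descriptors, codebook))  # prints [0.25, 0.5, 0.25]
```

In a hybrid model, a fixed-length histogram like this could be concatenated with learned features before classification; that fusion strategy is an assumption here, not the paper's stated design.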

MixFace: Improving face verification with a focus on fine-grained conditions

  • Junuk Jung;Sungbin Son;Joochan Park;Yongjun Park;Seonhoon Lee;Heung-Seon Oh
    • ETRI Journal / Vol. 46, No. 4 / pp. 660-670 / 2024
  • The performance of face recognition (FR) has reached a plateau for public benchmark datasets, such as labeled faces in the wild (LFW), celebrities in frontal-profile in the wild (CFP-FP), and the first manually collected, in-the-wild age database (AgeDB), owing to the rapid advances in convolutional neural networks (CNNs). However, the effects of faces under various fine-grained conditions on FR models have not been investigated, owing to the absence of relevant datasets. This paper analyzes their effects under different conditions and loss functions using K-FACE, a recently introduced FR dataset with fine-grained conditions. We propose a novel loss function called MixFace, which combines classification and metric losses. The superiority of MixFace in terms of effectiveness and robustness was experimentally demonstrated using various benchmark datasets.

Machine learning-based evaluation technology of 3D spatial distribution of residual radioactivity in large-scale radioactive structures

  • UkJae Lee;Phillip Chang;Nam-Suk Jung;Jonghun Jang;Jimin Lee;Hee-Seock Lee
    • Nuclear Engineering and Technology / Vol. 56, No. 8 / pp. 3199-3209 / 2024
  • During the decommissioning of nuclear and particle accelerator facilities, a considerable amount of large-scale radioactive waste may be generated. Accurately determining the activation level of this waste is crucial for proper disposal, but directly measuring the internal radioactivity distribution is challenging. This study introduces a machine learning technique for assessing the internal radioactivity distribution from external measurements. Random radioactivity distributions within a structure were generated, and the photon spectra measured by detectors outside the structure were simulated with the FLUKA Monte Carlo code. By training on the simulated spectra corresponding to various radioactivity distributions, an evaluation model for radioactivity was developed. Convolutional Neural Network and Transformer methods were used to build the evaluation model. The model was constructed with 5,425 simulated datasets, and 603 further datasets were used to obtain the evaluation results. Preprocessing was applied to the datasets, but the evaluation model using raw spectrum data gave the best results. The intensity and shape of the radioactivity distribution inside the structure were estimated with a relative error of 10%. Additionally, an evaluation with the constructed model takes only a few seconds to complete.

해외 도서관 링크드 데이터 구축의 최근 동향 연구 - 발행 데이터세트, 재사용 어휘집, 인터링킹 외부 데이터세트를 중심으로 - (A Study on Recent Trends in Building Linked Data for Overseas Libraries: Focusing on Published Datasets, Reused Vocabulary, and Interlinked External Datasets)

  • 이성숙
    • 한국문헌정보학회지 (Journal of the Korean Society for Library and Information Science) / Vol. 56, No. 4 / pp. 5-28 / 2022
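  • This study analyzes linked data (LD) projects at overseas libraries, focusing on published datasets, reused vocabularies, and interlinked external datasets, and uses the results to establish baseline data on how Korean libraries might build LD. An analysis of 21 overseas library cases showed that these libraries had built comprehensive authority LD and had launched new services using their published LD. To do so, the libraries took the lead in cooperating with other libraries and cultural institutions within their regions, within their countries, and at the national level, and on the basis of this cooperation they published specialized datasets. The libraries used Schema.org to increase the visibility of their published LD, and used BIBFRAME and similar vocabularies to define various entities for more granular description, building their LD on the defined entities. They used these entities for linking related information, displaying search results, browsing, and bulk downloads. They also continuously updated their interlinked external datasets and used external data directly to enrich their catalog records. Based on the implications drawn from these findings, the study proposes points for Korean libraries to consider when building LD. The results can serve as baseline material when Korean libraries plan LD services or enhance existing ones.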

비대칭 마진 SVM 최적화 모델을 이용한 기업부실 예측모형의 범주 불균형 문제 해결 (Optimization of Uneven Margin SVM to Solve Class Imbalance in Bankruptcy Prediction)

  • 조성임;김명종
    • 경영정보학연구 (Information Systems Review) / Vol. 24, No. 4 / pp. 23-40 / 2022
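  • Support Vector Machines (SVMs) have been applied successfully in many fields, including bankruptcy prediction. Under class imbalance, however, the boundary region of the majority class has been reported to expand while that of the minority class shrinks, biasing the classification boundary toward the minority class and degrading classification performance. To overcome this limitation of the even-margin SVM (EMSVM), this study proposes OPT-UMSVM, an optimized uneven-margin SVM that combines the uneven-margin SVM (UMSVM) with a threshold-moving technique. By shifting the boundary, which is skewed toward the minority class, back toward the majority class, OPT-UMSVM improves minority-class sensitivity and yields optimized classification performance, which improves the SVM's generalization ability. To verify these gains, five sample groups with different imbalance ratios were constructed and evaluated with 10-fold cross-validation, with the following results. First, in samples with slight class imbalance, UMSVM improved only marginally on EMSVM, whereas in severely imbalanced samples it improved on EMSVM substantially. Second, OPT-UMSVM outperformed both EMSVM and conventional UMSVM on balanced and imbalanced samples alike, and the performance difference was significant, particularly in severely imbalanced samples.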
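Threshold moving, one ingredient of OPT-UMSVM, can be illustrated independently of the SVM itself: take the classifier's decision scores and pick the cut-off that best balances minority-class sensitivity against majority-class specificity. The sketch below assumes label 1 marks the minority (bankrupt) class, uses the geometric mean of sensitivity and specificity (G-mean) as the selection criterion, and uses made-up scores; the abstract does not state which criterion the authors optimize, and this is not their implementation.

```python
import math
from typing import Sequence


def gmean(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Geometric mean of sensitivity (label 1) and specificity (label 0).

    Assumes both classes are present in labels.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return math.sqrt((tp / pos) * (tn / neg))


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Search observed scores (plus the default SVM cut-off 0.0) for the best G-mean."""
    candidates = sorted(set(scores) | {0.0})
    return max(candidates, key=lambda t: gmean(scores, labels, t))


# Hypothetical SVM decision scores: positive values lean toward the minority class.
toy_scores = [-2.0, -1.5, -1.2, -0.8, -0.6, -0.4, -0.3, 0.5]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1]

cut = best_threshold(toy_scores, toy_labels)
print(cut, gmean(toy_scores, toy_labels, 0.0), gmean(toy_scores, toy_labels, cut))
```

In this toy example the default cut-off of 0.0 catches only one of the three minority cases, while the selected cut-off of -0.4 moves the boundary into the majority region and recovers all three without misclassifying any majority case.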

Integration and Reanalysis of Four RNA-Seq Datasets Including BALF, Nasopharyngeal Swabs, Lung Biopsy, and Mouse Models Reveals Common Immune Features of COVID-19

  • Rudi Alberts;Sze Chun Chan;Qian-Fang Meng;Shan He;Lang Rao;Xindong Liu;Yongliang Zhang
    • IMMUNE NETWORK / Vol. 22, No. 3 / pp. 22.1-22.25 / 2022
  • Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread worldwide since its emergence in late 2019, causing a pandemic that is still ongoing. Much effort has been devoted to understanding the pathogenesis of COVID-19 in the hope of developing better therapeutic strategies. Transcriptome analysis using technologies such as RNA sequencing has become a common approach for studying host immune responses to SARS-CoV-2. Although a substantial amount of information can be gathered from transcriptome analysis, the different analysis tools used in these studies may lead to conclusions that differ dramatically from each other. Here, we re-analyzed four RNA-sequencing datasets of COVID-19 samples, including human bronchoalveolar lavage fluid, nasopharyngeal swabs, lung biopsy, and hACE2 transgenic mice, using the same standardized method. The results showed that common features of COVID-19 include upregulation of chemokines (including CCL2, CXCL1, and CXCL10), the inflammatory cytokine IL-1β, and the alarmin S100A8/S100A9, which are associated with dysregulated innate immunity marked by abundant neutrophil and mast cell accumulation. Another common feature observed was downregulation of chemokine receptor genes associated with impaired adaptive immunity, such as lymphopenia. In addition, a few interferon-stimulated genes, but no type I IFN genes, were found to be enriched in COVID-19 samples compared with their respective controls in these datasets. These features are in line with results from single-cell RNA sequencing studies in the field. Our re-analysis of the RNA-seq datasets therefore revealed common features of dysregulated immune responses to SARS-CoV-2 and shed light on the pathogenesis of COVID-19.