• Title/Summary/Keyword: Random sets

Exploring the Performance of Synthetic Minority Over-sampling Technique (SMOTE) to Predict Good Borrowers in P2P Lending (P2P 대부 우수 대출자 예측을 위한 합성 소수집단 오버샘플링 기법 성과에 관한 탐색적 연구)

  • Costello, Francis Joseph;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.9 / pp.71-78 / 2019
  • This study aims to identify good borrowers within the context of P2P lending. P2P lending is a growing platform that allows individuals to lend to and borrow money from each other. Credit risk is inherent in any loan and must be considered before lending. In the specific context of P2P lending, traditional models fall short, so this study aimed to rectify this and to explore the class imbalance problem seen in credit risk data sets. The study applied an over-sampling technique known as the Synthetic Minority Over-sampling Technique (SMOTE). To test the approach, five benchmark classifiers were implemented: support vector machines, logistic regression, k-nearest neighbor, random forest, and a deep neural network. The data sample was retrieved from the publicly available LendingClub dataset. Applying SMOTE yielded significantly improved results compared with the benchmark classifiers trained without it. These results should help actors engaged in P2P lending make better-informed decisions when selecting potential borrowers, reducing the higher risks present in P2P lending.
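
The abstract does not include code, but the SMOTE-plus-benchmark setup it describes is easy to sketch. The Python fragment below uses the imbalanced-learn and scikit-learn libraries, with a synthetic imbalanced dataset standing in for the LendingClub sample; the 95/5 class split and the choice of F1 as the metric are assumptions for illustration.

    # Minimal sketch of SMOTE oversampling before classification.
    # Synthetic data stands in for the LendingClub sample used in the paper.
    from collections import Counter

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # Imbalanced toy data: ~5% minority ("risky borrower") class.
    X, y = make_classification(n_samples=5000, n_features=20,
                               weights=[0.95, 0.05], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # Oversample only the training split, so the test set keeps its
    # original imbalance and results are not leaked.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
    print(Counter(y_train), "->", Counter(y_res))

    for clf in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
        clf.fit(X_res, y_res)
        print(type(clf).__name__, f1_score(y_test, clf.predict(X_test)))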

Ecological Momentary Assessment Using Smartphone-Based Mobile Application for Affect and Stress Assessment

  • Yang, Yong Sook;Ryu, Gi Wook;Han, Insu;Oh, Seojin;Choi, Mona
    • Healthcare Informatics Research / v.24 no.4 / pp.381-386 / 2018
  • Objectives: This study aimed to describe the process of using a mobile application for ecological momentary assessment (EMA) to collect data on stress and mood in a daily-life setting. Methods: A mobile application for the Android operating system was developed and installed, with a set of questions on momentary mood and stress, on each participant's smartphone. The application set alarms at semi-random intervals within 60-minute blocks, four times a day for 7 days. After all momentary affect and stress responses had been collected, questions assessing the usability of the mobile EMA application were also administered. Results: Data were collected from 97 police officers working in Gyeonggi Province, South Korea. The mean completion rate was 60.0%, ranging from 3.5% to 100%. The mean positive and negative affect scores were 18.34 out of 28 and 19.09 out of 63, and the mean stress score was 17.92 out of 40. Participants responded that the mobile application correctly measured their affect (4.34 ± 0.83) and stress (4.48 ± 0.62) on a 5-point Likert scale. Conclusions: Our study investigated the process of using a mobile application to assess momentary affect and stress at repeated times. We found challenges regarding adherence to the research protocol, such as completion and delayed answering after alarm notifications. Despite this inherent adherence issue, EMA still has the advantages of reducing recall bias and assessing the actual moments of interest at multiple time points, which improves ecological validity.
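
The alarm rule described above (a semi-random time inside each 60-minute block, four blocks a day for 7 days) reduces to a few lines of scheduling logic. A minimal Python sketch follows; the block start hours are hypothetical, since the abstract does not state them.

    # Sketch of semi-random EMA alarm scheduling: one alarm at a random
    # offset inside each 60-minute block, four blocks per day for 7 days.
    # The block anchors (9:00, 12:00, 15:00, 18:00) are an assumption;
    # the paper does not specify them.
    import random
    from datetime import datetime, timedelta

    BLOCK_STARTS = [9, 12, 15, 18]  # hours; hypothetical block anchors

    def schedule_alarms(first_day: datetime, days: int = 7):
        alarms = []
        for d in range(days):
            day = first_day + timedelta(days=d)
            for hour in BLOCK_STARTS:
                minute = random.randint(0, 59)  # random offset in the block
                alarms.append(day.replace(hour=hour, minute=minute,
                                          second=0, microsecond=0))
        return alarms

    for t in schedule_alarms(datetime(2018, 1, 1))[:4]:
        print(t)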

Pathway enrichment and protein interaction network analysis for milk yield, fat yield and age at first calving in a Thai multibreed dairy population

  • Laodim, Thawee;Elzo, Mauricio A.;Koonawootrittriron, Skorn;Suwanasopee, Thanathip;Jattawa, Danai
    • Asian-Australasian Journal of Animal Sciences / v.32 no.4 / pp.508-518 / 2019
  • Objective: This research aimed to determine biological pathways and protein-protein interaction (PPI) networks for 305-d milk yield (MY), 305-d fat yield (FY), and age at first calving (AFC) in the Thai multibreed dairy population. Methods: Genotypic information comprised 75,776 imputed and actual single nucleotide polymorphisms (SNP) from 2,661 animals. Single-step genomic best linear unbiased prediction was used to estimate SNP genetic variances for MY, FY, and AFC. Fixed effects included herd-year-season, breed regression, and heterosis regression effects; random effects were the animal additive genetic effect and the residual. Individual SNP explaining at least 0.001% of the genetic variance for each trait were used to identify nearby genes in the National Center for Biotechnology Information database, and pathway enrichment analysis was performed. The PPIs of these genes were identified and the PPI network was visualized. Results: The identified genes were involved in 16 enriched pathways related to MY, FY, and AFC. Most genes had two or more connections with other genes in the PPI network. Genes associated with MY, FY, and AFC through the biological pathways and PPI were primarily involved in cellular processes. The genes in the enriched pathways (303) explained 2.63% of the genetic variance for MY, 2.59% for FY, and 2.49% for AFC; the genes in the PPI network (265) explained 2.28%, 2.26%, and 2.12%, respectively. Conclusion: These sets of SNP associated with genes in the enriched pathways and the PPI network could be used as genomic selection targets in the Thai multibreed dairy population. This study should be continued in this and other populations under a variety of environmental conditions, because predicted SNP values will likely differ across populations subject to different environmental conditions and changes over time.
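
The SNP-filtering step (keeping SNPs that explain at least 0.001% of a trait's genetic variance) is a simple threshold over per-SNP variance shares. A schematic Python version follows, with random placeholder values standing in for the ssGBLUP variance estimates reported in the paper.

    # Schematic filter: keep SNPs explaining >= 0.001% of the genetic
    # variance for a trait. The variance shares here are placeholders
    # for the ssGBLUP estimates described in the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    n_snp = 75_776
    snp_var = rng.exponential(scale=1.0, size=n_snp)  # dummy per-SNP variances
    share = snp_var / snp_var.sum() * 100.0           # percent of genetic variance

    selected = np.flatnonzero(share >= 0.001)
    print(f"{selected.size} of {n_snp} SNPs pass the 0.001% threshold")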

Evaluating Usefulness of Deep Learning Based Left Ventricle Segmentation in Cardiac Gated Blood Pool Scan (게이트심장혈액풀검사에서 딥러닝 기반 좌심실 영역 분할방법의 유용성 평가)

  • Oh, Joo-Young;Jeong, Eui-Hwan;Lee, Joo-Young;Park, Hoon-Hee
    • Journal of radiological science and technology / v.45 no.2 / pp.151-158 / 2022
  • The Cardiac Gated Blood Pool (GBP) scan, a nuclear medicine imaging study, calculates the left ventricular Ejection Fraction (EF) by segmenting the left ventricle from the heart. However, accurately segmenting the substructures of the heart requires specialized knowledge of cardiac anatomy, and depending on how an expert processes the image, the left ventricular EF may be calculated differently. In this study, a DeepLabV3 model with a ResNet-50 backbone was trained on 93 GBP training images. The trained model was then applied to a separate test set of 23 GBP studies to evaluate the reproducibility of the region of interest (ROI) and the left ventricular EF. Pixel accuracy, Dice coefficient, and IoU for the ROI were 99.32±0.20, 94.65±1.45, and 89.89±2.62 (%) at the diastolic phase, and 99.26±0.34, 90.16±4.19, and 82.33±6.69 (%) at the systolic phase, respectively. The left ventricular EF averaged 60.37±7.32% for human-set ROIs and 58.68±7.22% for ROIs set by the deep learning segmentation model (p<0.05). The automated segmentation method presented in this study predicts ROIs and left ventricular EFs close to the human-set averages when given an arbitrary GBP image as input. If this automatic segmentation method is further developed and applied to nuclear medicine cardiac examinations that require ROI definition, it is expected to greatly improve the efficiency and accuracy of processing and analysis by nuclear medicine specialists.
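
A rough Python sketch of the segmentation-to-EF pipeline is given below, using torchvision's DeepLabV3/ResNet-50 model. The untrained weights, dummy frames, and simplified count-based EF formula (without the background correction used in clinical GBP analysis) are all stand-ins for the paper's actual setup.

    # Sketch: DeepLabV3 (ResNet-50 backbone) left-ventricle segmentation
    # and a simplified EF from end-diastolic (ED) and end-systolic (ES)
    # frame counts. Background-count correction is omitted here.
    import torch
    from torchvision.models.segmentation import deeplabv3_resnet50

    # Untrained stand-in for the paper's trained model.
    model = deeplabv3_resnet50(weights=None, weights_backbone=None,
                               num_classes=2)  # background vs left ventricle
    model.eval()

    def lv_mask(frame: torch.Tensor) -> torch.Tensor:
        """frame: (1, 3, H, W) tensor -> boolean LV mask of shape (H, W)."""
        with torch.no_grad():
            logits = model(frame)["out"]        # (1, 2, H, W)
        return logits.argmax(dim=1)[0] == 1     # class 1 assumed to be the LV

    def ejection_fraction(ed_frame, es_frame) -> float:
        ed_counts = ed_frame[0].sum(dim=0)[lv_mask(ed_frame)].sum()
        es_counts = es_frame[0].sum(dim=0)[lv_mask(es_frame)].sum()
        return float((ed_counts - es_counts) / ed_counts * 100.0)

    ed = torch.rand(1, 3, 256, 256)  # dummy ED frame in place of a GBP image
    es = torch.rand(1, 3, 256, 256)
    print(f"EF = {ejection_fraction(ed, es):.1f}%")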

Privacy-Preserving Traffic Volume Estimation by Leveraging Local Differential Privacy

  • Oh, Yang-Taek;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.26 no.12 / pp.19-27 / 2021
  • In this paper, we present a method for effectively predicting traffic volume from vehicle location data collected under Local Differential Privacy (LDP). The proposed solution consists of two phases: collecting vehicle location data in a privacy-preserving manner, and predicting traffic volume from the collected location data. In the first phase, vehicle location data are collected using LDP to prevent privacy issues that may arise during data collection. LDP adds random noise to the original data at collection time so that the data owner's sensitive information is not exposed, which allows vehicle location data to be collected while preserving the driver's privacy. In the second phase, traffic volume is predicted by applying deep learning techniques to the data collected in the first phase. Experimental results on real data sets demonstrate that the proposed method can effectively predict traffic volume from location data collected in a privacy-preserving manner.
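
The abstract does not name the specific LDP mechanism, so the Python sketch below uses k-ary (generalized) randomized response, a standard choice for categorical data such as discretized grid-cell locations; the grid size and privacy budget are assumptions.

    # Sketch of k-ary (generalized) randomized response, a standard LDP
    # mechanism for categorical data such as grid-cell locations. The
    # paper's exact mechanism is not given in the abstract.
    import math
    import random

    def grr_perturb(true_cell: int, k: int, epsilon: float) -> int:
        """Report the true cell with prob p, else a uniform other cell."""
        p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
        if random.random() < p:
            return true_cell
        other = random.randrange(k - 1)
        return other if other < true_cell else other + 1

    def grr_estimate_counts(reports, k, epsilon):
        """Unbiased per-cell frequency estimates from perturbed reports."""
        n = len(reports)
        p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
        q = (1 - p) / (k - 1)
        raw = [0] * k
        for r in reports:
            raw[r] += 1
        return [(c - n * q) / (p - q) for c in raw]

    k, eps = 100, 2.0                 # 100 grid cells, privacy budget 2
    reports = [grr_perturb(7, k, eps) for _ in range(10_000)]
    print(round(grr_estimate_counts(reports, k, eps)[7]))  # ~10,000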

Cloud Removal Using Gaussian Process Regression for Optical Image Reconstruction

  • Park, Soyeon;Park, No-Wook
    • Korean Journal of Remote Sensing / v.38 no.4 / pp.327-341 / 2022
  • Cloud removal is often required to construct time-series sets of optical images for environmental monitoring. In regression-based cloud removal, the selection of an appropriate regression model and the choice of input images significantly affect prediction performance. This study evaluates the potential of Gaussian process (GP) regression for cloud removal and analyzes the effects of cloud-free optical images and spectral bands on prediction performance. Unlike other machine learning-based regression models, GP regression provides uncertainty information and automatically optimizes its hyperparameters. An experiment using Sentinel-2 multi-spectral images was conducted for cloud removal in two agricultural regions. The prediction performance of GP regression was compared with that of random forest (RF) regression, and various combinations of input images and multi-spectral bands were considered for quantitative evaluation. The experimental results showed that using multi-temporal images with multi-spectral bands as inputs achieved the best prediction accuracy; highly correlated adjacent spectral bands and temporally correlated multi-temporal images improved accuracy. GP regression was significantly better than RF regression at predicting the near-infrared band. Because GP regression estimates the distribution of the input data, it could capture variations in the considered spectral band over a broader range. In particular, GP regression was superior to RF regression in reproducing structural patterns at both sites in terms of structural similarity. In addition, the uncertainty information provided by GP regression showed a reasonable similarity to the prediction errors for some sub-areas, indicating that uncertainty estimates may be used to measure the quality of the prediction results. These findings suggest that GP regression could be beneficial for cloud removal and optical image reconstruction, and the impact analysis of the input images provides guidelines for selecting optimal images for regression-based cloud removal.
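
A toy version of the per-pixel regression comparison can be sketched with scikit-learn, which exposes the GP predictive standard deviation that the paper uses as uncertainty information. The kernel choice and the synthetic reflectance values below are assumptions; the paper uses real Sentinel-2 bands.

    # Toy per-pixel regression for gap filling: predict a target band
    # from co-registered cloud-free bands, with GP uncertainty as a
    # quality cue. Kernel and synthetic reflectances are assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(500, 3))             # e.g., three input bands
    y = 0.6 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0, 0.02, 500)  # NIR-like

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                  normalize_y=True)
    gp.fit(X[:400], y[:400])
    mean, std = gp.predict(X[400:], return_std=True)  # std: per-pixel uncertainty

    rf = RandomForestRegressor(random_state=0).fit(X[:400], y[:400])
    print("GP RMSE:", np.sqrt(np.mean((mean - y[400:]) ** 2)))
    print("RF RMSE:", np.sqrt(np.mean((rf.predict(X[400:]) - y[400:]) ** 2)))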

ACA: Automatic search strategy for radioactive source

  • Jianwen Huo;Xulin Hu;Junling Wang;Li Hu
    • Nuclear Engineering and Technology / v.55 no.8 / pp.3030-3038 / 2023
  • Mobile robots are now used to search for uncontrolled radioactive sources in indoor environments, sparing technicians from radiation exposure. However, in indoor environments, especially in the presence of obstacles, making robots with limited sensing capabilities search for a radioactive source automatically remains a major challenge. The search efficiency of the robots also needs to be improved to meet practical constraints such as limited exploration time. This paper proposes an automatic source-search strategy, abbreviated ACA: the location of the source is estimated by a convolutional neural network (CNN), and the path is planned by the A-star algorithm. First, the search area is represented as an occupancy grid map. Then, the radiation dose distribution of the radioactive source over the occupancy grid map is obtained by Monte Carlo (MC) simulation, and multiple sets of radiation data are collected with the eight-neighborhood self-avoiding random walk (ENSAW) algorithm to form the radiation data set. This data set is fed into the designed CNN architecture to train the network model in advance. When the searcher enters a search area where a radioactive source exists, the location of the source is estimated by the network model and the search path is planned by the A-star algorithm; this process is iterated until the searcher reaches the location of the radioactive source. The experimental results show that the average number of radiometric measurements and the average number of moving steps of the ACA algorithm are only 2.1% and 33.2%, respectively, of those of the gradient search (GS) algorithm in an obstacle-free indoor environment. In an indoor environment shielded by concrete walls, the GS algorithm fails to find the source, while the ACA algorithm succeeds with fewer moving steps and sparse radiometric data.
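
The path-planning half of ACA is standard A* over an occupancy grid. A compact Python sketch follows, with a fixed goal cell standing in for the CNN-estimated source location.

    # Compact A* planner on an occupancy grid, standing in for the
    # path-planning half of ACA. The CNN-estimated source location is
    # replaced by a fixed goal cell for illustration.
    import heapq

    def astar(grid, start, goal):
        """grid: 2D list, 0 = free, 1 = obstacle. Returns a path or None."""
        rows, cols = len(grid), len(grid[0])
        h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan
        open_heap = [(h(start), 0, start, None)]
        came_from, g_best = {}, {start: 0}
        while open_heap:
            _, g, cell, parent = heapq.heappop(open_heap)
            if cell in came_from:
                continue
            came_from[cell] = parent
            if cell == goal:                      # reconstruct the path
                path = []
                while cell is not None:
                    path.append(cell)
                    cell = came_from[cell]
                return path[::-1]
            r, c = cell
            for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    ng = g + 1
                    if ng < g_best.get((nr, nc), float("inf")):
                        g_best[(nr, nc)] = ng
                        heapq.heappush(open_heap,
                                       (ng + h((nr, nc)), ng, (nr, nc), cell))
        return None

    grid = [[0, 0, 0, 0],
            [1, 1, 0, 1],
            [0, 0, 0, 0]]
    print(astar(grid, (0, 0), (2, 0)))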

Statistical estimation of the epochs of observation for the 28 determinative stars in the Shi Shi Xing Jing and the table in Cheonsang Yeolcha Bunyajido (석씨성경과 천상열차분야지도의 이십팔수 수거성 관측 연도의 통계적 추정)

  • Ahn, Sang-Hyeon
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.61.3-61.3 / 2019
  • The epochs of observation for the 28 determinative stars in the Shi Shi Xing Jing and the Cheonsang Yeolcha Bunyajido are estimated by using two fitting methods. The coordinate values in these tables are thought to have been measured with meridian instruments, so they contain axis-misalignment errors as well as random errors. We adopt a Fourier method and also devise a least-squares fitting method, and we perform bootstrap resampling to estimate the variance of the epochs. As a result, we find that both data sets were made during the 1st century BCE, i.e., the latter period of the Former Han dynasty. The sample mean of the epoch for the SSXJ data is earlier by about 15-20 years than that for the Cheonsang Yeolcha Bunyajido. However, the variances are so large that we cannot decide whether the Shi Shi Xing Jing data were formed around 77 BCE and the Cheonsang Yeolcha Bunyajido data measured in 52 BCE. We need either more data points or data points measured with better precision. We will also discuss the other 120 star coordinates listed in the Shi Shi Xing Jing.
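
The bootstrap step is generic enough to sketch in Python: resample the star list with replacement, refit the epoch each time, and take the spread of the refitted epochs as the variance estimate. The linear drift model below is a placeholder for the paper's Fourier and least-squares fits.

    # Generic bootstrap for the variance of a fitted epoch. The linear
    # model is a placeholder; the paper fits precession-induced
    # coordinate offsets with Fourier and least-squares methods.
    import numpy as np

    rng = np.random.default_rng(0)

    def fit_epoch(residuals, rates):
        """Least-squares epoch offset (years), assuming the placeholder
        model: residual = rate * dt for each star."""
        return float(np.sum(rates * residuals) / np.sum(rates ** 2))

    n_stars = 28
    rates = rng.uniform(0.5, 1.5, n_stars)                    # dummy drift rates
    residuals = rates * (-80.0) + rng.normal(0, 20, n_stars)  # "true" epoch -80

    boot = []
    for _ in range(2000):
        idx = rng.integers(0, n_stars, n_stars)  # resample with replacement
        boot.append(fit_epoch(residuals[idx], rates[idx]))
    print(f"epoch = {np.mean(boot):.0f} +/- {np.std(boot):.0f} years")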

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security / v.23 no.12 / pp.101-106 / 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is a promising strategy. The issue with existing feature selection approaches is that each method yields a distinct set of attributes that affect model accuracy, and current methods do not perform well on huge multidimensional datasets. We introduce a novel model containing a feature selection approach that selects optimal characteristics from big multidimensional data sets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness. To ensure the success of the proposed model, we balanced the classes by applying hybrid balanced-class sampling methods to the original dataset, along with data pre-processing and data transformation methods, to provide credible data for training. We ran and assessed the model on datasets with binary and multivalued classifications, using multiple datasets (Parkinson's, arrhythmia, breast cancer, kidney disease, diabetes). Suitable features are selected by a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, with a vote over the attributes that are common outputs of these methods. The accuracy on the original dataset, before applying the framework, is recorded and evaluated against the accuracy on the reduced attribute set; the results are shown separately for comparison. Based on the result analysis, we conclude that the proposed model produced higher accuracy on multivalued-class datasets than on binary-class datasets.
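
One plausible reading of the voting step can be sketched with scikit-learn: each of the six methods nominates its top-ranked features, and only features nominated by a majority survive. The top-k rule, the majority threshold, and the breast-cancer dataset below are assumptions for illustration.

    # Sketch of multi-method feature selection by voting: each model
    # ranks features, nominates its top k, and features nominated by a
    # majority are kept. The top-k rule and threshold are assumptions.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import (AdaBoostClassifier,
                                  GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.linear_model import LassoCV, SGDClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    k = 10  # hypothetical number of nominations per method

    def top_k(scores):
        return set(np.argsort(np.abs(scores))[-k:])

    votes = np.zeros(X.shape[1])
    votes[list(top_k(LassoCV(max_iter=5000).fit(X, y).coef_))] += 1
    votes[list(top_k(SGDClassifier(random_state=0).fit(X, y).coef_[0]))] += 1
    for model in (DecisionTreeClassifier(random_state=0),
                  RandomForestClassifier(random_state=0),
                  GradientBoostingClassifier(random_state=0),
                  AdaBoostClassifier(random_state=0)):
        votes[list(top_k(model.fit(X, y).feature_importances_))] += 1

    selected = np.flatnonzero(votes >= 4)  # majority of the six methods
    print(f"{selected.size} features kept:", selected)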

Utilization of Skewness for Statistical Quality Control (통계적 품질관리를 위한 왜도의 활용)

  • Kim, Hoontae;Lim, Sunguk
    • Journal of Korean Society for Quality Management / v.51 no.4 / pp.663-675 / 2023
  • Purpose: Skewness is an indicator used to measure the asymmetry of a data distribution. In the past, product quality was judged only by mean and variance, but in modern management and manufacturing environments, various factors and sources of volatility must be considered. Skewness helps to accurately understand the shape of a data distribution and to identify outliers or problems, so it can be utilized from this new perspective. We therefore propose a statistical quality control method using skewness. Methods: To generate data with the same mean and variance but different skewness, data were generated from the normal and gamma distributions. Using Minitab 18, we created 20 sets of 1,000 random data points from each of the normal and gamma distributions. Using these data, it was shown that the process state can be identified sensitively by using skewness. Results: According to the analysis, if the skewness is within ±0.2, the judgment does not differ from conventional control, given the error probabilities accepted for the in-control state in quality control. However, if the skewness exceeds ±0.2, a control chart considering only the standard deviation judges the process to be in control, while the data show it to be out of control. Conclusion: Using skewness in process management improves the ability to evaluate data quality and to detect abnormal signals. With this, process improvement and process abnormality issues can be quickly identified and addressed.
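
The core check is small enough to sketch in Python: generate normal and gamma samples with matched mean and variance, and compare their skewness against the ±0.2 band the authors propose. The parameter values are illustrative.

    # Sketch of the paper's core comparison: normal and gamma samples
    # with matched mean and variance but different skewness, checked
    # against the proposed |skewness| <= 0.2 band.
    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)
    mean, var = 10.0, 4.0

    normal = rng.normal(mean, np.sqrt(var), 1000)

    # Gamma(shape k, scale s): mean = k*s, var = k*s^2, skewness = 2/sqrt(k).
    s = var / mean        # scale = 0.4
    k_shape = mean / s    # shape = mean^2 / var = 25, so skewness = 0.4
    gamma = rng.gamma(k_shape, s, 1000)

    for name, data in (("normal", normal), ("gamma", gamma)):
        g = skew(data)
        flag = "in control" if abs(g) <= 0.2 else "out of control"
        print(f"{name}: mean={data.mean():.2f} sd={data.std():.2f} "
              f"skew={g:.2f} -> {flag}")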