• Title/Summary/Keyword: Bayes test

Motion Recognition for Kinect Sensor Data Using Machine Learning Algorithm with PNF Patterns of Upper Extremities

  • Kim, Sangbin;Kim, Giwon;Kim, Junesun
    • The Journal of Korean Physical Therapy / v.27 no.4 / pp.214-220 / 2015
  • Purpose: This study investigated the feasibility of Kinect-based rehabilitation software by identifying a machine learning algorithm that classifies PNF pattern motion data efficiently even when the subjects wear a patient gown. Methods: Motion data of the PNF patterns for the upper extremities were collected with a Kinect sensor from 8 healthy university students with no upper-extremity limitations. The subjects, wearing a T-shirt, performed the PNF patterns (D1 and D2 flexion and extension) 30 times; the same protocol was repeated while wearing a patient gown to compare the classification performance of the algorithms. Four algorithms were compared: Naive Bayes Classifier, C4.5, Multilayer Perceptron, and Hidden Markov Model. The T-shirt motion data were used as the training set, on which a 10-fold cross-validation test was performed. The gown motion data were used as the test set. Results: All of the algorithms performed well in the 10-fold cross-validation test. However, when classifying the data recorded with a hospital gown, the Hidden Markov Model (HMM) was the best algorithm for classifying the PNF motions. Conclusion: HMM was the most effective of the four algorithms at handling time-sequence data. We therefore suggest that an algorithm that accounts for the sequence of motion, such as HMM, be selected when developing rehabilitation software that must judge whether a motion is performed correctly.

A Study on automatic assignment of descriptors using machine learning (기계학습을 통한 디스크립터 자동부여에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for Information Management / v.23 no.1 s.59 / pp.279-299 / 2006
  • This study applies several machine learning approaches to the automatic assignment of descriptors to journal articles. After selecting core journals in the field of information science and building a test collection from the articles of the past 11 years, the effectiveness of feature selection and the effect of training set size were examined. Regarding feature selection, Support Vector Machines (SVM) trained after reducing the feature set with the $\chi^2$ statistic (CHI) and with criteria that prefer high-frequency features (COS, GSS, JAC) performed the best. The size of the training set significantly influenced the performance of Support Vector Machines (SVM) and Voted Perceptron (VTP), but had little effect on Naive Bayes (NB).
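
The $\chi^2$ feature-selection criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common 2x2 term/category contingency-table form of the statistic; the abstract does not state the exact formulation the study used, and the function name and counts below are hypothetical.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # A degenerate table (an empty row or column) shows no association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts: a term spread evenly across categories scores 0,
# while a term that occurs only inside the category scores highly.
print(chi_squared(10, 10, 10, 10))
print(chi_squared(20, 0, 0, 20))
```

Features would then be ranked by this score per descriptor and only the top-scoring ones kept before training a classifier.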

Comparative Application of Various Machine Learning Techniques for Lithology Predictions (다양한 기계학습 기법의 암상예측 적용성 비교 분석)

  • Jeong, Jina;Park, Eungyu
    • Journal of Soil and Groundwater Environment / v.21 no.3 / pp.21-34 / 2016
  • In the present study, we compared various machine learning techniques for predicting subsurface structures from multiple sources of secondary information (i.e., well-logging data). The techniques employed are Naive Bayes classification (NB), artificial neural network (ANN), support vector machine (SVM), and logistic regression classification (LR). As alternative models, a conventional hidden Markov model (HMM) and a modified hidden Markov model (mHMM) are used, which incorporate additional information on the transition probability between primary properties into the predictions. For the comparisons, 16 boreholes consisting of four different materials are synthesized, showing directional non-stationarity in the upward and downward directions. Furthermore, two types of secondary information that are statistically related to each material are generated. Across the case studies, the accuracy of the techniques degrades when additive errors are included and when the amount of training data is small. The conventional HMM shows accuracy similar to that of the models that do not rely on transition probability. However, the mHMM consistently shows the highest prediction accuracy among the test cases, which can be attributed to the consideration of geological nature in the training of the model.

Outlier Detection in Growth Curve Model Using Mean-Shift Model (평균이동모형을 이용한 성장곡선모형의 이상점 진단에 관한 연구)

  • Shim, Kyu-Bark
    • Journal of the Korean Data and Information Science Society / v.10 no.2 / pp.369-385 / 1999
  • This paper discusses the problem of detecting outliers in the growth curve model with an arbitrary covariance structure, known as an unstructured covariance matrix. To detect outliers in the growth curve model, the likelihood ratio test statistic under the mean-shift model is established and its distribution is derived. After detecting outliers, we test for homogeneous and/or heterogeneous covariance matrices using the PSR Quasi-Bayes Criterion. For illustration, one numerical example is discussed, which compares the results before and after outlier deletion.

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi;Asiah, Lokman
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.53-63 / 2023
  • Many researchers are trying hard to minimize the incidence of cancers, mainly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most of these models are based on unbalanced datasets, which favour the majority class. However, it is imperative to correctly identify cancer patients, who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and a Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristic (ROC) curves, sensitivity, and specificity. The validation data were used to test the classifiers under each balancing approach. The final prediction model was built on the combination that performed best overall.
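
The reason sensitivity and specificity are reported alongside ROC curves on unbalanced data can be shown with a short sketch. The function name and the confusion counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from binary confusion counts.

    tp/fn: positive (minority, e.g. cancer) cases predicted right/wrong
    tn/fp: negative (majority) cases predicted right/wrong
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical unbalanced test set: 10 positive cases, 990 negative cases.
# A classifier that labels everything negative reaches 99% accuracy
# yet identifies none of the positive cases.
tp, fn, tn, fp = 0, 10, 990, 0
accuracy = (tp + tn) / (tp + fn + tn + fp)
print(accuracy, sensitivity_specificity(tp, fn, tn, fp))
```

A classifier trained on balanced data would be expected to trade some specificity for higher sensitivity, which is the quantity that matters for the minority class.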

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal / v.45 no.3 / pp.448-461 / 2023
  • This article performs a detailed data scrutiny on a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking the attributes through feature selection methodologies, namely, extra-trees classifier, recursive feature elimination, chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset to facilitate better prediction while diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are thereafter compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a better prediction accuracy of 98.31% for the features selected by extra tree classifier (ETC), which is ranked as the best by TOPSIS.

A Report on the Inter-Gene Correlations in cDNA Microarray Data Sets (cDNA 마이크로어레이에서 유전자간 상관 관계에 대한 보고)

  • Kim, Byung-Soo;Jang, Jee-Sun;Kim, Sang-Cheol;Lim, Jo-Han
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.617-626 / 2009
  • A series of recent papers reported that the inter-gene correlations in Affymetrix microarray data sets were strong and long-ranged, and that the assumption of independence or weak dependence among gene expression signals, which was often employed without justification, conflicted with actual data. Qiu et al. (2005) indicated that applying the nonparametric empirical Bayes method, in which test statistics are pooled across genes for statistical inference, resulted in a large variance in the number of differentially expressed genes, and attributed this effect to strong and long-ranged inter-gene correlations. Klebanov and Yakovlev (2007) demonstrated that the inter-gene correlations provide a rich source of information rather than being a nuisance in the statistical analysis, and developed, by transforming the original gene expression sequence, a sequence of independent random variables which they referred to as a $\delta$-sequence. Using two cDNA microarray data sets from experiments conducted in this country, we note in this report that strong and long-ranged inter-gene correlations are also present in cDNA microarray data and that an independent $\delta$-sequence can be derived from such data. This note suggests that the inter-gene correlations be considered in future analyses of cDNA microarray data sets.

A Study on Fault Classification by EEMD Application of Gear Transmission Error (전달오차의 EEMD적용을 통한 기어 결함분류연구)

  • Park, Sungho;Choi, Joo-Ho
    • Journal of the Computational Structural Engineering Institute of Korea / v.30 no.2 / pp.169-177 / 2017
  • In this paper, classification of spall and crack faults of gear teeth is studied by applying ensemble empirical mode decomposition (EEMD) to the gear transmission error (TE). Finite element models of the gears with the two faults are built, and the TE is obtained by simulating the gears under loaded contact. EEMD is applied to the residuals of the TE, which are the difference between the normal and faulty signals. From the result, the difference between spall and crack faults is clearly identified by the intrinsic mode functions (IMF). A simple test bed consisting of a motor, a brake, and a pair of spur gears is installed to illustrate the approach. Two gears are employed to obtain the TE for the normal, spalled, and cracked gears, and the fault types are separated by the same EEMD application process. To quantify the results, crest factors are computed for each IMF. The characteristics of spall and crack are well represented by the crest factors of the first and the third IMF, which are used as the feature signals. The classification is carried out using Bayes decision theory with the feature signals acquired through the experiments.
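
The crest factor used as a feature here is, in its standard definition, the ratio of a signal's peak absolute value to its RMS value. A minimal sketch follows; the function name and sample signals are hypothetical, and the abstract does not say how the study computed the quantity beyond naming it.

```python
import math


def crest_factor(signal):
    """Ratio of the peak absolute amplitude to the RMS value of a signal."""
    if not signal:
        raise ValueError("signal must contain at least one sample")
    peak = max(abs(x) for x in signal)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    # An all-zero signal has no defined crest factor; report 0.0 by convention.
    return peak / rms if rms > 0 else 0.0


# Hypothetical samples: an impulsive signal has a higher crest factor
# than a signal of constant magnitude.
print(crest_factor([1.0, -1.0, 1.0, -1.0]))
print(crest_factor([0.0, 0.0, 0.0, 4.0]))
```

Applied to each IMF in turn, such a function would yield one feature value per IMF for the subsequent classification step.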

A Performance Comparison of Machine Learning Classification Methods for Soil Creep Susceptibility Assessment (땅밀림 위험지 평가를 위한 기계학습 분류모델 비교)

  • Lee, Jeman;Seo, Jung Il;Lee, Jin-Ho;Im, Sangjun
    • Journal of Korean Society of Forest Science / v.110 no.4 / pp.610-621 / 2021
  • Soil creep, primarily caused by earthquakes and torrential rainfall events, has occurred widely across the country. The Korea Forest Service attempted to quantify soil creep susceptible areas using a discriminant value table in order to prevent or mitigate casualties and property damage in advance. With the advent of advanced computer technologies, machine learning-based classification models have been employed for managing mountainous disasters, such as landslides and debris flows. This study aims to quantify soil creep susceptibility using several classifiers, namely the k-Nearest Neighbor (k-NN), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM) models. To develop the classification models, we reduced 4,618 field survey records to 292 records. About 70% of the selected data were used for training, with the remaining 30% used for model testing. On the test data, the developed models achieved classification accuracies of 0.727 for k-NN, 0.750 for NB, 0.807 for RF, and 0.750 for SVM. Furthermore, we estimated Cohen's Kappa index as 0.534, 0.580, 0.673, and 0.585, with AUC values of 0.872, 0.912, 0.943, and 0.834, respectively. Ranked by performance for soil creep susceptibility, the machine learning-based classifiers were RF, NB, SVM, and k-NN, in that order. Our findings indicate that machine learning classifiers can provide valuable information for establishing and implementing natural disaster management plans in mountainous areas.
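
Classification accuracy and Cohen's Kappa, the two agreement measures reported above, can both be derived from a confusion matrix. The sketch below uses the standard definitions; the function name and the matrix are hypothetical and do not reproduce the study's results.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return observed, kappa


# Hypothetical binary result on 100 test samples.
acc, kappa = accuracy_and_kappa([[40, 10], [10, 40]])
print(acc, round(kappa, 3))
```

Kappa discounts the agreement expected by chance, which is why it is lower than accuracy for the same predictions.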

Effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment

  • Kim, Byung-Soo;Rha, Sun-Young
    • Bioinformatics and Biosystems / v.1 no.1 / pp.67-72 / 2006
  • The aim of this paper is to discuss the effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment in the context of a one-sample problem. We conducted a cDNA microarray experiment to detect differentially expressed genes for the metastasis of colorectal cancer based on twenty patients who underwent liver resection due to liver metastasis from colorectal cancer. Total RNAs from metastatic liver tumor and adjacent normal liver tissue from a single patient were labeled with Cy5 and Cy3, respectively, and competitively hybridized to a cDNA microarray with 7775 human genes. We used $M=\log_2(R/G)$ for the signal evaluation, where R and G denote the fluorescent intensities of the Cy5 and Cy3 dyes, respectively. The statistical problem comprises a one-sample test of E(M)=0 for each gene and involves multiple tests. The twenty cDNA microarray data would comprise a matrix of dimension 7775 by 20 if there were no missing values. However, missing values occur for various reasons. For each gene, the no missing proportion (NMP) was defined as the proportion of non-missing values out of twenty. In detecting differentially expressed (DE) genes, we first used the genes whose NMP is greater than or equal to 0.4 and then sequentially increased the NMP threshold by 0.1 to investigate its effect on the detection of DE genes. For each fixed NMP, we imputed the missing values with the K-nearest neighbor method (K=10) and applied the nonparametric t-test of Dudoit et al. (2002), SAM by Tusher et al. (2001), and the empirical Bayes procedure by Lönnstedt and Speed (2002) to assess the effect of missing values on the final outcome. These three procedures yielded substantially concordant results in detecting DE genes. Of the three procedures, we used SAM to explore the acceptable NMP level. The optimum NMP found in this data set turned out to be 80%. It is more desirable to find the optimum level of NMP for each data set by applying the method described in this note, when the plot of (NMP, number of overlapping genes) shows a turning point.
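
The log-ratio signal $M=\log_2(R/G)$ and the NMP filter described in this abstract can be sketched as follows. This is only an illustration of the two definitions. The function names, the use of `None` for a missing value, and the toy intensities are assumptions, and the imputation and testing steps are not shown.

```python
import math


def log_ratio(r, g):
    """M = log2(R/G); None when an intensity is missing or not positive."""
    if r is None or g is None or r <= 0 or g <= 0:
        return None
    return math.log2(r / g)


def non_missing_proportion(values):
    """Proportion of non-missing (not None) values for one gene."""
    return sum(v is not None for v in values) / len(values)


def filter_genes(m_by_gene, nmp_threshold):
    """Keep the genes whose NMP is at least the given threshold."""
    return {
        gene: values
        for gene, values in m_by_gene.items()
        if non_missing_proportion(values) >= nmp_threshold
    }


# Hypothetical (R, G) intensity pairs for two genes across five arrays.
raw = {
    "geneA": [(8, 2), (4, 4), (None, 3), (6, 3), (2, 8)],
    "geneB": [(None, 1), (5, None), (9, 3), (None, None), (1, 1)],
}
m_values = {g: [log_ratio(r, gr) for r, gr in pairs] for g, pairs in raw.items()}
print(sorted(filter_genes(m_values, 0.4)))
print(sorted(filter_genes(m_values, 0.8)))
```

Raising the threshold step by step, as in the abstract, shrinks the set of genes passed on to imputation and testing.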
