• Title/Summary/Keyword: Statistical classification

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
    • Advances in nano research / v.14 no.3 / pp.225-234 / 2023
  • In this study, we use big data resources and statistical analysis to derive reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type, rainfall, and temperature data, as well as annual wheat production, are collected for a specific region. The acquired data are cleaned with statistical methods to remove incomplete and defective records. Several machine learning classification methods are then used to distinguish the different factors and their influence on the final crop yield. The efficacy of the machine learning methods is discussed by comparing the models' predictions with the actual crop yields using two statistical quantities, the correlation factor and the mean squared error. The results show that machine learning methods predict crop yields with high accuracy. Moreover, the random forest (RF) classification approach provides the best results among the classification methods used in this study.
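
The two evaluation quantities named in this abstract, mean squared error and the correlation between predicted and actual yields, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the yield values are hypothetical, and the Pearson coefficient is assumed to be what the abstract calls the correlation factor.

```python
import math


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical crop yields, invented for illustration only.
actual_yield = [3.0, 3.5, 4.0, 4.5]
predicted_yield = [3.1, 3.4, 4.2, 4.4]

mse = mean_squared_error(actual_yield, predicted_yield)
r = pearson_correlation(actual_yield, predicted_yield)
print(f"MSE = {mse:.4f}, correlation = {r:.4f}")
```

A lower MSE and a correlation closer to 1 both indicate predictions that track the observed yields more closely, which is how competing models could be ranked.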

Application of Bayesian Statistical Analysis to Multisource Data Integration

  • Hong, Sa-Hyun;Moon, Wooil-M.
    • Proceedings of the KSRS Conference / 2002.10a / pp.394-399 / 2002
  • In this paper, multisource data classification methods based on the Bayesian formula are considered. In this decision fusion scheme, the individual data sources are handled separately by statistical classification algorithms, and a Bayesian fusion method is then applied to integrate the available data sources. The method combines the decisions of the individual experts, with each expert's weight representing the reliability of its source. In previous work, the reliability measure used in the statistical approach was common to all pixels. In this experiment, the weight factors are allowed to differ from pixel to pixel in order to improve the integrated classification accuracy. Although most implementations of Bayesian classification assume fixed a priori probabilities, we use adaptive a priori probabilities, iteratively calculating the local a priori probabilities so as to maximize the a posteriori probabilities. The effectiveness of the proposed method is first demonstrated on simulations with artificial data and then evaluated on real-world data sets. The results show that the Bayesian statistical fusion scheme performs well on multispectral data classification.
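
One common way to combine per-source class posteriors with reliability weights is a weighted log-linear rule, sketched below. The abstract does not give the authors' exact formula, so the rule, the function name `fuse_posteriors`, and the numbers are assumptions for illustration. The sketch also assumes all probabilities are strictly positive and does not implement the adaptive a priori probabilities.

```python
import math


def fuse_posteriors(priors, source_posteriors, weights):
    """Combine per-source class posteriors using reliability weights.

    Assumed rule (not taken from the paper):
        score(c) = prior(c) * prod_s (posterior_s(c) / prior(c)) ** weight_s
    The scores are normalized so that they sum to one.
    All probabilities must be strictly positive.
    """
    if len(source_posteriors) != len(weights):
        raise ValueError("need exactly one weight per source")
    log_scores = []
    for c, prior in enumerate(priors):
        log_score = math.log(prior)
        for posterior, weight in zip(source_posteriors, weights):
            log_score += weight * (math.log(posterior[c]) - math.log(prior))
        log_scores.append(log_score)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    scores = [math.exp(s - top) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical two-class, two-source example for a single pixel.
priors = [0.5, 0.5]
source_posteriors = [[0.8, 0.2], [0.4, 0.6]]
weights = [1.0, 0.5]  # the second source is treated as less reliable

fused = fuse_posteriors(priors, source_posteriors, weights)
label = max(range(len(fused)), key=fused.__getitem__)
print(fused, label)
```

Pixel-dependent reliability, as described in the abstract, would correspond to calling the function with a different `weights` list for each pixel. A weight of zero removes a source's influence entirely.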

Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.271-281 / 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches to the classification of variable stars have been studied. Several features that explain the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) have typically been used to classify them. Excluding other factors and focusing only on the curve shapes of variable stars, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in its ability to accommodate features of light curve data that are unequally spaced in the phase domain and have functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis, and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to the data sets of variable stars from the project STellar Astrophysics and Research on Exoplanets (STARE).

One-dimensional CNN Model of Network Traffic Classification based on Transfer Learning

  • Lingyun Yang;Yuning Dong;Zaijian Wang;Feifei Gao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.420-437 / 2024
  • Network traffic classification (NTC) faces problems such as complicated statistical features and insufficient training samples, which may lead to poor classification performance. An NTC architecture based on a one-dimensional Convolutional Neural Network (CNN) and transfer learning is proposed to tackle these problems and improve fine-grained classification performance. The key points of the proposed architecture are: (1) model classification, in which a normalized rate feature set extracted from the original data is combined with existing statistical features to optimize the CNN NTC model; and (2) the application of transfer learning to improve NTC performance. We collect two typical network flow data sets from Youku and YouTube and verify the proposed method through extensive experiments. The results show that, compared with existing methods, our method can improve classification accuracy by around 3-5% for Youku and by about 7-27% for YouTube.

A Study on Statistical Classification of Wear Debris Morphology

  • Cho, Unchung
    • KSTLE International Journal / v.2 no.1 / pp.35-39 / 2001
  • In this paper, a statistical approach is taken to investigate the classification of wear debris, which is the key function of objective assessment of wear debris morphology. Wear tests are run to produce various kinds of wear debris. Images of the wear debris from the tests are captured with image acquisition equipment. Two-dimensional binary images of the debris are produced by thresholding, and morphological parameters are then used to quantify them. Parametric and nonparametric discriminant methods are employed to classify wear debris into predefined wear conditions. It is demonstrated that the classification accuracies of the parametric and nonparametric discriminant methods are similar. Selecting morphological parameters by stepwise discriminant analysis can generally improve the classification accuracy of both methods.
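
The thresholding and quantification steps described in this abstract can be sketched as below. The abstract does not list which morphological parameters were used, so area, bounding-box aspect ratio, and extent are hypothetical stand-ins. The sketch also assumes a single particle per image and that debris pixels are brighter than the background.

```python
def threshold_image(gray, level):
    """Return a binary image: 1 where the pixel is at or above level, else 0."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in gray]


def morphological_parameters(binary):
    """Compute simple shape descriptors of the foreground pixels.

    The descriptors are illustrative choices, not the paper's parameter set.
    """
    coords = [
        (r, c)
        for r, row in enumerate(binary)
        for c, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("no foreground pixels found")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    area = len(coords)
    return {
        "area": area,  # number of foreground pixels
        "height": height,  # bounding-box height in pixels
        "width": width,  # bounding-box width in pixels
        "aspect_ratio": max(height, width) / min(height, width),
        "extent": area / (height * width),  # fill fraction of the bounding box
    }


# Hypothetical 4 x 5 grayscale image of one particle (values 0-255).
gray = [
    [10, 10, 10, 10, 10],
    [10, 200, 220, 210, 10],
    [10, 190, 10, 205, 10],
    [10, 10, 10, 10, 10],
]

binary = threshold_image(gray, level=128)
params = morphological_parameters(binary)
print(params)
```

A vector of such parameters per particle is the kind of input a discriminant method would then use to assign the particle to a wear condition.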

Prediction of extreme PM2.5 concentrations via extreme quantile regression

  • Lee, SangHyuk;Park, Seoncheol;Lim, Yaeji
    • Communications for Statistical Applications and Methods / v.29 no.3 / pp.319-331 / 2022
  • In this paper, we develop a new statistical model to forecast the PM2.5 level in Seoul, South Korea. The proposed model is based on the extreme quantile regression model with a lasso penalty. Various meteorological and air pollution variables are considered as predictors in the regression model, and the lasso quantile regression performs variable selection and addresses the multicollinearity problem. The final prediction model is obtained by combining various extreme lasso quantile regression estimators, and we construct a binary classifier based on the model. Prediction performance is evaluated with standard statistical measures of binary classification performance. We observe that the proposed method outperforms the other classification methods and predicts 'very bad' cases of the PM2.5 level well.
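
The objective minimized by lasso-penalized quantile regression combines the quantile (check) loss with an L1 penalty on the coefficients. The sketch below only evaluates that objective for given coefficients and does not fit a model. The function names, the averaging convention, the unpenalized intercept, and the data are assumptions for illustration, not details taken from the paper.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: tau * r if r >= 0, otherwise (tau - 1) * r."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def lasso_quantile_objective(X, y, beta, intercept, tau, lam):
    """Mean check loss at quantile level tau plus lam times the L1 norm of beta.

    The intercept is left unpenalized, which is a common convention.
    """
    total = 0.0
    for row, target in zip(X, y):
        fitted = intercept + sum(b * x for b, x in zip(beta, row))
        total += check_loss(target - fitted, tau)
    penalty = lam * sum(abs(b) for b in beta)
    return total / len(y) + penalty


# Hypothetical data: one predictor, two observations.
X = [[1.0], [2.0]]
y = [3.0, 1.0]

objective = lasso_quantile_objective(
    X, y, beta=[1.0], intercept=0.0, tau=0.9, lam=0.5
)
print(round(objective, 4))
```

With a high quantile level such as `tau=0.9`, under-prediction is penalized nine times as heavily as over-prediction, which is what pushes the fitted line toward the upper tail of the response.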

A New Approach to Statistical Analysis of Electrical Fire and Classification of Electrical Fire Causes

  • Kim, Doo-Hyun;Lee, Jong-Ho;Kim, Sung-Chul
    • International Journal of Safety / v.6 no.2 / pp.17-21 / 2007
  • This paper presents a statistical analysis of electrical fires and a classification of electrical fire causes, with the aim of collecting electrical fire data efficiently. Electrical fire statistics are produced to monitor the number and characteristics of fires attended by firefighters, including their causes and effects, so that action can be taken to reduce the human and financial cost of fire. Electrical fires account for a major share of fires in Korea (nearly 30% of total fires according to recent figures). Incorrect and biased knowledge about electrical fires has changed the classification of certain types of fires from non-electrical to electrical. A standardized form is therefore needed so that, when assessing the cause of an electrical fire, firefighters can either tick the appropriate box on the fire report form or assess a text description. It is thus highly recommended to develop an electrical fire cause classification and an electrical fire assessment for the fire statistics in order to categorize and assess electrical fires accurately. This paper proposes a newly developed electrical fire cause classification with a well-defined hierarchical structure in which the cause categories neither relate to nor overlap one another. The fire statistics systems of other countries are also introduced and compared.

Optimal bandwidth in nonparametric classification between two univariate densities

  • Hall, Peter;Kang, Kee-Hoon
    • Proceedings of the Korean Statistical Society Conference / 2002.05a / pp.1-5 / 2002
  • We consider the problem of optimal bandwidth choice for nonparametric classification based on kernel density estimators, where the task is to distinguish between two univariate distributions. When the densities intersect at a single point, the optimal bandwidth choice depends on the curvatures of the densities at that point. The problems of empirical bandwidth selection and of classifying data in the tails of a distribution are also addressed.
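
The classification rule studied in this abstract assigns a point to whichever of two kernel density estimates is larger there. A minimal sketch follows, assuming a Gaussian kernel, equal prior probabilities, and hand-picked bandwidths. The paper's optimal bandwidth formula is not reproduced, and the samples are invented.

```python
import math


def gaussian_kde(sample, h):
    """Return a function x -> Gaussian kernel density estimate with bandwidth h."""
    if h <= 0 or not sample:
        raise ValueError("need a positive bandwidth and a non-empty sample")
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


def kde_classifier(sample_0, sample_1, h0, h1):
    """Classify x as 0 or 1 according to the larger density estimate."""
    f0 = gaussian_kde(sample_0, h0)
    f1 = gaussian_kde(sample_1, h1)

    def classify(x):
        # Ties, including both estimates underflowing to zero, go to class 0.
        return 0 if f0(x) >= f1(x) else 1

    return classify


# Hypothetical training samples from two univariate populations.
sample_0 = [-1.2, -0.8, -1.0, -0.5, -1.5]
sample_1 = [0.9, 1.1, 1.4, 0.6, 1.0]

classify = kde_classifier(sample_0, sample_1, h0=0.5, h1=0.5)
print(classify(-1.0), classify(1.0))
```

The bandwidths `h0` and `h1` control how smooth each estimate is near the point where the two densities cross, which is where classification errors concentrate. Far in the tails both estimates can be numerically zero, so the tie-breaking rule decides the label there.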

Classification of Microarray Gene Expression Data by MultiBlock Dimension Reduction

  • Oh, Mi-Ra;Kim, Seo-Young;Kim, Kyung-Sook;Baek, Jang-Sun;Son, Young-Sook
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.567-576 / 2006
  • In this paper, we apply multiblock dimension reduction methods to tumor classification based on microarray gene expression data. The procedure involves clustering the selected genes, multiblock dimension reduction, and classification using linear discriminant analysis and quadratic discriminant analysis.

On EM Algorithm For Discrete Classification With Bahadur Model: Unknown Prior Case

  • Kim, Hea-Jung;Jung, Hun-Jo
    • Journal of the Korean Statistical Society / v.23 no.1 / pp.63-78 / 1994
  • For discrimination with binary variables, reformulated full and first-order Bahadur models with incomplete observations are presented. This allows the prior probabilities associated with multiple populations to be estimated for the sample-based classification rule. The EM algorithm is adopted to provide the maximum likelihood estimates of the parameters of interest. Some experiences with the models are evaluated and discussed.
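
The role of EM in estimating unknown prior probabilities can be illustrated with a simplified case. The sketch below treats the class-conditional probabilities of each observation as known and estimates only the priors (mixing proportions) from unlabeled data. It does not implement the Bahadur model or estimate its parameters. The function name and the data are hypothetical, and every observation is assumed to have nonzero probability under at least one class.

```python
def estimate_priors_em(likelihoods, n_iter=200, tol=1e-10):
    """Estimate class prior probabilities by EM.

    likelihoods[i][k] is the (assumed known) probability of observation i
    under class k. Returns one estimated prior per class.
    """
    n_obs = len(likelihoods)
    n_classes = len(likelihoods[0])
    priors = [1.0 / n_classes] * n_classes  # start from equal priors
    for _ in range(n_iter):
        # E-step: accumulate posterior class memberships over observations.
        totals = [0.0] * n_classes
        for row in likelihoods:
            joint = [p * f for p, f in zip(priors, row)]
            norm = sum(joint)
            for k in range(n_classes):
                totals[k] += joint[k] / norm
        # M-step: each new prior is the average posterior membership.
        new_priors = [t / n_obs for t in totals]
        change = max(abs(a - b) for a, b in zip(new_priors, priors))
        priors = new_priors
        if change < tol:
            break
    return priors


# Hypothetical, perfectly separated case: three observations can only come
# from class 0 and one only from class 1.
likelihoods = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

priors = estimate_priors_em(likelihoods)
print(priors)
```

In a fuller treatment the M-step would also update the parameters of the class-conditional model, with the E-step additionally handling whatever parts of the observations are missing.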
