• Title/Summary/Keyword: Random regression

Likelihood-Based Inference of Random Effects and Application in Logistic Regression (우도에 기반한 임의효과에 대한 추론과 로지스틱 회귀모형에서의 응용)

  • Kim, Gwangsu
    • The Korean Journal of Applied Statistics / v.28 no.2 / pp.269-279 / 2015
  • This paper considers inference for random effects. We show that the proposed confidence distribution (CD) performs well for random intercepts in logistic regression with small samples. Real data analyses are also carried out to identify the subject effects clearly.

Supervised Learning-Based Collaborative Filtering Using Market Basket Data for the Cold-Start Problem

  • Hwang, Wook-Yeon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.13 no.4 / pp.421-431 / 2014
  • The market basket data in the form of a binary user-item matrix or a binary item-user matrix can be modelled as a binary classification problem. The binary logistic regression approach tackles the binary classification problem, where principal components are predictor variables. If users or items are sparse in the training data, the binary classification problem can be considered as a cold-start problem. The binary logistic regression approach may not function appropriately if the principal components are inefficient for the cold-start problem. Assuming that the market basket data can also be considered as a special regression problem whose response is either 0 or 1, we propose three supervised learning approaches: random forest regression, random forest classification, and elastic net to tackle the cold-start problem, comparing the performance in a variety of experimental settings. The experimental results show that the proposed supervised learning approaches outperform the conventional approaches.

Robust Estimation and Outlier Detection

  • Myung Geun Kim
    • Communications for Statistical Applications and Methods / v.1 no.1 / pp.33-40 / 1994
  • The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed for multiple linear regression is adapted to multivariate data to identify influential observations. The resulting method clearly detects outliers and avoids the masking effect.
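
The fact this abstract builds on can be written out explicitly. The partition into $X_1$ (the predecessors) and $X_2$ (the variable of interest), and the symbols $\mu$ and $\Sigma$, are notation assumed here for illustration rather than taken from the paper; $\Sigma_{11}$ is assumed invertible.

```latex
\[
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\right)
\quad\Longrightarrow\quad
E[X_2 \mid X_1 = x_1] = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1).
\]
```

The conditional mean is linear in $x_1$, with coefficient $\Sigma_{21}\Sigma_{11}^{-1}$ playing the role of the regression coefficient, which is why a regression estimator can be carried over to the multivariate setting.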

COSMO-SkyMed 2 Image Color Mapping Using Random Forest Regression

  • Seo, Dae Kyo;Kim, Yong Hyun;Eo, Yang Dam;Park, Wan Yong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.4 / pp.319-326 / 2017
  • SAR (synthetic aperture radar) images are less affected by weather than optical images and can be acquired at any time of day. Therefore, SAR images are actively used in military applications and for natural disasters. However, because SAR data are in grayscale, it is difficult to perform visual analysis and to decipher details. In this study, we propose a color mapping method using RF (random forest) regression to enhance the visual decipherability of SAR images. COSMO-SkyMed 2 and WorldView-3 images were obtained for the same area, and RF regression was used to establish color configurations for performing color mapping. The results were compared with image fusion, a traditional color mapping method. The UIQI (universal image quality index), the SSIM (structural similarity) index, and CC (correlation coefficients) were used to evaluate image quality. The color-mapped image based on RF regression had significantly higher quality than the images derived from the other methods. The experimental results confirm the usefulness of RF-regression-based color mapping for SAR images.

ASYMPTOTIC NORMALITY OF WAVELET ESTIMATOR OF REGRESSION FUNCTION UNDER NA ASSUMPTIONS

  • Liang, Han-Ying;Qi, Yan-Yan
    • Bulletin of the Korean Mathematical Society / v.44 no.2 / pp.247-257 / 2007
  • Consider the heteroscedastic regression model $Y_i = g(x_i) + \sigma_i \epsilon_i$ $(1 \leq i \leq n)$, where $\sigma_i^2 = f(u_i)$, the design points $(x_i, u_i)$ are known and nonrandom, and $g$ and $f$ are unknown functions defined on the closed interval $[0, 1]$. Assuming that the random errors $\epsilon_i$ form a sequence of NA (negatively associated) random variables, we study the asymptotic normality of wavelet estimators of $g$ when $f$ is a known or unknown function.

Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms (기계학습 알고리즘을 이용한 보행만족도 예측모형 개발)

  • Lee, Jae Seung;Lee, Hyunhee
    • Journal of Korea Planning Association / v.54 no.3 / pp.106-118 / 2019
  • To develop a pedestrian navigation service that provides optimal routes based on pedestrian satisfaction levels, a prediction model is needed that can estimate a pedestrian's satisfaction level under given conditions. The aim of the present study is therefore to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey data from Seoul, Korea, are used to train and test the machine learning models. The Random Forest model shows the best prediction performance of the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738), and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision score of the Random Forest model implies that approximately 84.2% of pedestrians may be satisfied when walking in the areas suggested by the Random Forest model.
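
As a quick illustration of how the reported metrics relate, F1 is the harmonic mean of precision and recall. The sketch below recomputes it for the Random Forest model from the precision and recall quoted in the abstract; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the Random Forest model in the abstract.
rf_f1 = f1_score(precision=0.842, recall=0.906)
print(round(rf_f1, 3))  # prints 0.873
```

Because the inputs are themselves rounded to three decimals, recomputed values for other models may differ from the reported F1 in the last digit or two.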

The Effect of Highland Weather and Soil Information on the Prediction of Chinese Cabbage Weight (기상 및 토양정보가 고랭지배추 단수예측에 미치는 영향)

  • Kwon, Taeyong;Kim, Rae Yong;Yoon, Sanghoo
    • Journal of Environmental Science International / v.28 no.8 / pp.701-707 / 2019
  • Highland farming is agriculture that takes place 400 m above sea level and typically involves both low temperatures and long sunshine hours. Most highland Chinese cabbages are harvested in Gangwon province. The Ubiquitous Sensor Network (USN) has been deployed to observe Chinese cabbage growth because of the lack of installed weather stations in the highlands. Five representative Chinese cabbage cultivation spots were selected for USN and meteorological data collection between 2015 and 2017. The purpose of this study is to develop a weight prediction model for Chinese cabbages using the meteorological and growth data collected one week prior. Both a regression model and a random forest model were considered, with the regression assumptions being satisfied. The Root Mean Square Error (RMSE) was used to evaluate the predictive performance of the models. In the regression model, the variables influencing cabbage weight were the number of cabbage leaves, wind speed, precipitation, and soil electrical conductivity. In the random forest model, cabbage width, the number of cabbage leaves, soil temperature, precipitation, temperature, soil moisture at a depth of 30 cm, cabbage leaf width, soil electrical conductivity, humidity, and cabbage leaf length were screened. The RMSE of the random forest model was 265.478, lower than that of the regression model (404.493); this is because the random forest model could explain nonlinearity.
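
The model comparison above rests on RMSE, the square root of the mean squared difference between observed and predicted values. A minimal sketch follows; the function name `rmse` and the toy numbers are illustrative assumptions, not data or code from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values, made up for illustration only (not cabbage weights from the study).
observed = [2.0, 4.0, 6.0]
predicted = [1.0, 4.0, 8.0]
toy_rmse = rmse(observed, predicted)  # sqrt((1 + 0 + 4) / 3)
print(toy_rmse)
```

A lower RMSE on the same held-out data indicates smaller typical prediction error, which is the sense in which the random forest model is reported to outperform the regression model.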

Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification (자연어 처리 기반 『상한론(傷寒論)』 변병진단체계(辨病診斷體系) 분류를 위한 기계학습 모델 선정)

  • Young-Nam Kim
    • 대한상한금궤의학회지 / v.14 no.1 / pp.41-50 / 2022
  • Objective: The purpose of this study is to identify the most suitable machine learning algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods: A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』; 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. Data were preprocessed using the Twitter Korean tokenizer and used to train logistic regression, ridge regression, lasso regression, naive Bayes classifier, decision tree, and random forest models. The accuracy of the models was compared. Results: Ridge regression and the naive Bayes classifier showed an accuracy of 0.843, logistic regression and random forest showed an accuracy of 0.804, decision tree showed an accuracy of 0.745, and lasso regression showed an accuracy of 0.608. Conclusions: Ridge regression and the naive Bayes classifier are suitable NLP machine learning models for Shanghanlun diagnostic system classification.

A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences (영화 관객 수 예측을 위한 기계학습 기법의 성능 평가 연구)

  • Jeong, Chan-Mi;Min, Daiki
    • The Journal of Society for e-Business Studies / v.25 no.2 / pp.49-63 / 2020
  • Accurate prediction of box office performance at an early stage is crucial for the film industry to make better managerial decisions. To improve prediction performance, this paper evaluates the use of machine learning methods. We tested both classification-based and regression-based methods, including k-NN, SVM, and Random Forest. We first evaluate the input variables, which shows that reputation-related information generated during the first two weeks after release is significant. Prediction test results show that regression-based methods provide lower prediction error, and that Random Forest in particular outperforms the other machine learning methods. Regression-based methods have better predictive power when films have small box office earnings. On the other hand, classification-based methods work better for predicting large box office earnings.

A Study on the Optimization of Metalloid Contents of Fe-Si-B-C Based Amorphous Soft Magnetic Materials Using Artificial Intelligence Method

  • Young-Sin Choi;Do-Hun Kwon;Min-Woo Lee;Eun-Ji Cha;Junhyup Jeon;Seok-Jae Lee;Jongryoul Kim;Hwi-Jun Kim
    • Archives of Metallurgy and Materials / v.67 no.4 / pp.1459-1463 / 2022
  • The soft magnetic properties of Fe-based amorphous alloys can be controlled by their compositions through alloy design. Experimental data on these alloys, however, show some discrepancy with predicted values. To further improve the soft magnetic properties, machine learning methods such as random forest regression, k-nearest neighbors regression, and support vector regression can help optimize the composition. In this study, the random forest regression method was used to find the optimum compositions of Fe-Si-B-C alloys. As a result, the lowest coercivity was observed in Fe80.5Si3.63B13.54C2.33 at.% and the highest saturation magnetization was obtained in Fe81.83Si3.63B12.63C1.91 at.%, with $R^2$ values of 0.74 and 0.878, respectively.