• Title/Summary/Keyword: multiple linear analysis

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems / v.18 no.2 / pp.268-281 / 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used as a technique to solve a wide range of real-world problems. Analysis and utilization of data are essential processes in applying machine learning to real-world problems. As a method of processing data in machine learning, we propose an approach based on applying multiple linear regression models by interlacing data to the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data. A combination of these multiple models is then used as the prediction model for classifying similar software. Experiments are performed to evaluate the proposed approach against conventional linear regression, and the experimental results show that the proposed method classifies similar software more accurately than the conventional model. We expect the proposed approach to be applicable to various kinds of classification problems to improve the accuracy of conventional linear regression.
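
The abstract does not spell out how the feature data are interlaced or how the models are combined, so the following is only a minimal sketch of one possible reading: features are split into interleaved subsets (model m sees features m, m+n, m+2n, ...), one least-squares model is fitted per subset, and the averaged prediction is thresholded. The function names, the toy similarity features, the averaging rule, and the 0.5 threshold are all assumptions made for illustration, not the paper's method.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot candidate into position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, w2, ...].
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(w * x for w, x in zip(coefs[1:], row))


def fit_interlaced_models(rows, targets, n_models):
    """Assumed interlacing: model m is trained on features m, m+n_models, ..."""
    models = []
    for m in range(n_models):
        idx = list(range(m, len(rows[0]), n_models))
        subset = [[r[i] for i in idx] for r in rows]
        models.append((idx, fit_linear_regression(subset, targets)))
    return models


def classify(models, row, threshold=0.5):
    """Assumed combination rule: average the model outputs, then threshold."""
    scores = [predict(coefs, [row[i] for i in idx]) for idx, coefs in models]
    return sum(scores) / len(scores) >= threshold


# Hypothetical similarity features for six software pairs (1 = similar).
rows = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.9, 0.9, 0.7],
    [0.2, 0.1, 0.3, 0.2],
    [0.1, 0.3, 0.2, 0.1],
    [0.7, 0.6, 0.8, 0.8],
    [0.3, 0.2, 0.1, 0.3],
]
targets = [1, 1, 0, 0, 1, 0]
models = fit_interlaced_models(rows, targets, n_models=2)
```

Calling `classify(models, some_feature_row)` then returns `True` when the averaged regression output reaches the threshold. Averaging is only one way to combine the models; a weighted vote would fit the same description.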

Quantitative Analysis by Derivative Spectrophotometry (III) -Simultaneous quantitation of vitamin B group and vitamin C in by multiple linear regression analysis-

  • Park, Man-Ki;Cho, Jung-Hwan
    • Archives of Pharmacal Research / v.11 no.1 / pp.45-51 / 1988
  • The resolution enhancement provided by the derivative operation is combined with a multivariate analysis method, multiple linear regression, using two options: all-possible regression and stepwise regression. The samples examined were synthetic mixtures of five vitamins: thiamine mononitrate, riboflavin phosphate, nicotinamide, pyridoxine hydrochloride, and ascorbic acid. All components in the mixtures were quantified with reasonably good accuracy and precision. The whole data-processing procedure was carried out on-line using three computer programs written in APPLESOFT BASIC.

Forecasting of Seasonal Inflow to Reservoir Using Multiple Linear Regression (다중선형회귀분석에 의한 계절별 저수지 유입량 예측)

  • Kang, Jaewon
    • Journal of Environmental Science International / v.22 no.8 / pp.953-963 / 2013
  • Reliable long-term streamflow forecasting is invaluable for water resource planning and management, which allocates water supply according to the demand of water users. Seasonal inflow to Andong dam is forecast and assessed using statistical methods based on hydrometeorological data. The predictors used to forecast seasonal inflow to Andong dam are selected from the southern oscillation index, sea surface temperature, and 500 hPa geopotential height data for the northern hemisphere. Predictors are selected in two stages: primary predictor sets are obtained first, and the final predictors are then determined from those sets. The primary predictor sets for each season are identified using cross correlation and mutual information. The final predictors are identified using partial cross correlation and partial mutual information. Three predictors are selected for each season. The values are determined using a bootstrapping technique that considers a specific significance level for predictor selection. Seasonal inflow is forecast by multiple linear regression analysis using the selected predictors for each season, and the forecasts are assessed using cross validation. Multiple linear regression analysis is performed using SAS. The results are assessed by mean squared error and mean absolute error. A contingency table is also constructed and assessed by the Heidke skill score. The assessment shows that the forecasts from multiple linear regression analysis are better than the reference forecasts.
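
The three verification measures named in the abstract can be sketched as follows. The study itself used SAS; this Python version, its function names, and the sample numbers are illustrative assumptions only. The Heidke skill score is computed in its usual multi-category form, (PC - E) / (1 - E), where PC is the proportion of correct forecasts and E is the proportion expected correct by chance.

```python
def mean_squared_error(observed, forecast):
    """Average of squared forecast errors."""
    return sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)


def mean_absolute_error(observed, forecast):
    """Average of absolute forecast errors."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def heidke_skill_score(table):
    """Heidke skill score from a square contingency table.

    table[i][j] is the number of cases with forecast category i and
    observed category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Proportion of forecasts that fall in the correct category.
    proportion_correct = sum(table[i][i] for i in range(k)) / n
    # Proportion expected correct by chance, from the table's marginals.
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / n ** 2
    return (proportion_correct - expected) / (1 - expected)


# Hypothetical seasonal inflows and forecasts (arbitrary units).
observed = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 33.0]
mse = mean_squared_error(observed, forecast)
mae = mean_absolute_error(observed, forecast)

# Hypothetical 3-category (below / near / above normal) contingency table.
table = [[5, 1, 0], [1, 4, 1], [0, 1, 5]]
hss = heidke_skill_score(table)
```

A score of 1 indicates perfect categorical forecasts, while a score of 0 means the forecasts are no better than chance.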

Quantitative Analysis by Diffuse Reflectance Infrared Fourier Transform and Linear Stepwise Multiple Regression Analysis I -Simultaneous quantitation of ethenzamide, isopropylantipyrine, caffeine, and allylisopropylacetylurea in tablet by DRIFT and linear stepwise multiple regression analysis-

  • Park, Man-Ki;Yoon, Hye-Ran;Kim, Kyoung-Ho;Cho, Jung-Hwan
    • Archives of Pharmacal Research / v.11 no.2 / pp.99-113 / 1988
  • Quantitation of ethenzamide, isopropylantipyrine and caffeine takes about 41 hrs by the conventional GC method, and quantitation of allylisopropylacetylurea takes about 40 hrs by the conventional UV method. With the DRIFT method developed here, quantitation of all of them takes about 6 hrs. Each standard and sample was sieved and powdered, and its DRIFT spectrum was acquired. From these spectra, a peak was selected for each component and the ratio of each peak to the standard peak was obtained; linear stepwise multiple regression was then performed on these ratios and the concentrations. We modified the reflectance value, the Kubelka-Munk equation and the Inverse-Kubelka-Munk equation. The Inverse-Kubelka-Munk equation compensated for the deficiency of the Kubelka-Munk equation. Correlation coefficients between the conventional GC and UV results and the DRIFT results were greater than 0.95.
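
For reference, the standard (unmodified) Kubelka-Munk function relates the diffuse reflectance of an optically thick sample, $R_\infty$, to the ratio of its absorption coefficient $K$ to its scattering coefficient $S$. The modified and inverse forms used by the authors are not given in the abstract, so only the textbook form is shown here; the symbols $R_\infty$, $K$ and $S$ are the conventional ones and are not taken from the paper.

$$F(R_\infty) = \frac{(1 - R_\infty)^2}{2R_\infty} = \frac{K}{S}$$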

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.249-255 / 2022
  • The COVID-19 virus appeared in 2019 and is extremely contagious. Because it is so infectious, it has a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of COVID-19 cases from COVID-19 infection status data (open source data provided by the Ministry of Health and Welfare) and Google Mobility Data, which tracks mobility across various categories of places. The data are divided into two datasets. The first combines the COVID-19 infection status data with all six variables of Google Mobility Data. The second combines the COVID-19 infection status data with only two variables of Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance is compared using the mean absolute error. We also perform a correlation analysis of the random forest model and the multiple linear regression model.

Traffic Accident Models of 3-Legged Signalized Intersections in the Case of Cheongju (3지 신호교차로의 교통사고 발생모형 - 청주시를 사례로 -)

  • Park, Byung-Ho;Han, Sang-Uk;Kim, Tae-Young
    • Journal of the Korean Society of Safety / v.24 no.2 / pp.94-99 / 2009
  • This study deals with traffic accidents at 3-legged signalized intersections in Cheongju. The goals are to analyze the geometric, traffic and operational conditions of the intersections and to develop various functional forms that predict accidents. The models are developed through correlation analysis and multiple linear, multiple nonlinear, Poisson and negative binomial regression analyses. In this study, two multiple linear, two multiple nonlinear and two negative binomial regression models were calibrated. All of these models were found to be statistically significant. All the models include two common variables (traffic volume and lane width) together with model-specific variables. The common variables are therefore judged to be critical to reducing accidents in Cheongju.

Study on the Critical Storm Duration Decision of the Rivers Basin (중소하천유역의 임계지속시간 결정에 관한 연구)

  • Ahn, Seung-Seop;Lee, Hyeo-Jung;Jung, Do-June
    • Journal of Environmental Science International / v.16 no.11 / pp.1301-1312 / 2007
  • The objective of this study is to propose a model for forecasting the critical storm duration of storm runoff in small river basins. Critical storm duration data for 582 sub-basins, taken from disaster impact assessment reports submitted to the National Emergency Management Agency between 2004 and 2007, were collected and analyzed. The stepwise multiple regression method is used to establish critical storm duration forecasting models of linear and exponential type. The multiple regression analysis favored the linear type over the exponential type. Multiple linear regression analysis between the critical storm duration and five basin characteristic parameters (basin area, main stream length, average slope of the main stream, shape factor and CN) gave a multiple correlation coefficient greater than 0.75.

New Methodology to Develop Multi-parametric Measure of Heart Rate Variability Diagnosing Cardiovascular Disease

  • Jin, Seung-Hyun;Kim, Wuon-Shik;Park, Yong-Ki
    • International Journal of Vascular Biomedical Engineering / v.3 no.2 / pp.17-24 / 2005
  • The main purpose of our study is to propose a new methodology for developing a multi-parametric measure, including linear and nonlinear measures of heart rate variability (HRV), for diagnosing cardiovascular disease. We recorded electrocardiograms in three recumbent postures: supine, left lateral, and right lateral. Twenty control subjects (age: $56.70 \pm 9.23$ years), 51 patients with angina pectoris (age: $59.98 \pm 8.41$ years) and 13 patients with acute coronary syndrome (age: $59.08 \pm 9.86$ years) participated in this study. To develop the multi-parametric measure of HRV, we used multiple discriminant analysis, which gave a goodness of fit of 75.0%. When the linear and nonlinear measures of HRV are used individually as a clinical tool to diagnose cardiac autonomic function, wrong results may well be obtained because each measure has different characteristics. Although our study is a preliminary one, we suggest that the multi-parametric measure, which takes into account all available linear and nonlinear measures of HRV, may be helpful as a supplementary tool for diagnosing cardiovascular disease.

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

  • 김태철;정하우
    • Magazine of the Korean Society of Agricultural Engineers / v.22 no.3 / pp.75-87 / 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydrogeological factors. A significant correlation with run-off in a river basin can therefore be expected in advance, and the effect of each of these causal and associated factors (independent variables: present-month rainfall, previous-month run-off, evapotranspiration, relative humidity, etc.) on present-month run-off (dependent variable) can be determined by multiple regression analysis. Functions relating the independent and dependent variables should be fitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. The reliability of the estimated function should be tested against statistical criteria such as analysis of variance, the coefficient of determination and significance tests of the regression coefficients before the first-estimated multiple regression model in historical sequence is settled. Some error between observed and estimated run-off nevertheless remains. The error arises because the model used is an inadequate description of the system and because the data constituting the record represent only a sample from a population of monthly discharge observations, so that estimates of the model parameters are subject to sampling errors. Since this error, which is a deviation from the multiple regression plane, cannot be explained by the first-estimated multiple regression equation, it can be regarded as a random error governed by the laws of chance. The variance left unexplained by the multiple regression equation can be handled by a stochastic approach: the random error is simulated by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly run-off in non-historical sequence is obtained by combining the deterministic component of the multiple regression equation with the stochastic component of the random errors.
Monthly run-off at Naju station in the Yong-San river basin is estimated by the multiple regression model and the hybrid model. Observed and estimated run-off are compared, and the multiple regression model is compared with existing estimation methods such as the Gajiyama formula, the tank model and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly run-off in historical sequence is the multiple linear regression equation in overall-month unit, that is, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$. About 85% of the total variance of monthly runoff can be explained by the multiple linear regression equation, and its coefficient of determination ($R^2$) is 0.843. This means that monthly runoff in historical sequence can be estimated with high significance from a short observation record using the above equation. (2) The optimal function for estimating monthly runoff in non-historical sequence is the hybrid model, which combines the multiple linear regression equation in overall-month unit with a stochastic component, that is, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$. The remaining 15% of unexplained variance of monthly runoff can be accounted for by adding the stochastic process, and somewhat more reliable statistical characteristics of monthly runoff in non-historical sequence are derived. The monthly runoff estimated in non-historical sequence exhibits, as a random component, extreme values (maxima and minima) that do not appear in the observed runoff. (3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, which is the same value as that of the Gajiyama formula. This implies that the multiple linear regression equation and the Gajiyama formula are theoretically rather reasonable functions.
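
The hybrid model in result (2) can be sketched in a few lines. The coefficients come from the abstract; everything else is an assumption made for illustration: the reading of P as present-month rainfall, Q as run-off and E as evapotranspiration, the function and parameter names, the input values, and the value of the standard error of estimate `s_y`, which the abstract does not report.

```python
import random


def deterministic_runoff(p_n, q_prev, e_n):
    """Regression component: Qn = 0.788 Pn + 0.130 Qn-1 - 0.273 En - 0.10."""
    return 0.788 * p_n + 0.130 * q_prev - 0.273 * e_n - 0.10


def simulate_sequence(rainfall, evapotranspiration, q_initial, s_y, seed=0):
    """Generate a non-historical run-off sequence with the hybrid model.

    Each month adds s_y * t to the regression estimate, where t is a
    standard normal variate. s_y is a hypothetical input here.
    """
    rng = random.Random(seed)
    runoff = []
    q_prev = q_initial
    for p_n, e_n in zip(rainfall, evapotranspiration):
        t = rng.gauss(0.0, 1.0)
        q_n = deterministic_runoff(p_n, q_prev, e_n) + s_y * t
        runoff.append(q_n)
        # The simulated value feeds the next month's Qn-1 term.
        q_prev = q_n
    return runoff


# Hypothetical monthly inputs; units are not stated in the abstract.
rainfall = [100.0, 80.0, 120.0]
evapotranspiration = [20.0, 15.0, 25.0]
sequence = simulate_sequence(rainfall, evapotranspiration, q_initial=50.0, s_y=5.0)
```

Setting `s_y` to zero reduces the simulation to the purely deterministic regression of result (1), which makes the contribution of the stochastic term easy to isolate.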

Analysis of Acceleration Bounds and Mobility for Multiple Robot Systems Based on Null Space Analysis Method (영 공간 분해 방법을 이용한 다중 협동로봇의 모빌리티와 가속도 조작성 해석)

  • Lee Fill-Youb;Jun Bong-Huan;Lee Ji-Hong
    • Journal of Institute of Control, Robotics and Systems / v.12 no.5 / pp.497-504 / 2006
  • This paper presents a new technique that derives the dynamic acceleration bounds of multiple cooperating robot systems from the given torque limits of the individual robots. A set of homogeneous linear algebraic equations is derived from the dynamic equations of multiple robots with friction contacts. The mobility of the robot system is analyzed by decomposing the null space of these linear algebraic equations. The acceleration bounds of the multiple robot system are obtained from the joint torque constraints of the robots by means of the decomposed null space. Because the joint constraints of the robots are given in the infinity-norm sense, the resulting acceleration bounds of the system are described as polytopes. Several case studies are presented to validate the proposed method.