• Title/Summary/Keyword: Linear regression models

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.185-202 / 2012
  • Since the value of information has been recognized in the information society, the collection and use of information have become important. Like a painting, a facial expression carries a wealth of information that would take thousands of words to describe. Following this idea, there have recently been a number of attempts to provide customers and companies with intelligent services that perceive human emotions from facial expressions. For example, MIT Media Lab, the leading organization in this research area, has developed a human emotion prediction model and has applied its studies to commercial business. In academia, conventional methods such as Multiple Regression Analysis (MRA) and Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized for its low prediction accuracy. This is inevitable, since MRA can only explain the linear relationship between the dependent variable and the independent variables. To mitigate the limitations of MRA, some studies, such as Jung and Kim (2012), have used ANN as an alternative and reported that ANN generated more accurate predictions than statistical methods like MRA. However, ANN has also been criticized for overfitting and the difficulty of network design (e.g., setting the number of layers and the number of nodes in the hidden layers). Against this background, we propose a novel model using Support Vector Regression (SVR) to increase prediction accuracy. SVR is an extension of the Support Vector Machine (SVM) designed to solve regression problems. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold ${\varepsilon}$) to the model prediction. Using SVR, we built a model that measures the level of arousal and valence from facial features. To validate the usefulness of the proposed model, we collected data on facial reactions to appropriate visual stimuli and extracted features from the data. Next, preprocessing steps were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted the '${\varepsilon}$-insensitive loss function' and the 'grid search' technique to find the optimal values of parameters such as C, d, ${\sigma}^2$, and ${\varepsilon}$. For ANN, we adopted a standard three-layer backpropagation network with a single hidden layer. The learning rate and momentum rate of ANN were set to 10%, and we used the sigmoid function as the transfer function of the hidden and output nodes. We repeated the experiments while varying the number of nodes in the hidden layer to n/2, n, 3n/2, and 2n, where n is the number of input variables. The stopping condition for ANN was set to 50,000 learning events. We used MAE (Mean Absolute Error) as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy on the hold-out data set compared to MRA and ANN. Regardless of the target variable (the level of arousal, or the level of positive/negative valence), SVR showed the best performance on the hold-out data set. ANN also outperformed MRA; however, it showed considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to researchers and practitioners who want to build models for recognizing human emotions.
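
The ${\varepsilon}$-insensitive loss and the MAE measure mentioned in the abstract above can be illustrated with a minimal Python sketch. The function names and the sample values are hypothetical and are not taken from the study; the sketch only shows why predictions within ${\varepsilon}$ of the target contribute nothing to the SVR cost, while MAE counts every error.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean epsilon-insensitive loss: errors within +/- epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def mean_absolute_error(y_true, y_pred):
    """Mean Absolute Error (MAE): every deviation counts in full."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical target scores and model predictions (illustrative only).
y_true = [3.0, 5.0, 4.0, 6.0]
y_pred = [3.5, 4.0, 4.0, 8.0]

# The first and third cases lie within epsilon = 0.5 of the target,
# so they add nothing to the epsilon-insensitive loss.
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5))
print(mean_absolute_error(y_true, y_pred))
```

Because only the cases outside the ${\varepsilon}$ band affect the cost, the fitted model depends on a subset of the training data, as the abstract states.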

Clinical, Histopathological and Molecular Characteristics of Metastatic Breast Cancer in North-Eastern Kazakhstan: a 10 Year Retrospective Study

  • Abiltayeva, Aizhan;Moore, Malcolm A;Myssayev, Ayan;Adylkhanov, Tasbolat;Baissalbayeva, Ainur;Zhabagin, Kuantkan;Beysebayev, Eldar
    • Asian Pacific Journal of Cancer Prevention / v.17 no.10 / pp.4797-4802 / 2016
  • Background: Breast cancer (BC) is the most common cancer among women worldwide and has been the most frequent malignancy among women in Kazakhstan over the past few decades. Information on the clinical and histopathological features of metastatic breast cancer (MBC), as well as the distribution of molecular subtypes, is limited for Kazakh people. Materials and Methods: The present observational retrospective study was carried out at Regional Oncologic Dispensaries in the North-East Region of Kazakhstan (in the cities of Semey and Pavlodar). Clinical and histopathological data were obtained for a total of 570 MBC patients in the 10-year period from 2004 to 2013, of whom 253 had data on molecular subtype available. Data from hospital charts were entered into SPSS 20 for one-way ANOVA of the associations of different variables with 1-5 year survival. Pearson correlation and linear regression models were used to examine the relation between parameters, with a p-value < 0.05 considered statistically significant. Results: No significant relationships were evident between molecular subtype and survival, site of metastases, stage or ethnicity. Young females below the age of 44 were slightly more likely to have triple negative lesions. While the ductal type greatly predominated, luminal A and B cases had a higher percentage with lobular morphology. Conclusions: In this select group of metastatic breast cancer patients, no links were noted between survival and molecular subtype, in contrast to much of the literature.

A Study on Acute Effects of Fine Particles on Pulmonary Function of Schoolchildren in Beijing, China

  • Kim, Dae-Seon;Yu, Seung-Do;Cha, Jung-Hoon;Ahn, Seung-Chul
    • Proceedings of the Korean Environmental Health Society Conference / 2004.06a / pp.193-196 / 2004
  • To evaluate the acute effects of fine particles on pulmonary function, a longitudinal study was conducted among schoolchildren (3rd and 6th grades) living in Beijing, China. Children were asked to record their daily peak expiratory flow rate (PEFR) using a portable peak flow meter (mini-Wright) for 40 days. The relationship between daily PEFR and fine particle levels was analyzed using mixed linear regression models including gender, height, the presence of respiratory symptoms, and daily average temperature and relative humidity as extraneous variables. A total of 87 students participated in this longitudinal study. Daily measured PEFR was in the range of $253{\sim}501$ L/min. On a daily basis, PEFR measured in the morning was lower than that measured in the evening (or afternoon). The daily mean concentrations of $PM_{10}$ and $PM_{2.5}$ over the study period were $180.2\;{\mu}g/m^3$ and $103.2\;{\mu}g/m^3$, respectively. The IQRs (inter-quartile ranges) of $PM_{10}$ and $PM_{2.5}$ were $91.8\;{\mu}g/m^3$ and $58.0\;{\mu}g/m^3$, respectively. Daily mean PEFR was regressed on the 24-hour average $PM_{10}$ (or $PM_{2.5}$) levels, weather information such as air temperature and relative humidity, and individual characteristics including gender, height, and respiratory symptoms. The analysis showed that an increase in fine particle concentrations was negatively associated with the variability in PEFR. IQR increments of $PM_{10}$ and $PM_{2.5}$ (at a 1-day time lag) were associated with declines in PEFR of 1.54 L/min (95% confidence interval -2.14, -0.94) and 1.56 L/min (95% CI -2.16, -0.95), respectively.
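
As a rough illustration of how an effect reported per IQR increment can be rescaled, the following Python sketch converts the reported PEFR changes to another increment size. It assumes the effect is linear in concentration (as in a linear regression model) and that the IQRs quoted in the abstract are the increments used for the reported estimates; the function name and the 10 ${\mu}g/m^3$ increment are illustrative choices, not part of the study.

```python
def effect_per_increment(effect_per_iqr, iqr, increment):
    """Rescale a linear-model effect reported per IQR to another increment."""
    slope_per_unit = effect_per_iqr / iqr  # change in PEFR per 1 ug/m^3
    return slope_per_unit * increment


# Values quoted in the abstract: PEFR change (L/min) per IQR and IQR (ug/m^3).
pm10_change = effect_per_increment(effect_per_iqr=-1.54, iqr=91.8, increment=10.0)
pm25_change = effect_per_increment(effect_per_iqr=-1.56, iqr=58.0, increment=10.0)

print(f"PM10:  {pm10_change:.3f} L/min per 10 ug/m^3")
print(f"PM2.5: {pm25_change:.3f} L/min per 10 ug/m^3")
```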

The Variables Affecting the Fluctuation of Visitors and the Construction of Models of Demand Projection in National Park (국립공원 이용객의 변동요인과 수요예측 모형설정)

  • 정하광
    • Journal of the Korean Institute of Landscape Architecture / v.19 no.2 / pp.12-22 / 1991
  • The purpose of this study is to identify demand and methods of projecting it, including identifying the variables that affect the fluctuation of visitors in national parks and analyzing the relationships among these variables. Statistical analysis (Multiple Linear Regression Analysis, ANOVA, and model diagnostics) was carried out with the computer program SAS/pc. Thirteen variables (1. Total Population, 2. Per Capita PDI, 3. Employment Ratio of S.O.C. & others, 4. No. of Passenger Cars, 5. Length of Roads, 6. Leisure Expenditure of Farm Household, 7. Leisure Expenditure of Urban Household, 8. Price Index, 9. No. of Buses, 10. Exchange on Dollars, 11. Export, 12. Import, and 13. Visitors in National Park) were used in this study. The study period covers 17 years (1970-1986). The results were as follows: 1) Participation depends only on the specific characteristics of the economic factors (Price Index and Leisure Expenditure of Urban Household). These are the important factors directly affecting the participation of visitors. The statistical model for projecting visitors to national parks is the function "Visitors in National Parks (thousand) = 14915 + 0.210311 * Leisure Expenditure of Urban Household (won) - 157.835619 * Price Index (1985=100)". 2) The external factors affecting participation depend on the interrelated features of availability and accessibility of recreation resources or sites (No. of Passenger Cars, Length of Roads, and No. of Buses) and on the economic factors (Per Capita PDI, Export, and Import). These factors indirectly affect the participation of visitors. 3) Participation depends on the specific characteristics of demographic factors (Total Population and Employment Ratio of S.O.C. & others). These factors indirectly affect the participation of visitors. 4) The unexpected fluctuation of yearly visitors depends on oil shocks or inflation (1971, 1973-1974, 1979-1980), the promulgation of national emergency decrees (1971-1972, 1974-1975, 1979-1980), and national events (the assassination of President Park's wife, Madame Yuk, in 1974 and of President Park in 1979).
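
The projection function quoted in result 1) can be evaluated directly. The Python sketch below uses the coefficients exactly as reported in the abstract; the function name and the input values are hypothetical and serve only to show how the formula is applied.

```python
def projected_visitors_thousand(urban_leisure_expenditure_won, price_index):
    """Visitors in National Parks (thousand), per the regression in the abstract.

    price_index uses the 1985 = 100 base, as stated in the abstract.
    """
    return (14915
            + 0.210311 * urban_leisure_expenditure_won
            - 157.835619 * price_index)


# Hypothetical inputs (not data from the study).
estimate = projected_visitors_thousand(urban_leisure_expenditure_won=50_000,
                                       price_index=100)
print(f"Projected visitors: {estimate:.1f} thousand")
```

The signs of the coefficients reflect the abstract's finding: higher urban household leisure expenditure raises the projection, and a higher price index lowers it.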

The wage determinants of college graduates using Heckman's sample selection model (Heckman의 표본선택모형을 이용한 대졸자의 임금결정요인 분석)

  • Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1099-1107 / 2017
  • In this study, we analyzed the determinants of wages of college graduates using data from the "2014 Graduates Occupational Mobility Survey" conducted by the Korea Employment Information Service. In general, wages contain two pieces of information: whether an individual is employed and the size of the wage. However, many previous studies on wage determinants are prone to sample selection bias because they perform linear regression analysis using only information on wage size. We used the Heckman sample selection model to overcome this problem. The main results are summarized as follows. First, the validity of Heckman's sample selection model is statistically supported. Second, males have a significantly higher probability of employment and higher wages than females. Third, as age and parents' income increase, both the probability of employment and the size of wages are higher. Finally, as university satisfaction and the number of certifications acquired increase, both the probability of employment and wages tend to increase.

The Association between Bone Density at Os Calcis and Body Composition in Healthy Children Aged 9-12 Years (9-12세 정상 아동에서 종골 골밀도와 체성분의 연관성)

  • Shin, Eun-Kyung;Kim, Ki-Suk;Kim, Hee-Young;Lee, In-Sook;Joung, Hyo-Jee;Cho, Sung-Il
    • Journal of Preventive Medicine and Public Health / v.37 no.1 / pp.72-79 / 2004
  • Objectives : This cross-sectional study aimed to quantify the relationship between bone mineral density at the os calcis and body composition in healthy children. Methods : The areal bone mineral density was measured at the os calcis with peripheral dual energy X-ray absorptiometry. The fat free mass, fat mass and percentage fat mass were measured using bioelectric impedance in 237 Korean children aged 9 to 12 years. Sexual maturity was determined by self-assessment, using a standardized series of drawings of the 5 Tanner stages accompanied by explanatory text. Results : In multiple linear regression models adjusted for age, sexual maturity and height, the fat free mass was found to be the best predictor of calcaneal bone mineral density in both sexes. The fat free mass explained about 15% and 20% of the variability in calcaneal bone mineral density in boys and girls, respectively. After weight adjustment, the percentage fat mass was negatively associated with calcaneal bone mineral density in both sexes. Conclusions : The findings of this study suggest that the fat free mass, among the body composition measures, is the major determinant of bone mineral density at the os calcis in Korean children aged 9 to 12 years. Obesity, defined by the percentage fat mass, is assumed to have a negative effect on calcaneal bone density in children of the same weight.

Application of Near Infrared Spectroscopy for Nondestructive Evaluation of Nitrogen Content in Ginseng

  • Lin, Gou-lin;Sohn, Mi-Ryeong;Kim, Eun-Ok;Kwon, Young-Kil;Cho, Rae-Kwang
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1528-1528 / 2001
  • Ginseng cultivated in different countries or under different growing conditions generally differs in components such as saponin and protein, which relates to its efficacy and action. The protein content of ginseng radix is estimated from its nitrogen content. Nitrogen content can be determined by chemical analysis such as the Kjeldahl or extraction methods. However, these methods require long analysis times and cause environmental pollution and sample damage. In this work we investigated the possibility of non-destructive determination of nitrogen content in ginseng radix using near-infrared (NIR) spectroscopy. Ginseng radix, the root of Panax ginseng C. A. Meyer, was studied. A total of 120 samples were used, consisting of 6 sample sets of 20 samples each: 4-, 5- and 6-year-old Korean ginseng and 7-, 8- and 9-year-old Chinese ginseng. Nitrogen content was measured by electronic analysis. NIR reflectance spectra were collected over the 1100 to 2500 nm spectral region at 2 nm intervals with an InfraAlyzer 500C (Bran+Luebbe, Germany) equipped with a halogen lamp and a PbS detector. Calibration models were developed by multiple linear regression (MLR) and partial least squares (PLS) analysis using IDAS and SESAME software. According to the electronic analysis, Korean ginseng differed from Chinese ginseng in mean nitrogen content. Nitrogen content generally tended to decrease when the cultivation period exceeded 6 years. The MLR calibration model with 8 wavelengths using IDAS software accurately predicted nitrogen content, with a correlation coefficient (R) and standard error of prediction (SEP) of 0.985 and 0.855%, respectively. With SESAME software, the MLR calibration with 9 wavelengths was selected as the best calibration, with R and SEP of 0.972 and 0.596%, respectively. The PLSR calibration model resulted in an R of 0.969 and an RMSEP of 0.630. This study shows that NIR spectroscopy could be applied to determine the nitrogen content of ginseng radix with high accuracy.

Evaluation Method for Improvement Efficiency of Indoor Air Quality in Residence (주택의 실내공기질 개선 평가 방법)

  • Yang, Won-Ho;Son, Bu-Soon;Yim, Sung-Kuk
    • Journal of Environmental Health Sciences / v.33 no.4 / pp.255-263 / 2007
  • Indoor air quality is the dominant contributor to total personal exposure because most people spend a majority of their time indoors. The purpose of this study was to evaluate an alternative method for assessing the improvement of indoor air quality in houses after coating the interior with titanium dioxide ($TiO_2$) photocatalyst, using multiple nitrogen dioxide ($NO_2$) measurements. To evaluate the alternative method in an indoor environment, daily indoor and outdoor $NO_2$ concentrations of an apartment and a detached house were measured for 21 consecutive days in winter and summer, respectively. Another 21 daily measurements were carried out after $TiO_2$ coating on the wallpaper of the house interiors. All $NO_2$ concentrations were measured by passive filter badges. Indoor air quality models using mass balance are useful tools to quantify the relationship between indoor air pollution levels, ambient concentrations, and explanatory variables. Using a mass balance model and linear regression analysis, the penetration factor (ventilation rate divided by the sum of ventilation rate and decay rate) and the source strength factor (emission rate divided by the sum of ventilation rate and decay rate) were calculated. Subsequently, the decay constants were estimated. In this study, the magnitude of improvement of indoor air quality could be evaluated by the decay constant.
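
One way to read the procedure in the abstract above is sketched below in Python. It assumes (this is not stated in the abstract) that indoor $NO_2$ is regressed on outdoor $NO_2$ so that the slope estimates the penetration factor and the intercept estimates the source strength factor, and that the ventilation rate is known separately. All names and numbers are hypothetical, and the data are synthetic.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def decay_constant(penetration_factor, ventilation_rate):
    """Solve penetration = a / (a + k) for the decay rate k."""
    return ventilation_rate * (1.0 / penetration_factor - 1.0)


# Synthetic example: assumed ventilation rate a = 1.0 /h and decay k = 0.5 /h.
a_assumed, k_true, source_factor = 1.0, 0.5, 5.0
outdoor = [20.0, 30.0, 40.0, 50.0]
indoor = [a_assumed / (a_assumed + k_true) * c + source_factor for c in outdoor]

slope, intercept = fit_line(outdoor, indoor)
print("penetration factor:", slope)
print("source strength factor:", intercept)
print("decay constant:", decay_constant(slope, a_assumed))
```

Under these assumptions, a larger decay constant after $TiO_2$ coating would show up as a smaller regression slope for the same ventilation rate.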

The Case-Control Study on Complete Blood Count as a Risk Factor of Stroke (뇌졸중 위험지표로서의 일반혈액검사 소견에 대한 환자;대조군 연구)

  • Lee, Hyon-Ui;Kang, Kyung-Won;Yu, Byeong-Chan;Bang, Ok-Sun;Baek, Kyung-Min;Seol, In-Chan;Kim, Yoon-Sik
    • The Journal of Internal Korean Medicine / v.28 no.4 / pp.872-885 / 2007
  • Objective : Stroke is one of the most common causes of death in Korea. This study was done to evaluate the association of complete blood count (CBC) with the risk of hemorrhagic stroke and ischemic stroke. Methods : In 217 case patients with ischemic or hemorrhagic stroke, 146 healthy control subjects without stroke, hypertension, diabetes mellitus, hyperlipidemia, or ischemic heart disease, and 160 controls without ischemic or hemorrhagic stroke, we measured and compared white blood cell count (WBC), red blood cell count (RBC), hemoglobin (Hgb), hematocrit (Hct) and platelet count. These data were statistically analyzed by general linear models and binary logistic regression analysis to obtain adjusted odds ratios. Results : The level of WBC was significantly higher in all cases. The levels of RBC, Hct and Hgb were significantly lower in patients with ischemic stroke. The platelet count was significantly higher in patients with ischemic stroke. Conclusion : These results suggest that high WBC may be a risk factor for hemorrhagic stroke and ischemic stroke, and that low RBC, low Hct, low Hgb and high platelet count may be risk factors for ischemic stroke in Koreans.

A simplified method for estimating the fundamental period of masonry infilled reinforced concrete frames

  • Jiang, Rui;Jiang, Liqiang;Hu, Yi;Ye, Jihong;Zhou, Lingyu
    • Structural Engineering and Mechanics / v.74 no.6 / pp.821-832 / 2020
  • The fundamental period is an important parameter for seismic design and seismic risk assessment of building structures. In this paper, a simplified theoretical method to predict the fundamental period of masonry infilled reinforced concrete (RC) frame is developed based on the basic theory of engineering mechanics. The different configurations of the RC frame as well as masonry walls were taken into account in the developed method. The fundamental period of the infilled structure is calculated according to the integration of the lateral stiffness of the RC frame and masonry walls along the height. A correction coefficient is considered to control the error for the period estimation, and it is determined according to the multiple linear regression analysis. The corrected formula is verified by shaking table tests on two masonry infilled RC frame models, and the errors between the estimated and test period are 2.3% and 23.2%. Finally, a probability-based method is proposed for the corrected formula, and it allows the structural engineers to select an appropriate fundamental period with a certain safety redundancy. The proposed method can be quickly and flexibly used for prediction, and it can be hand-calculated and easily understood. Thus it would be a good choice in determining the fundamental period of RC frames infilled with masonry wall structures in engineering practice instead of the existing methods.