• Title/Summary/Keyword: Linear regression models


A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the convolutional neural network (CNN), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images. Thus, the proposed model adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in an image of $40{\times}40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices that express the color values on the R (red), G (green), and B (blue) scales. In step 4, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, a $2{\times}2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, the other for a downward trend). The activation function for the convolution layer and the hidden layers was ReLU (Rectified Linear Unit), and that for the output layer was the Softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days over eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
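
The first four CNN-FG steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the window step, the colors, the min-max scaling, and the way the curve is drawn are not given in the abstract, and the function names (`make_windows`, `rasterize`, `split_train_validation`) are hypothetical. Only the 5-day interval, the 40 x 40 image size, the three RGB matrices, and the 80/20 split come from the text.

```python
# Sketch of CNN-FG steps 1-4. Window step, colors, scaling and line drawing are
# illustrative assumptions; the abstract does not specify them.

SIZE = 40    # image side in pixels (from the abstract)
WINDOW = 5   # days per graph (from the abstract)

def make_windows(series, width=WINDOW, step=1):
    """Step 1: cut a daily series into windows of `width` days (step is assumed)."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

def rasterize(window_by_variable, colors):
    """Steps 2-3: draw one curve per variable into a SIZE x SIZE x 3 RGB array."""
    image = [[[255, 255, 255] for _ in range(SIZE)] for _ in range(SIZE)]  # white canvas
    for values, color in zip(window_by_variable, colors):
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # guard against a flat series
        for col in range(SIZE):
            # Fractional day index for this pixel column, linearly interpolated.
            t = col * (len(values) - 1) / (SIZE - 1)
            i = min(int(t), len(values) - 2)
            value = values[i] + (values[i + 1] - values[i]) * (t - i)
            row = round((SIZE - 1) * (1 - (value - low) / span))  # high values at the top
            image[row][col] = list(color)
    return image

def split_train_validation(samples, train_ratio=0.8):
    """Step 4: the first 80% of samples for training, the rest for validation."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

train, validation = split_train_validation(list(range(1950)))
print(len(train), len(validation))  # 1560 390
```

The split sizes reproduce the 1,560 and 390 samples reported in the abstract; the resulting nested lists would still need to be fed to a CNN library of the reader's choice for step 5.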

Biomass, Net Production and Nutrient Distribution of Bamboo Phyllostachys Stands in Korea (왕대속(屬) 대나무림(林)의 물질생산(物質生産) 및 무기영양물(無機營養物) 분배(分配)에 관한 연구(硏究))

  • Park, In Hyeop;Ryu, Suk Bong
    • Journal of Korean Society of Forest Science / v.85 no.3 / pp.453-461 / 1996
  • Three Phyllostachys stands of P. pubescens, P. bambusoides and P. nigra var. henonis in Sunchon were studied to investigate biomass, net production and nutrient distribution. Five $10m{\times}10m$ quadrats were set up, and 20 sample culms aged 2 years and over were harvested for dimension analysis in each stand. One-year-old culms and subterranean parts were estimated by the harvested quadrat method. The largest mean DBH, height and basal area were found in the P. pubescens stand, followed by the P. nigra var. henonis stand and the P. bambusoides stand. There was little difference in accuracy among the three allometric biomass regression models $logWt=A+BlogD$, $logWt=A+BlogD^2H$ and $logWt=A+BlogD+ClogH$, where Wt, D and H are dry weight, DBH and height, respectively. Analysis of covariance showed significant differences in intercept among the linear allometric biomass regressions of the three Phyllostachys species. Biomass including subterranean parts was the largest in the P. pubescens stand (103.621t/ha), followed by the P. nigra var. henonis stand (86.447t/ha) and the P. bambusoides stand (36.767t/ha). Leaf biomass was 6.3% to 7.8% of total biomass in each stand. The ratio of aboveground biomass to subterranean biomass in each stand was 1.87 to 2.26. Net production including subterranean parts was the greatest in the P. pubescens stand (6.115t/ha/yr), followed by the P. nigra var. henonis stand (5.609t/ha/yr) and the P. bambusoides stand (3.252t/ha/yr). The highest net assimilation ratio was estimated in the P. pubescens stand (2.979), followed by the P. nigra var. henonis stand (2.752) and the P. bambusoides stand (2.187). The biomass accumulation ratio of each stand was 2.679 to 5.358. Concentrations of N, P and Mg were the highest in leaves, followed by subterranean parts and culms+branches, in all three species. Concentration of Ca was the highest in leaves, followed by culms+branches and subterranean parts, in all three species.
The difference in biomass among the three species stands was caused by their culm size, leaf biomass, net assimilation ratio, and efficiency of leaves to produce culms.
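
As an illustration of the simplest of the three allometric models, $logWt=A+BlogD$, the sketch below fits A and B by ordinary least squares on log-transformed values. The function names and the sample numbers are hypothetical (the abstract reports no raw culm data), and base-10 logarithms are an assumption.

```python
import math

def fit_allometric(dbh, dry_weight):
    """Least-squares fit of log10(Wt) = A + B * log10(D); returns (A, B)."""
    x = [math.log10(d) for d in dbh]
    y = [math.log10(w) for w in dry_weight]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

def predict_weight(a, b, d):
    """Back-transform the fitted line to a dry-weight estimate for DBH d."""
    return 10 ** (a + b * math.log10(d))

# Made-up culms that follow Wt = 0.1 * D^2.3 exactly, so the fit recovers A = -1, B = 2.3.
sample_dbh = [2.0, 4.0, 6.0, 8.0, 10.0]
sample_weight = [0.1 * d ** 2.3 for d in sample_dbh]
A, B = fit_allometric(sample_dbh, sample_weight)
```

The other two models differ only in the regressors: $D^2H$ replaces $D$ in the second, and the third adds $logH$ as a second predictor, which would need a multiple regression.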


Corporate Bond Rating Using Various Multiclass Support Vector Machines (다양한 다분류 SVM을 적용한 기업채권평가)

  • Ahn, Hyun-Chul;Kim, Kyoung-Jae
    • Asia pacific journal of information systems / v.19 no.2 / pp.157-178 / 2009
  • Corporate credit rating is a very important factor in the market for corporate debt. Information concerning corporate operations is often disseminated to market participants through the changes in credit ratings published by professional rating agencies, such as Standard and Poor's (S&P) and Moody's Investors Service. Since these agencies generally charge a large fee for the service, and the periodically provided ratings sometimes do not reflect the default risk of the company at the time, it may be advantageous for bond-market participants to be able to classify credit ratings before the agencies actually publish them. As a result, it is very important for companies (especially financial companies) to develop a proper model of credit rating. From a technical perspective, credit rating constitutes a typical multiclass classification problem because rating agencies generally have ten or more rating categories. For example, S&P's ratings range from AAA for the highest-quality bonds to D for the lowest-quality bonds. The professional rating agencies emphasize the importance of analysts' subjective judgments in the determination of credit ratings. However, in practice, a mathematical model that uses the financial variables of companies plays an important role in determining credit ratings, since it is convenient to apply and cost efficient. These financial variables include the ratios that represent a company's leverage, liquidity, and profitability status. Several statistical and artificial intelligence (AI) techniques have been applied as tools for predicting credit ratings. Among them, artificial neural networks are most prevalent in the area of finance because of their broad applicability to many business problems and their preeminent ability to adapt.
However, artificial neural networks also have many defects, including the difficulty of determining the values of the control parameters and the number of processing elements in the layer, as well as the risk of over-fitting. Of late, because of their robustness and high accuracy, support vector machines (SVMs) have become popular as a solution for problems that require accurate prediction. An SVM's solution may be globally optimal because SVMs seek to minimize structural risk. On the other hand, artificial neural network models may tend to find locally optimal solutions because they seek to minimize empirical risk. In addition, no parameters need to be tuned in SVMs, barring the upper bound for non-separable cases in linear SVMs. However, since SVMs were originally devised for binary classification, they are not intrinsically geared for multiclass classification such as credit rating. Thus, researchers have tried to extend the original SVM to multiclass classification. Hitherto, a variety of techniques to extend standard SVMs to multiclass SVMs (MSVMs) have been proposed in the literature. Only a few types of MSVM have, however, been tested in prior studies that apply MSVMs to credit rating. In this study, we examined six different MSVM techniques: (1) One-Against-One, (2) One-Against-All, (3) DAGSVM, (4) ECOC, (5) the method of Weston and Watkins, and (6) the method of Crammer and Singer. In addition, we examined the prediction accuracy of some modified versions of conventional MSVM techniques. To find the most appropriate MSVM technique for corporate bond rating, we applied all the techniques to a real-world case of credit rating in Korea. The best application is in corporate bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. For our study, the research data were collected from National Information and Credit Evaluation, Inc., a major bond-rating company in Korea.
The data set comprises the bond ratings for the year 2002 and various financial variables for 1,295 companies from the manufacturing industry in Korea. We compared the results of these techniques with one another, and with those of traditional methods for credit rating, such as multiple discriminant analysis (MDA), multinomial logistic regression (MLOGIT), and artificial neural networks (ANNs). As a result, we found that DAGSVM with an ordered list was the best approach for the prediction of bond rating. In addition, we found that the modified version of the ECOC approach can yield higher prediction accuracy for cases showing clear patterns.
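
To make the first of the six techniques concrete, the following sketch shows how a One-Against-One scheme combines binary decisions by majority vote. It is not the study's implementation: the rating labels, the `centers` values, and the distance-based rule standing in for trained binary SVMs are all hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(x, classes, pairwise):
    """Majority vote over all k(k-1)/2 binary classifiers.

    pairwise[(a, b)] is a function that returns either a or b for input x.
    Ties are broken by the order of `classes` (an arbitrary choice for this sketch).
    """
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])

# Hypothetical stand-ins for trained binary SVMs: pick the class whose
# illustrative "center" score is closer to the input.
classes = ["A", "B", "C"]
centers = {"A": 3.0, "B": 2.0, "C": 1.0}

def make_rule(a, b):
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b

pairwise = {(a, b): make_rule(a, b) for a, b in combinations(classes, 2)}
predicted = one_against_one_predict(1.9, classes, pairwise)
print(predicted)  # B
```

With ten or more rating categories, as mentioned in the abstract, this scheme needs at least 45 binary classifiers, which is one reason alternatives such as One-Against-All or DAGSVM are compared.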

Changes in Radiation Use Efficiency of Rice Canopies under Different Nitrogen Nutrition Status (질소영양 상태에 따른 벼 군락의 광 이용효율 변화)

  • Lee Dong-Yun;Kim Min-Ho;Lee Kyu-Jong;Lee Byun-Woo
    • Korean Journal of Agricultural and Forest Meteorology / v.8 no.3 / pp.190-198 / 2006
  • Radiation use efficiency (RUE), the amount of biomass produced per unit of intercepted photosynthetically active radiation (PAR), constitutes a main part of crop growth simulation models. The objective of the present study was to evaluate the variation of RUE of rice plants under various nitrogen nutritive conditions. From 1998 to 2000, shoot dry weight (DW), intercepted PAR of rice canopies, and nitrogen nutritive status were measured under various nitrogen fertilization regimes using japonica and Tongil-type varieties. These data were used to estimate the average RUEs before heading and the relationship between RUE and the nitrogen nutritive status. The canopy extinction coefficient (K) increased with the growth of rice until the maximum tillering stage, remained constant at about 0.4 from maximum tillering to the heading stage, and rapidly increased again after heading. The DW growth showed a significant linear correlation with the cumulative PAR interception of the canopy, enabling the estimation of the average RUE before heading from the slopes of the regression lines. Average RUE tended to increase with the level of nitrogen fertilization. RUE increased toward a maximum as the nitrogen nutrition index (NNI), calculated as the ratio of actual shoot N concentration to the critical N concentration for maximum growth at any growth stage, and the specific leaf nitrogen $(SLN;\;g/m^2\;leaf\;area)$ increased. This relationship between RUE (g/MJ of PAR) and N nutritive status was expressed well by the following exponential functions: $$RUE=3.13\{1-exp(-4.33NNI+1.26)\}$$ $$RUE=3.17\{1-exp(-1.33SLN+0.04)\}$$ The above equations explained, respectively, about 80% and 75% of the average RUE variation due to varying nitrogen nutritive status of rice plants.
However, these equations would have some limitations if incorporated as a component model to simulate rice growth, as they are based on relationships averaged over the entire growth period before heading.
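
The two exponential functions can be evaluated directly. The sketch below takes the coefficients from the abstract and reads the predictor in the first equation as NNI; the function names and the input values are illustrative only.

```python
import math

def rue_from_nni(nni):
    """RUE (g/MJ of PAR) as a function of the nitrogen nutrition index."""
    return 3.13 * (1 - math.exp(-4.33 * nni + 1.26))

def rue_from_sln(sln):
    """RUE (g/MJ of PAR) as a function of specific leaf nitrogen (g/m^2 leaf area)."""
    return 3.17 * (1 - math.exp(-1.33 * sln + 0.04))

# Illustrative inputs, not values reported in the abstract.
for nni in (0.5, 0.75, 1.0):
    print(nni, round(rue_from_nni(nni), 2))
```

Both curves saturate: the leading coefficients 3.13 and 3.17 are the asymptotic RUE values, and the NNI curve crosses zero at NNI = 1.26/4.33, roughly 0.29.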

The PRISM-based Rainfall Mapping at an Enhanced Grid Cell Resolution in Complex Terrain (복잡지형 고해상도 격자망에서의 PRISM 기반 강수추정법)

  • Chung, U-Ran;Yun, Kyung-Dahm;Cho, Kyung-Sook;Yi, Jae-Hyun;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology / v.11 no.2 / pp.72-78 / 2009
  • The demand for rainfall data in gridded digital formats has increased in recent years due to the close linkage between hydrological models and decision support systems using the geographic information system. One of the most widely used tools for digital rainfall mapping is PRISM (parameter-elevation regressions on independent slopes model), which uses point data (rain gauge stations), a digital elevation model (DEM), and other spatial datasets to generate repeatable estimates of monthly and annual precipitation. In PRISM, rain gauge stations are assigned weights that account for other climatically important factors besides elevation, and aspect and topographic exposure are simulated by dividing the terrain into topographic facets. The size of the facet, or grid cell resolution, is determined by the density of rain gauge stations, and a $5{\times}5km$ grid cell is considered the lowest limit under the situation in Korea. The PRISM algorithms using a 270m DEM for South Korea were implemented in a script language environment (Python), and relevant weights for each 270m grid cell were derived from the monthly data from 432 official rain gauge stations. Weighted monthly precipitation data from at least 5 nearby stations for each grid cell were regressed on elevation, and the selected linear regression equations with the 270m DEM were used to generate a digital precipitation map of South Korea at 270m resolution. Among 1.25 million grid cells, precipitation estimates at 166 cells, where measurements were made by the Korea Water Corporation rain gauge network, were extracted and the monthly estimation errors were evaluated. An average reduction of 10% in the root mean square error (RMSE) was found for months with more than 100mm of monthly precipitation, compared to the RMSE associated with the original 5km PRISM estimates.
This modified PRISM may be used for rainfall mapping in the rainy season (May to September) at much higher spatial resolution than the original PRISM without losing data accuracy.
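
The per-cell regression step can be sketched as a weighted least-squares fit of precipitation on elevation. This is a simplified illustration, not the authors' script: the station weights are taken as given inputs (the abstract does not describe how they are computed), and the function names and station values are hypothetical.

```python
def weighted_linear_fit(elevations, precipitation, weights):
    """Weighted least squares for precipitation = intercept + slope * elevation."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, elevations)) / total
    mean_y = sum(w * y for w, y in zip(weights, precipitation)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, elevations))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, elevations, precipitation))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def estimate_cell(cell_elevation, elevations, precipitation, weights, min_stations=5):
    """Estimate monthly precipitation for one grid cell from nearby stations."""
    if len(elevations) < min_stations:
        raise ValueError("need at least %d stations" % min_stations)
    intercept, slope = weighted_linear_fit(elevations, precipitation, weights)
    return intercept + slope * cell_elevation

# Made-up stations lying exactly on precipitation = 100 + 0.05 * elevation.
station_elev = [100.0, 300.0, 500.0, 800.0, 1200.0]
station_rain = [100.0 + 0.05 * e for e in station_elev]
station_weight = [1.0, 2.0, 1.0, 0.5, 3.0]
estimate = estimate_cell(600.0, station_elev, station_rain, station_weight)
```

Repeating this for every 270m DEM cell, with the cell's own elevation and station weights, would yield a gridded precipitation map of the kind described.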

Analysis of Hydrological Impact Using Climate Change Scenarios and the CA-Markov Technique on Soyanggang-dam Watershed (CA-Markov 기법을 이용한 기후변화에 따른 소양강댐 유역의 수문분석)

  • Lim, Hyuk-Jin;Kwon, Hyung-Joong;Bae, Deg-Hyo;Kim, Seong-Joon
    • Journal of Korea Water Resources Association / v.39 no.5 s.166 / pp.453-466 / 2006
  • The objective of this study was to analyze the changes in the hydrological environment in the Soyanggang-dam watershed due to climate change results (in years 2050 and 2100) simulated using CCCma CGCM2 based on SRES A2 and B2. The SRES A2 and B2 scenarios were used to estimate NDVI values for selected land uses through the NDVI-temperature relation obtained by linear regression of observed data (in years 1998$\sim$2002). Land use change based on SRES A2 and B2 was estimated for every 5- and 10-year period using the CA-Markov technique based on the 1985, 1990, 1995 and 2000 land cover maps classified from Landsat TM satellite images. As a result, the trend in land use change in each land class was reflected. When land use changes in years 2050 and 2100 were simulated using the CA-Markov method, the forest class area declined while the urban, bare ground and grassland classes increased. When simulation was done further for future scenarios, the transition change converged and no increasing trend was reflected. The impact assessment of evapotranspiration was conducted by comparing the observed data with the results computed for three hypothetical scenarios of meteorological data (temperature, global radiation and wind speed) using the FAO Penman-Monteith method. The results showed that the runoff was reduced by about 50% compared with the present hydrologic condition when each SRES scenario and period were compared. If there was no land use change, the runoff would decline further to about 3$\sim$5%.
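
The Markov half of the CA-Markov technique, and the convergence the abstract mentions, can be illustrated with a small transition matrix. The land classes, initial shares, and transition probabilities below are made up for the sketch, and the cellular-automata step that allocates changes spatially is omitted.

```python
def markov_step(shares, transition):
    """One period: new_share[j] = sum over i of shares[i] * transition[i][j]."""
    n = len(shares)
    return [sum(shares[i] * transition[i][j] for i in range(n)) for j in range(n)]

def project(shares, transition, periods):
    """Apply the same transition matrix for a number of periods."""
    for _ in range(periods):
        shares = markov_step(shares, transition)
    return shares

# Hypothetical classes: forest, urban, grassland. Each row sums to 1.
transition = [
    [0.95, 0.03, 0.02],  # forest -> forest / urban / grassland
    [0.01, 0.98, 0.01],  # urban
    [0.05, 0.05, 0.90],  # grassland
]
initial = [0.8, 0.1, 0.1]

after_one = markov_step(initial, transition)   # forest share falls from 0.8 to 0.766
far_future = project(initial, transition, 1000)
```

Because the same matrix is applied repeatedly, the class shares approach a fixed distribution, which matches the observation that the transition change converges when the simulation is extended further into the future.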

Human Thermal Sensation and Comfort of Beach Areas in Summer - Woljeong-ri Beach, Gujwa-eup, Jeju-si, Jeju Special Self-Governing Province - (여름철 해변지역의 인간 열환경지수 및 열쾌적성 - 제주특별자치도 제주시 구좌읍 월정리 해변 -)

  • Park, Sookuk;Sin, Jihwan;Jo, Sangman;Hyun, Cheolji;Kang, Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.44 no.4 / pp.100-108 / 2016
  • The climatic index for tourism (CIT) has recently been advanced to include complete human energy balance models such as physiological equivalent temperature (PET) and the universal thermal climate index (UTCI). This study investigated human thermal sensation and comfort at Woljeong-ri Beach, Jeju, Republic of Korea, in spring and summer 2015 for landscape planning and design in beach areas. Microclimatic data measurements and human thermal sensation/comfort surveys based on ISO 10551 were conducted together. A total of 869 adults participated. As a result, the perceptual and thermal preference judgments, which consider only physiological aspects, had high coefficients of determination ($r^2$) with PET in linear regression analyses: 92.8% and 87.6%, respectively. However, affective evaluation, personal acceptability and personal tolerance, which consider both physiological and psychological aspects, had low $r^2$ values: 60.0%, 21.1% and 46.4%, respectively. Nevertheless, the correlations between them and PET were all significant at the 0.01 level. The neutral PET range in the perceptual judgment for human thermal sensation was $25{\sim}27^{\circ}C$, but a PET range with less than or equal to 20% dissatisfaction, as recommended by ASHRAE Standard 55, could not be achieved in the perceptual judgment. Only the PET ranges in affective evaluation and personal tolerance, which are affected by both aspects, qualified for the recommendation, at $21{\sim}32^{\circ}C$ and $17{\sim}37^{\circ}C$, respectively. Therefore, the PET range of $21{\sim}32^{\circ}C$ is recommended as the human thermal comfort zone of beach areas in landscape planning and design as well as tourism and recreational planning. PET heat stress level ranges on the beach were $2{\sim}5^{\circ}C$ higher than those in inland urban areas of the Republic of Korea. They were also similar to the high results of tropical areas such as Taiwan and Nigeria, and higher than those of western and middle Europe and Tel Aviv, Israel.

Trends in metabolic risk factors among patients with diabetes mellitus according to income levels: the Korea National Health and Nutrition Examination Surveys 1998~2014 (성인 당뇨병 환자의 소득수준에 따른 혈당, 당화혈색소, 혈압, 및 혈중지질 지표의 변화 추이 : 국민건강영양조사 1998~2014 분석 결과)

  • Cho, Sukyung;Park, Kyong
    • Journal of Nutrition and Health / v.52 no.2 / pp.206-216 / 2019
  • Purpose: Management of the metabolic risk factors in diabetes patients is essential for preventing or delaying diabetic complications. This study compared the levels of the metabolic risk factors in diabetes patients according to income level and examined the secular trends in recent decades. Methods: The data from the Korea National Health and Nutrition Examination Survey 1998 ~ 2014 were used. The diabetes patients were divided into three groups based on their household income levels. General information was obtained through self-administered questionnaires, and the blood biomarkers and blood pressure data were obtained from a health examination. Multivariable linear regression models were used to compare the metabolic biomarker levels according to household income level, adjusting for potential confounding factors. Results: The fasting blood glucose, hemoglobin A1c, and blood lipid (total cholesterol, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, and triglyceride) levels were similar in the three groups. During the survey period of 16 years, the blood pressure showed a significant decreasing trend with time in all groups (p < 0.001). In contrast, the fasting blood glucose (p = 0.004), total cholesterol (p < 0.001), and LDL-cholesterol levels (p = 0.007) decreased significantly, and the HDL-cholesterol level (p < 0.001) increased significantly, in the highest-income group. In the lowest-income group, the fasting blood glucose (p = 0.02), total cholesterol (p < 0.001), and triglyceride (p = 0.003) levels showed a significant decreasing trend over time. On the other hand, the middle-income group showed no significant change in any of the metabolic risk factors except for blood pressure. Conclusion: The level of management of metabolic risk factors among Korean diabetes patients was similar across income levels.
On the other hand, the highest- and lowest-income groups showed positive trends in the management of these factors during 16 years of observation, whereas the middle-income group did not show any improvement.

Estimation for Ground Air Temperature Using GEO-KOMPSAT-2A and Deep Neural Network (심층신경망과 천리안위성 2A호를 활용한 지상기온 추정에 관한 연구)

  • Taeyoon Eom;Kwangnyun Kim;Yonghan Jo;Keunyong Song;Yunjeong Lee;Yun Gon Lee
    • Korean Journal of Remote Sensing / v.39 no.2 / pp.207-221 / 2023
  • This study proposes deep neural network models for estimating air temperature from Level 1B (L1B) datasets of GEO-KOMPSAT-2A (GK-2A). The temperature at 1.5 m above the ground affects not only daily life but also weather warnings such as cold and heat waves. Many studies estimate the air temperature from the land surface temperature (LST) retrieved from satellites, because the air temperature has a strong relationship with the LST. However, the LST algorithm, a Level 2 output of GK-2A, works only for clear-sky pixels. To overcome cloud effects, we apply a deep neural network (DNN) model that estimates the air temperature from L1B data, which are radiometrically and geometrically calibrated from raw satellite data, and compare the model with a linear regression model between LST and air temperature. The root mean square error (RMSE) of the estimated air temperature is used to evaluate the models. There were 2,496,634 air temperature records from 95 in-situ stations, and the proportions of records that could be paired with LST and L1B were 42.1% and 98.4%, respectively. Data from 2020 and 2021 are used for training, and data from 2022 are used for validation. The DNN model is designed with an input layer taking 16 channels and four fully connected hidden layers to estimate an air temperature. Using the 16 bands of L1B, the DNN, with an RMSE of 2.22℃, outperformed the baseline model, with an RMSE of 3.55℃, under clear-sky conditions, and the total RMSE including overcast samples was 3.33℃. This suggests that the DNN is able to overcome cloud effects. However, it showed different characteristics in seasonal and hourly analyses, and solar information needs to be added as input to build a general DNN model, because the summer and winter seasons showed low coefficients of determination with high standard deviations.
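
For reference, the RMSE metric used to compare the two models, and the size of the improvement implied by the reported clear-sky values, can be computed as in the sketch below. The `rmse` helper is a generic illustration rather than the authors' evaluation code; only the 2.22 and 3.55 figures come from the abstract.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Relative RMSE reduction of the DNN (2.22) over the LST baseline (3.55), clear sky.
relative_reduction = 1 - 2.22 / 3.55
print(round(relative_reduction * 100, 1))  # 37.5
```

Under clear-sky conditions, the reported values therefore correspond to an RMSE roughly 37% lower than that of the linear regression baseline.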