• Title/Summary/Keyword: Models, statistical

Development of Market Growth Pattern Map Based on Growth Model and Self-organizing Map Algorithm: Focusing on ICT products (자기조직화 지도를 활용한 성장모형 기반의 시장 성장패턴 지도 구축: ICT제품을 중심으로)

  • Park, Do-Hyung;Chung, Jaekwon;Chung, Yeo Jin;Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.1-23
    • /
    • 2014
  • Market forecasting aims to estimate the sales volume of a product or service sold to consumers over a specific selling period. From the perspective of the enterprise, accurate market forecasting helps determine the timing of new product introduction and product design, and supports production plans and marketing strategies, enabling a more efficient decision-making process. Moreover, accurate market forecasting enables governments to organize the national budget efficiently. This study aims to generate a market growth curve for ICT (information and communication technology) goods using past time series data; categorize products showing similar growth patterns; understand markets in the industry; and forecast the future outlook of such products. It proposes a process (or methodology) for identifying market growth patterns with quantitative growth models and a data mining algorithm. The process has the following stages. In the first stage, past time series data are collected for the target products or services of the categorized industry. The data, such as sales volume and domestic consumption for a specific product or service, are collected from the relevant government ministry, the National Statistical Office, and other relevant government organizations. Collected data that cannot be analyzed as-is, because past data are missing or code names have changed, must be pre-processed. In the second stage, an optimal model for market forecasting is selected. The choice of model can vary with the characteristics of each categorized industry. Because this study focuses on the ICT industry, where new technologies appear frequently and change the market structure, the Logistic, Gompertz, and Bass models are selected. A hybrid model that combines different models can also be considered. 
The hybrid model considered in this study estimates the size of the market potential with the Logistic and Gompertz models and then uses those figures in the Bass model. The third stage evaluates which model most accurately explains the data. To do this, the parameters are estimated from the collected past time series data, the models' predicted values are generated, and the root-mean-squared error (RMSE) is calculated. The model with the lowest average RMSE across all product types is considered the best model. In the fourth stage, a market growth pattern map is constructed with the self-organizing map algorithm from the parameter values estimated by the best model. The self-organizing map is trained with the market pattern parameters of all products or services as input data, and the products or services are organized onto an $N{\times}N$ map. The number of clusters is increased from 2 to M, depending on the characteristics of the nodes on the map. The clusters are divided into zones, and the clustering that provides the most meaningful explanation is selected. Based on the final selection of clusters, the boundaries between the nodes are drawn and the market growth pattern map is completed. The last step is to determine the final characteristics of the clusters and their market growth curves. The average of the market growth pattern parameters in each cluster is taken as its representative figure. Using this figure, a growth curve is drawn for each cluster and its characteristics are analyzed. In addition, the characteristics of each cluster can be described qualitatively by considering the product types it contains. We expect that the process and system suggested in this paper can be used as a tool for forecasting demand in the ICT and other industries.
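The model-selection stage described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the curve parameterizations are common textbook forms chosen here as an assumption, the parameter values and the `observed` series are hypothetical, and parameter estimation itself is skipped (the values are treated as already fitted).

```python
import math

def logistic(t, m, a, b):
    """Cumulative logistic curve with market potential m (assumed form)."""
    return m / (1.0 + a * math.exp(-b * t))

def gompertz(t, m, a, b):
    """Cumulative Gompertz curve with market potential m (assumed form)."""
    return m * math.exp(-a * math.exp(-b * t))

def bass(t, m, p, q):
    """Cumulative Bass curve: p = innovation, q = imitation coefficient."""
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, predicted)) / len(actual))

years = list(range(10))
# Hypothetical cumulative sales, generated from a logistic curve for illustration.
observed = [logistic(t, 1000.0, 50.0, 0.9) for t in years]

# Hypothetical, already-estimated parameters for each candidate model.
candidates = {
    "logistic": [logistic(t, 1000.0, 50.0, 0.9) for t in years],
    "gompertz": [gompertz(t, 1000.0, 5.0, 0.5) for t in years],
    "bass": [bass(t, 1000.0, 0.03, 0.4) for t in years],
}
scores = {name: rmse(observed, fitted) for name, fitted in candidates.items()}
best_model = min(scores, key=scores.get)  # model with the lowest RMSE
```

In the abstract the RMSE is averaged over every product type before the best model is chosen; the sketch shows a single product only.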

Understanding the protox inhibition activity of novel 1-(5-methyl-3-phenylisoxazolin-5-yl)methoxy-2-chloro-4-fluorobenzene derivatives using comparative molecular field analysis (CoMFA) methodology (비교 분자장 분석 (CoMFA) 방법에 따른 1-(5-methyl-3-phenylisoxazolin-5-yl)methoxy-2-chloro-4-fluoro-benzene 유도체들의 Protox 저해 활성에 관한 이해)

  • Sung, Nack-Do;Song, Jong-Hwan;Yang, Sook-Young;Park, Kyeng-Yong
    • The Korean Journal of Pesticide Science
    • /
    • v.8 no.3
    • /
    • pp.151-161
    • /
    • 2004
  • Three-dimensional quantitative structure-activity relationship (3D-QSAR) studies of the protox inhibition activities against the root and shoot of rice (Oryza sativa L.) and barnyardgrass (Echinochloa crus-galli) were performed for a series of new 1-(5-methyl-3-phenylisoxazolin-5-yl)methoxy-2-chloro-4-fluorobenzene derivatives bearing an A = 3,4,5,6-tetrahydrophthalimino, B = 3-chloro-4,5,6,7-tetrahydro-2H-indazolyl, or C = 3,4-dimethylmaleimino group and an R-group substituted on the phenyl ring, using the comparative molecular field analysis (CoMFA) methodology with Gasteiger-Hückel charges. Four CoMFA models for the protox inhibition activities against the root and shoot of the two plants were generated using 46 molecules as the training set, and the predictive ability of each model was evaluated against a test set of 8 molecules. The models with the combination (SIH) of standard, indicator, and H-bond fields showed the best predictability of the protox inhibition activities, based on the cross-validated value $r^2_{cv.}$ $(q^2=0.635\sim0.924)$, the conventional coefficient $(r^2_{ncv.}=0.928\sim0.977)$, and the PRESS value $(0.091\sim0.156)$. The activities exhibited a strong correlation with the steric $(74.3\sim87.4\%)$, electrostatic $(10.10\sim18.5\%)$, and hydrophobic $(1.10\sim8.30\%)$ factors of the molecules. The steric feature of the molecule may therefore be an important factor for the activities. Based on the results of the CoMFA analyses, we found that novel protox inhibitors with higher activity and selectivity between the two plants may be designed by modifying the X-substituents for barnyardgrass.
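To make the cross-validated $q^2$ statistic concrete, the sketch below computes a leave-one-out $q^2$ as 1 - PRESS / SS_total, which is one common definition. It is only an illustration under stated assumptions: CoMFA uses partial least squares on field descriptors, whereas a one-variable least-squares line is swapped in here to keep the example self-contained, the data are hypothetical, and how the PRESS values in the abstract were scaled is not stated in the source.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(xs, ys):
    """Leave-one-out q^2 and PRESS (sum of squared held-out errors)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict the held-out point.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total, press

# Hypothetical descriptor values and activities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2, press = loo_q2(descriptor, activity)
```

Because every prediction is made for a molecule left out of the fit, $q^2$ is normally lower than the conventional $r^2$, which matches the ordering of the two ranges reported in the abstract.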

A Reliability Analysis of Shallow Foundations using a Single-Mode Performance Function (단일형 거동함수에 의한 얕은 기초의 신뢰도 해석 -임해퇴적층의 토성자료를 중심으로-)

  • 김용필;임병조
    • Geotechnical Engineering
    • /
    • v.2 no.1
    • /
    • pp.27-44
    • /
    • 1986
  • The measured soil data are analyzed with descriptive statistics and classified into four models: uncorrelated-normal (UNNO), uncorrelated-nonnormal (UNNN), correlated-normal (CONO), and correlated-nonnormal (CONN). This paper compares the reliability index and check points obtained with the advanced first-order second-moment method for the four models, along with a BASIC program. A single-mode performance function is composed of the basic design variables for the bearing capacity and settlement of shallow foundations and takes the analyzed soil information as input. The main conclusions of this study are summarized as follows: 1. In the bearing capacity mode, the cohesion and the bearing-capacity factors from the C-U test fit normal and lognormal distributions, respectively, and show a low negative correlation with each other. The reliability index of the CONN model is the lowest of the four models, so it can be recommended for reliability-based design, whereas the other models might overestimate the geotechnical conditions. 2. In the settlement mode, the virgin compression ratio and the preconsolidation pressure fit normal and lognormal distributions, respectively. When settlements are constrained to the lower values computed by the deterministic method, the CONN model gives the lowest reliability of the four models.
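The effect of correlation on a reliability index can be illustrated with the simplest possible case. The sketch below is not the paper's performance function or its advanced first-order second-moment procedure: it assumes a linear limit state g = R - S with jointly normal resistance R and load effect S, for which the index has a closed form. All numbers are hypothetical.

```python
import math

def reliability_index(mu_r, sigma_r, mu_s, sigma_s, rho=0.0):
    """Reliability index for g = R - S with normal R, S and correlation rho."""
    # Var(R - S) = sigma_r^2 + sigma_s^2 - 2 * rho * sigma_r * sigma_s
    var_g = sigma_r ** 2 + sigma_s ** 2 - 2.0 * rho * sigma_r * sigma_s
    return (mu_r - mu_s) / math.sqrt(var_g)

# Hypothetical bearing resistance and applied load statistics.
beta_uncorrelated = reliability_index(300.0, 60.0, 150.0, 30.0)
beta_negative = reliability_index(300.0, 60.0, 150.0, 30.0, rho=-0.3)
```

Even in this reduced setting the index changes once correlation is included, which is why the abstract compares correlated and uncorrelated models separately.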


VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.177-192
    • /
    • 2016
  • Machine learning is a field of artificial intelligence. It refers to an area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by biological neural networks. Other machine learning models include the decision tree, naive Bayes, and SVM (support vector machine) models. Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which fits our study well. The core principle of SVM is to find a reasonable hyperplane that separates different groups in the data space. Given information about the data in any two groups, the SVM model judges which group new data belong to, based on the hyperplane obtained from the given data set. Thus, the more meaningful data there are, the better the machine learning ability. In recent years, many financial experts have focused on machine learning, seeing the potential of combining it with the financial field, where vast amounts of data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics, and many studies have successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (a compound of "robot" and "advisor"), which perform various financial tasks through advanced algorithms using huge amounts of rapidly changing data. A Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically. 
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model and applying the forecast to real option trading to increase trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices. It is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the VKOSPI index in real time. Like ordinary volatility, VKOSPI affects option prices. VKOSPI and option prices move in the same direction regardless of the option type (call and put options with various strike prices). If volatility increases, the premiums of all call and put options increase because the probability that the options will be exercised increases. The investor can see in real time how much the option price rises for a given rise in volatility through vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurate forecasting of VKOSPI movements is one of the important factors that can generate profit in option trading. In this study, we verified with real option data that an accurate VKOSPI forecast can produce a large profit in real option trading. To the best of our knowledge, no previous study has predicted the direction of VKOSPI with machine learning and applied the prediction to actual option trading. We predicted daily VKOSPI changes with the SVM model and then entered an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether trading based on the SVM's predictions is applicable to real option trading. 
The results showed that the prediction accuracy for VKOSPI was 57.83% on average and that the number of position entries was 43.2, which is less than half of the benchmark (100). A small number of trades is an indicator of trading efficiency. In addition, the experiment showed that the trading performance was significantly higher than the benchmark.
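The statement that call and put premiums both rise with volatility can be checked with the Black-Scholes vega. The sketch below assumes a European option on a non-dividend-paying underlying; the spot, strikes, rate, and volatility are hypothetical and are not taken from the study's data.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1(spot, strike, years, rate, vol):
    return (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * math.sqrt(years))

def bs_price(spot, strike, years, rate, vol, kind):
    """Black-Scholes price of a European call or put (no dividends)."""
    a = d1(spot, strike, years, rate, vol)
    b = a - vol * math.sqrt(years)
    discounted_strike = strike * math.exp(-rate * years)
    if kind == "call":
        return spot * norm_cdf(a) - discounted_strike * norm_cdf(b)
    return discounted_strike * norm_cdf(-b) - spot * norm_cdf(-a)

def vega(spot, strike, years, rate, vol):
    """Price sensitivity to volatility; identical for calls and puts."""
    return spot * norm_pdf(d1(spot, strike, years, rate, vol)) * math.sqrt(years)

# Hypothetical strangle: out-of-the-money call and put on the same underlying.
spot, years, rate, vol = 100.0, 30.0 / 365.0, 0.02, 0.15
call_strike, put_strike = 105.0, 95.0
call_vega = vega(spot, call_strike, years, rate, vol)
put_vega = vega(spot, put_strike, years, rate, vol)
strangle_vega = call_vega + put_vega  # a short strangle carries -strangle_vega
```

Both legs have positive vega, so a sold strangle gains when volatility falls, which is the situation the trading rule in the abstract waits for.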

Developing a Traffic Accident Prediction Model for Freeways (고속도로 본선에서의 교통사고 예측모형 개발)

  • Mun, Sung-Ra;Lee, Young-Ihn;Lee, Soo-Beom
    • Journal of Korean Society of Transportation
    • /
    • v.30 no.2
    • /
    • pp.101-116
    • /
    • 2012
  • Accident prediction models are used to predict accident likelihood on existing or planned freeways and to evaluate programs or policies for improving safety. In this study, a traffic accident prediction model for freeways was developed for these purposes. In selecting variables for the model, the highest priority was given to the ease of both collecting the data and applying them to the model. The dependent variables were the total number of accidents and the number of accidents involving casualties, each counted per IC (or JCT) unit. As a result, two models were developed: the overall accident model and the casualty-related accident model. The error structures fitted to these models were the negative binomial distribution and the Poisson distribution, respectively. Of the two, the more appropriate was selected by statistical estimation. Nine major national freeways were selected, and five years of data (2003~2007) were used. Explanatory variables had to take either a predictable value, such as traffic volume, or a fixed value determined by geometric conditions. Maximum likelihood estimation showed that the significant variables of the overall accident model were the link length between ICs (or JCTs), the daily volume (AADT), and the ratio of bus volume to the number of curved segments between ICs (or JCTs). For the casualty-related accident model, the link length between ICs (or JCTs), the daily volume (AADT), and the ratio of bus volume had a significant impact on accidents. A likelihood ratio test was conducted to verify the spatial and temporal transferability of the estimated parameters of each model. The overall accident model could be transferred only to roads with four or more than six lanes, whereas the casualty-related accident model was transferable to every road and every time period. 
In conclusion, the model developed in this study can be extended to various applications, such as establishing future plans and evaluating policies.
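A likelihood ratio test of transferability compares a pooled fit against separate fits for each group of roads or periods. The sketch below reduces this to an intercept-only Poisson model (one mean accident count per group) purely for illustration; the study's models include covariates, and the counts used here are hypothetical.

```python
import math

def poisson_loglik(counts, mean):
    """Poisson log-likelihood of the counts for a common mean."""
    return sum(y * math.log(mean) - mean - math.lgamma(y + 1) for y in counts)

def transfer_lr_statistic(group_a, group_b):
    """LR statistic: separate means per group versus one pooled mean."""
    pooled = group_a + group_b
    ll_pooled = poisson_loglik(pooled, sum(pooled) / len(pooled))
    ll_separate = (
        poisson_loglik(group_a, sum(group_a) / len(group_a))
        + poisson_loglik(group_b, sum(group_b) / len(group_b))
    )
    return 2.0 * (ll_separate - ll_pooled)

# Hypothetical accident counts per link for two sets of freeways.
four_lane = [2, 3, 4, 3]
six_lane = [9, 11, 10, 12]
lr = transfer_lr_statistic(four_lane, six_lane)

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRITICAL_DF1 = 3.841
transferable = lr <= CHI2_CRITICAL_DF1
```

A statistic above the critical value means the pooled parameters fit noticeably worse than group-specific ones, so the model would not be considered transferable between the two groups.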

Evaluation of Oil Spill Detection Models by Oil Spill Distribution Characteristics and CNN Architectures Using Sentinel-1 SAR data (Sentinel-1 SAR 영상을 활용한 유류 분포특성과 CNN 구조에 따른 유류오염 탐지모델 성능 평가)

  • Park, Soyeon;Ahn, Myoung-Hwan;Li, Chenglei;Kim, Junwoo;Jeon, Hyungyun;Kim, Duk-jin
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.5_3
    • /
    • pp.1475-1490
    • /
    • 2021
  • Detecting oil spill areas using the statistical characteristics of SAR images has limitations: the classification algorithm is complicated and is strongly affected by outliers. To overcome these limitations, neural networks have recently been studied for classifying oil spills. However, few studies have evaluated whether a model shows consistent detection performance across various oil spill cases. Therefore, in this study, two CNNs (convolutional neural networks) with basic structures (Simple CNN and U-net) were used to determine whether detection performance differs according to the structure of the CNN and the distribution characteristics of the oil spill. With the method proposed in this study, the Simple CNN, which has only a contracting path, detected oil spills with an F1 score of 86.24%, and U-net, which has both contracting and expansive paths, showed an F1 score of 91.44%. Both models successfully detected oil spills, but the detection performance of U-net was higher than that of the Simple CNN. Additionally, to compare the accuracy of the models across various oil spill cases, the cases were classified into four categories according to the spatial distribution characteristics of the oil spill (presence of land near the oil spill area) and the clarity of the border between oil and seawater. The Simple CNN had F1 scores of 85.71%, 87.43%, 86.50%, and 85.86% for the four categories, with a maximum difference of 1.71%. For U-net, the values were 89.77%, 92.27%, 92.59%, and 92.66%, with a maximum difference of 2.90%. These results indicate that neither model showed significant differences in detection performance by the characteristics of oil spill distribution. However, detection tendencies differed with the model structure and the oil spill distribution characteristics. 
In all four oil spill categories, the Simple CNN tended to overestimate the oil spill area and U-net tended to underestimate it. These tendencies were more pronounced when the border between oil and seawater was unclear.
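The F1 scores quoted above combine precision and recall, and the over- and underestimation tendencies show up as false positives and false negatives respectively. The sketch below computes the score from pixel counts; the counts are hypothetical and do not reproduce the paper's results.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for a binary oil/not-oil pixel mask."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical model that overestimates the slick: many false positives.
over_precision, over_recall, over_f1 = f1_from_counts(900, 250, 100)

# Hypothetical model that underestimates the slick: more false negatives.
under_precision, under_recall, under_f1 = f1_from_counts(850, 50, 150)
```

An overestimating model trades precision for recall and an underestimating one does the opposite, so two models with different tendencies can still have F1 scores that are close together.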

Prediction of Maximal Oxygen Uptake Ages 18~34 Years (18~34 남성의 최대산소 섭취량 추정)

  • Jeon, Yoo-Joung;Im, Jae-Hyeng;Lee, Byung-Kun;Kim, Chang-Hwan;Kim, Byeong-Wan
    • 한국체육학회지인문사회과학편
    • /
    • v.51 no.3
    • /
    • pp.373-382
    • /
    • 2012
  • The purpose of this study is to predict VO2max from body indices and submaximal metabolic responses. The subjects were 250 males aged 18 to 34, randomly divided into two groups: 179 for the sample and 71 for cross-validation. They underwent maximal exercise testing with the Bruce protocol, and we measured the metabolic responses at the end of the first (3 minutes) and second (6 minutes) stages. To predict VO2max, we applied stepwise multiple regression analysis to the sample. Model 1's variables are weight, 6-minute HR, and 6-minute VO2 (R=0.64, SEE=4.74, CV=11.7%, p<.01), and the equation is VO2max (ml/kg/min) = 72.256 - 0.340(Weight) - 0.220(6minHR) + 0.013(6minVO2). Model 2's variables are weight, 6-minute HR, 6-minute VO2, and 6-minute VCO2 (R=0.67, SEE=4.59, CV=11.3%, p<.01), and the equation is VO2max (ml/kg/min) = 68.699 - 0.277(Weight) - 0.206(6minHR) + 0.020(6minVO2) - 0.009(6minVCO2). Neither model showed multicollinearity. Model 2 showed a higher correlation than Model 1. However, when we cross-validated the models with the 71 men, measured and estimated VO2max were significantly correlated (R=0.53 and 0.56, p<.01). Although both models are valid and practical given their simplicity and utility, Model 2 is more accurate.
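The two regression equations can be written as small functions. The coefficients are taken from the abstract; the function names and the example inputs are hypothetical, and the abstract does not state the units of weight, 6-minute VO2, or 6-minute VCO2, so the inputs must be in whatever units the original study used.

```python
def vo2max_model1(weight, hr_6min, vo2_6min):
    """Model 1 from the abstract: estimated VO2max in ml/kg/min."""
    return 72.256 - 0.340 * weight - 0.220 * hr_6min + 0.013 * vo2_6min

def vo2max_model2(weight, hr_6min, vo2_6min, vco2_6min):
    """Model 2 from the abstract: adds the 6-minute VCO2 term."""
    return (68.699 - 0.277 * weight - 0.206 * hr_6min
            + 0.020 * vo2_6min - 0.009 * vco2_6min)

# Hypothetical subject; units are assumed, not given in the source.
estimate_1 = vo2max_model1(weight=70.0, hr_6min=150.0, vo2_6min=2000.0)
estimate_2 = vo2max_model2(weight=70.0, hr_6min=150.0, vo2_6min=2000.0, vco2_6min=2200.0)
```

Both equations were fitted to men aged 18 to 34, so estimates outside that population are extrapolations.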

Investigating Data Preprocessing Algorithms of a Deep Learning Postprocessing Model for the Improvement of Sub-Seasonal to Seasonal Climate Predictions (계절내-계절 기후예측의 딥러닝 기반 후보정을 위한 입력자료 전처리 기법 평가)

  • Uran Chung;Jinyoung Rhee;Miae Kim;Soo-Jin Sohn
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.25 no.2
    • /
    • pp.80-98
    • /
    • 2023
  • This study explores the effectiveness of various data preprocessing algorithms for improving subseasonal to seasonal (S2S) climate predictions from six climate forecast models and their Multi-Model Ensemble (MME) using a deep learning-based postprocessing model. A pipeline of data transformation algorithms was constructed to convert raw S2S prediction data into training data processed with several statistical distributions. A dimensionality reduction algorithm was also used to select features through rankings of correlation coefficients between the observed and the input data. The training model was designed with the TimeDistributed wrapper applied to all convolutional layers of U-Net: the wrapper allows a U-Net convolutional layer to be applied directly to 5-dimensional time series data while maintaining the time axis of the data, but every input should be at least 3D in U-Net. We found that the Robust and Standard transformation algorithms are most suitable for improving S2S predictions. Dimensionality reduction based on feature selection did not significantly improve predictions of daily precipitation for the six climate models and even worsened predictions of daily maximum and minimum temperatures. While deep learning-based postprocessing also improved MME S2S precipitation predictions, it did not have a significant effect on temperature predictions, particularly for lead times of weeks 1 and 2. Further research is needed to develop an optimal deep learning model for improving S2S temperature predictions by testing various models and parameters.
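The abstract does not define its Robust and Standard transformations. The sketch below assumes the usual meanings: standard scaling subtracts the mean and divides by the standard deviation, and robust scaling subtracts the median and divides by the interquartile range. The precipitation values are hypothetical.

```python
import statistics

def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

# Hypothetical daily precipitation (mm) with one extreme event.
precipitation = [0.0, 0.2, 0.5, 1.0, 1.5, 2.0, 40.0]
standardized = standard_scale(precipitation)
robust = robust_scale(precipitation)
```

With a heavy-tailed variable such as precipitation, the single extreme day inflates the standard deviation and compresses the ordinary days under standard scaling, while the median and quartiles used by robust scaling are barely affected by it.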

A Study on Medical Waste Generation Analysis during Outbreak of Massive Infectious Diseases (대규모 감염병 발병에 따른 의료폐기물 발생량 예측에 관한 연구)

  • Sang-Min Kim;Jin-Kyu Park;In-Beom Ko;Byung-Sun Lee;Sang-Ryong Shin;Nam-Hoon Lee
    • Journal of the Korea Organic Resources Recycling Association
    • /
    • v.31 no.4
    • /
    • pp.29-39
    • /
    • 2023
  • In this study, the characteristics of medical waste generation were analyzed, differentiating between ordinary situations and outbreaks of massive infectious diseases. For ordinary situations, prediction models for medical waste quantities by type, namely general medical waste (G-MW), hazardous medical waste (H-MW), and infectious medical waste (I-MW), were established through regression analysis; all significance values (p) were <0.0001, indicating statistical significance. The coefficients of determination ($R^2$) of the prediction models ranked as follows: I-MW ($R^2$=0.9943) > G-MW ($R^2$=0.9817) > H-MW ($R^2$=0.9310). Additionally, the influencing factors used, GDP (G-MW), the number of medical institutions (H-MW), and the elderly population ratio (I-MW), were consistent with previous literature and showed high correlations. The total MW generation, evaluated by combining the models, had an MAE of 2,615 and an RMSE of 3,353, an accuracy similar to that of the H-MW (2,491, 2,890) and I-MW (2,291, 3,267) models. Because the quantity of medical waste is difficult to estimate accurately during rapid outbreaks of massive infectious diseases, the generation unit of I-MW was derived to analyze its characteristics. The generation unit was 8.74 kg/capita·day during the early unstable stage of an outbreak, 2.69 kg/capita·day during the stable stage, and 0.08 kg/capita·day on average during the reduction stage. Correlation analysis between the generation unit of I-MW and lethality rates showed +0.99 in the unstable stage, +0.52 in the stable stage, and +0.96 in the reduction stage, demonstrating a very high positive correlation of +0.95 or higher over the entire outbreak of massive infectious diseases. The results of this study are expected to play a useful role in establishing an effective medical waste management system in the field of health care.
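Evaluating the combined model amounts to summing the per-type predictions and scoring the total against observations. The sketch below shows that step with MAE and RMSE; the yearly figures are hypothetical, in arbitrary units, and are not the study's data.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root-mean-squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical yearly predictions from the three per-type models.
predicted_by_type = {
    "G-MW": [100.0, 110.0, 120.0],
    "H-MW": [20.0, 22.0, 25.0],
    "I-MW": [50.0, 55.0, 60.0],
}
observed_total = [175.0, 180.0, 210.0]

# Combine the per-type models by summing their predictions year by year.
predicted_total = [sum(year) for year in zip(*predicted_by_type.values())]
total_mae = mae(observed_total, predicted_total)
total_rmse = rmse(observed_total, predicted_total)
```

RMSE is never smaller than MAE, and the gap widens when a few years have large errors, which is consistent with each RMSE in the abstract exceeding its paired MAE.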

A Statistical Model to Predict Soil Temperature by Combining the Yearly Oscillation Fourier Expansion and Meteorological Factors (연주기(年週期) Fourier 함수(函數)와 기상요소(氣象要素)에 의(依)한 지온예측(地溫豫測) 통계(統計) 모형(模型))

  • Jung, Yeong-Sang;Lee, Byun-Woo;Kim, Byung-Chang;Lee, Yang-Soo;Um, Ki-Tae
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.23 no.2
    • /
    • pp.87-93
    • /
    • 1990
  • A statistical model to predict soil temperature from ambient meteorological factors, including mean, maximum, and minimum air temperatures, precipitation, wind speed, and snow depth, combined with a Fourier time series expansion, was developed with data measured at the Suwon Meteorological Service from 1979 to 1988. The stepwise elimination technique was used for statistical analysis. For the yearly oscillation model for soil temperature with 8 terms of the Fourier expansion, the mean square error decreased with soil depth, from 2.30 for the surface temperature to 1.34-0.42 for the 5 to 500-cm soil temperatures. The $r^2$ ranged from 0.913 to 0.988. The number of lag days of air temperature found by remainder analysis was 0 days for the soil surface temperature, -1 day for the 5 to 30-cm soil temperatures, and -2 days for the 50-cm soil temperature. The number of lag days for precipitation, snow depth, and wind speed was -1 day for the 0 to 10-cm soil temperatures and -2 to -3 days for the 30 to 50-cm soil temperatures. For the statistical soil temperature prediction model combining the yearly oscillation terms with the meteorological factors as remainder terms, considering the lag days obtained above, the mean square error was 1.64 for the soil surface temperature and ranged from 1.34 to 0.42 for the 5 to 500-cm soil temperatures. The model test with 1978 data, independent of model development, showed good agreement, with $r^2$ ranging from 0.976 to 0.996. The magnitudes of the coefficients implied that the soil depth to which daily meteorological variables might affect soil temperature was 30 to 50 cm. 
In the models, solar radiation was not included as an independent variable; however, a separate analysis of the relationship between solar radiation (Rs; $J\;m^{-2}$) and the difference (${\Delta}Tmxs$) between the maximum soil temperature and the maximum air temperature under a corn canopy showed the linear relationships $${\Delta}Tmxs=0.902+1.924{\times}10^{-3}\,Rs$$ for leaf area index lower than 2 and $${\Delta}Tmxs=0.274+8.881{\times}10^{-4}\,Rs$$ for leaf area index higher than 2.
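The two linear relationships for the soil-air maximum temperature difference can be expressed as one function of solar radiation and leaf area index. The coefficients come from the abstract; the function name and input value are hypothetical, and the abstract does not say which equation applies when the leaf area index is exactly 2, so assigning that case to the second equation is an assumption.

```python
def delta_tmxs(solar_radiation, leaf_area_index):
    """Maximum soil minus maximum air temperature under a corn canopy.

    solar_radiation is Rs in the units given by the abstract (J m^-2).
    """
    if leaf_area_index < 2.0:
        return 0.902 + 1.924e-3 * solar_radiation
    # Assumption: a leaf area index of exactly 2 uses the dense-canopy fit.
    return 0.274 + 8.881e-4 * solar_radiation

# Hypothetical radiation value evaluated under a sparse and a dense canopy.
sparse_canopy = delta_tmxs(1000.0, leaf_area_index=1.0)
dense_canopy = delta_tmxs(1000.0, leaf_area_index=3.0)
```

The smaller slope for the denser canopy reflects the abstract's finding that the same radiation raises soil temperature less once the leaf area index exceeds 2.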
