• Title/Summary/Keyword: Estimation Methodology

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been used in various industries for practical applications. In business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for specific product-level market information. However, this information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. We therefore propose a new methodology that can estimate the market sizes of product groups at a more detailed level than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic, bottom-up market size estimation from individual companies' product information. The overall process is as follows. First, product information data are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales of the extracted products are summed to estimate the market size of each product group. As experimental data, product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training.
We optimized the training parameters and then used a vector dimension of 300 and a window size of 15 in further experiments. We employed the index words of the Korean Standard Industrial Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The proposed model automatically estimated the market sizes of 11,654 specific product lines. For performance verification, the results were compared with the actual market sizes of selected items; Pearson's correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be adjusted easily and efficiently to suit the purpose of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application because it can meet the unmet need for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs run by government institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of our study is that the model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by giving a proper order to the preprocessed dataset or by combining Word2Vec with another algorithm such as Jaccard similarity.
Also, the product group clustering method could be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect will further improve the performance of the basic model conceptually proposed in this study.
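
The grouping and summation steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors stand in for trained Word2Vec embeddings (300-dimensional in the study), and the product names, sales figures, and helper names (`product_group`, `market_size`) are all hypothetical.

```python
import math

# Hypothetical toy embeddings; in the study these would come from a Word2Vec
# model trained on product names (vector dimension 300, window size 15).
EMBEDDINGS = {
    "refrigerator": [0.9, 0.1, 0.0],
    "kimchi refrigerator": [0.85, 0.2, 0.05],
    "freezer": [0.8, 0.15, 0.1],
    "washing machine": [0.3, 0.9, 0.1],
    "printer": [0.0, 0.1, 0.95],
}

# Hypothetical company-level sales records: (product name, sales amount).
SALES = [
    ("refrigerator", 120.0),
    ("kimchi refrigerator", 80.0),
    ("freezer", 40.0),
    ("washing machine", 200.0),
    ("printer", 60.0),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def product_group(index_word, threshold):
    """Product names whose similarity to the index word is at least `threshold`."""
    anchor = EMBEDDINGS[index_word]
    return {name for name, vec in EMBEDDINGS.items() if cosine(anchor, vec) >= threshold}


def market_size(index_word, threshold):
    """Bottom-up estimate: sum the sales of every product in the group."""
    group = product_group(index_word, threshold)
    return sum(amount for name, amount in SALES if name in group)
```

With these toy vectors, a threshold of 0.9 groups the three refrigeration products (240.0), while lowering it to 0.4 also admits "washing machine" (440.0). This mirrors the abstract's point that the cosine similarity threshold controls how broad the market category is.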

A Study on Delineation of Groundwater Recharge Rate Using Water-Table Fluctuation and Unsaturate Zone Soil Water Content Model (지하수위 변동 예측 및 비포화대 함수모델을 이용한 지하수 함양율 산정 연구)

  • Cho, Jin-Wook;Park, Eun-Gyu
    • Journal of Soil and Groundwater Environment / v.13 no.1 / pp.67-76 / 2008
  • In this study, a combined model of water-table fluctuation and soil moisture content is proposed for estimating the groundwater recharge rate at a given location. To evaluate the model, groundwater level data from 4 monitoring wells of the National Groundwater Monitoring Network (Pohang Yeonil, Pohang Kibuk, Suncheon Oeseo, Hongcheon Hongcheon) from 1996 to 2005 and precipitation data for the corresponding years are used. With the proposed methodology, the groundwater recharge rates are estimated to range from 0.5 to 61.4% for Hongcheon Hongcheon, from 1.1 to 27.4% for Pohang Yeonil, from 5.1 to 41.4% for Pohang Kibuk, and from 1.1 to 8.3% for Suncheon Oeseo. The magnitude of variation in the estimated recharge rate depends on the soil type observed near the stations. The groundwater fluctuation model used in this study treats precipitation as the sole source of water-table perturbation, which may impose corresponding limitations. To improve the applicability of the proposed method, a capillary-water content constitutive model for unsaturated fractured rock media may be considered. The proposed recharge rate delineation method is physically based and uses a minimal number of assumptions. It may serve as a better substitute for previous tools for delineating the recharge rate of a location with the water-table fluctuation method and contribute to the national groundwater management plan. Further research on the spatial interpolation of the method is in progress.
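
The water-table fluctuation (WTF) part of the method rests on the standard relation recharge = specific yield × water-table rise, with the recharge rate expressed as a percentage of precipitation. The sketch below shows only that textbook relation; it does not reproduce the authors' coupled soil-moisture model. As a labeled simplification, rises are taken as positive day-to-day increments rather than measured against an extrapolated recession curve, and the heads, specific yield, and precipitation are hypothetical.

```python
def head_rises_m(heads_m):
    """Sum of positive step-to-step rises in a water-table record (metres).

    Simplification: full WTF analyses measure each rise against the
    extrapolated antecedent recession curve, which is not modelled here.
    """
    return sum(max(b - a, 0.0) for a, b in zip(heads_m, heads_m[1:]))


def wtf_recharge_mm(specific_yield, heads_m):
    """Recharge depth (mm) = specific yield * total water-table rise."""
    return specific_yield * head_rises_m(heads_m) * 1000.0


def recharge_rate_percent(recharge_mm, precipitation_mm):
    """Recharge expressed as a percentage of precipitation."""
    return 100.0 * recharge_mm / precipitation_mm


# Hypothetical daily water-table elevations (m) and annual precipitation (mm).
heads = [10.00, 10.20, 10.10, 10.35, 10.30]
recharge = wtf_recharge_mm(0.05, heads)         # about 22.5 mm
rate = recharge_rate_percent(recharge, 1200.0)  # about 1.875 %
```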

A Study on the Prediction of Discharge by Estimating Optimum Parameter of Mean Velocity Equation (평균유속공식의 최적매개변수 산정에 의한 유량예측에 관한 연구)

  • Choo, Tai Ho;Chae, Soo Kwon
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.11 / pp.5578-5586 / 2012
  • Accurate estimation of discharge is essential in river design for water use and flood control and in the design of hydraulic structures. At present, discharge is estimated from the stage-discharge relationship (rating curve) of the river. The rating curve predicts discharge by regression analysis of stage and discharge data measured during the flood season. The method is comparatively convenient and has the particular advantage of predicting discharge when observation is difficult during the flood season. However, it leaves room for improvement because it uses only the relationship between stage and discharge and does not reflect hydraulic parameters such as hydraulic radius, energy slope, roughness, and topography. Therefore, in this study, discharge was predicted using a convenient calculation method with empirical parameters of the Manning and Chezy equations, proposed by Choo et al. (2011) in KSCE as a new methodology for estimating discharge in open channels. The proposed method can conveniently estimate the empirical parameters of both the Manning and Chezy equations, from which the discharge in open channels is estimated. The method was verified with data measured in a meandering laboratory channel and an Indian canal, giving coefficients of determination of about 0.8. With continued study, the method could be applied in the field.
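
For reference, the Manning and Chezy equations that the method builds on can be written as a short sketch in SI units, together with the coefficient of determination used for verification. This shows only the standard equations and the identity C = R^(1/6)/n that links their coefficients. The empirical-parameter estimation procedure of Choo et al. (2011) is not reproduced, and the channel geometry and roughness are hypothetical.

```python
import math


def manning_discharge(area_m2, hydraulic_radius_m, slope, n):
    """Q = A * (1/n) * R^(2/3) * S^(1/2), SI units."""
    velocity = hydraulic_radius_m ** (2.0 / 3.0) * math.sqrt(slope) / n
    return area_m2 * velocity


def chezy_discharge(area_m2, hydraulic_radius_m, slope, c):
    """Q = A * C * sqrt(R * S), SI units."""
    return area_m2 * c * math.sqrt(hydraulic_radius_m * slope)


def chezy_c_from_manning(hydraulic_radius_m, n):
    """Equivalent Chezy coefficient, C = R^(1/6) / n (SI units)."""
    return hydraulic_radius_m ** (1.0 / 6.0) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical rectangular channel: 10 m wide, 2 m deep, slope 0.001, n = 0.03.
width, depth = 10.0, 2.0
area = width * depth
radius = area / (width + 2 * depth)  # hydraulic radius = area / wetted perimeter
q_manning = manning_discharge(area, radius, 0.001, 0.03)
q_chezy = chezy_discharge(area, radius, 0.001, chezy_c_from_manning(radius, 0.03))
```

Both calls return the same discharge (roughly 26.7 m³/s here), because substituting C = R^(1/6)/n into the Chezy equation recovers Manning's form.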

Estimation of Genetic Parameters for Four Reproduction Component Traits in Two Chinese Indigenous Pig Breeds

  • Zhu, M.J.;Ding, J.T.;Liu, B.;Yu, M.;Fan, B.;Li, C.C.;Zhao, S.H.
    • Asian-Australasian Journal of Animal Sciences / v.21 no.8 / pp.1109-1115 / 2008
  • Reproduction component traits are important determinants of sow efficiency. The objective of this study was to evaluate the phenotypic and genetic parameters of four reproduction component traits (age at puberty (AP), preweaning number dead (PND), weaning to service interval (WSI), and intra-individual SD in litter size (IISDLS)) of sows in two Chinese indigenous pig breeds. Available reproductive records, including 22,591 piglets born in 2,054 litters from 574 Jiangquhai sows and 464 Meishan sows, were used in this investigation. Mixed models and restricted maximum likelihood methodology were used for multiple-trait analyses of these traits. The estimated heritabilities (${\pm}$ standard error) for AP, PND, WSI and IISDLS were $0.40{\pm}0.05$, $0.06{\pm}0.03$, $0.20{\pm}0.02$ and $0.09{\pm}0.03$ in Jiangquhai sows, and $0.35{\pm}0.06$, $0.05{\pm}0.03$, $0.18{\pm}0.03$ and $0.10{\pm}0.04$ in Meishan sows, respectively. There was a moderate genetic correlation between AP and WSI, while the genetic correlations between the other pairs of traits were low. The genetic correlations were positive for most pairs of traits, except for the one between AP and IISDLS. The results indicate that all traits except AP would be difficult to improve genetically by traditional selection because of their low heritabilities, and that favorable improvement of AP might cause unfavorable changes in IISDLS owing to the genetic antagonism between them.
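
Why low heritability makes selection slow can be illustrated with the breeder's equation, R = h²S. The sketch below plugs in the heritabilities reported above with a hypothetical selection differential of one phenotypic standard deviation. It ignores correlated responses, such as the AP/IISDLS antagonism, and is not part of the study's REML analysis.

```python
# Heritability estimates reported in the abstract.
HERITABILITY = {
    "Jiangquhai": {"AP": 0.40, "PND": 0.06, "WSI": 0.20, "IISDLS": 0.09},
    "Meishan": {"AP": 0.35, "PND": 0.05, "WSI": 0.18, "IISDLS": 0.10},
}


def response_to_selection(h2, selection_differential):
    """Breeder's equation R = h^2 * S (one generation, in the units of S)."""
    return h2 * selection_differential


def responses(breed, selection_differential_sd=1.0):
    """Expected one-generation response, in phenotypic SD units, per trait."""
    return {trait: response_to_selection(h2, selection_differential_sd)
            for trait, h2 in HERITABILITY[breed].items()}
```

For Jiangquhai sows, the same selection pressure moves AP about 6.7 times as far as PND (0.40 vs 0.06 SD per generation).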

Development of Information System based on GIS for Analyzing Basin-Wide Pollutant Washoff (유역오염원 수질거동해석을 위한 GIS기반 정보시스템 개발)

  • Park, Dae-Hee;Ha, Sung-Ryong
    • Journal of the Korean Association of Geographic Information Studies / v.9 no.4 / pp.34-44 / 2006
  • Simulation models allow researchers to model large hydrological catchments for comprehensive management of water resources and explanation of diffuse pollution processes, such as land-use changes under regional development plans. Many recent studies have examined water body quality using Geographic Information Systems (GIS) and dynamic watershed models such as AGNPS, HSPF, and SWAT, which require handling large amounts of data. The aim of this study is to develop a watershed-based water quality estimation system for assessing impacts on stream water quality. KBASIN-HSPF, proposed in this study, simplifies data compilation for HSPF by facilitating the setup and simulation process. It also assists the spatial interpretation of point and non-point pollutant information, the creation of Thiessen rainfall, and the pre- and post-processing of large environmental data. A methodology integrating GIS and the water quality model for preprocessing geomorphologic data was designed by coupling the data model. The KBASIN-HSPF interface comprises four modules: registration and modification of basic environmental information, a watershed delineation generator, a watershed geomorphologic index calculator, and a model input file processor. KBASIN-HSPF was applied to simulate the water quality impact of variations in the pollution discharge structure of subbasins.
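
The Thiessen rainfall step mentioned above amounts to an area-weighted mean of gauge rainfall. A minimal sketch follows, assuming the Thiessen polygon areas inside each subbasin have already been computed in the GIS. The function name and inputs are hypothetical and are not part of KBASIN-HSPF.

```python
def thiessen_mean_rainfall(rainfall_mm, polygon_area_km2):
    """Area-weighted (Thiessen polygon) mean rainfall over a subbasin.

    rainfall_mm[i] is the depth recorded at gauge i; polygon_area_km2[i] is the
    area of gauge i's Thiessen polygon inside the subbasin (assumed precomputed).
    """
    if len(rainfall_mm) != len(polygon_area_km2):
        raise ValueError("one polygon area is needed per gauge")
    total_area = sum(polygon_area_km2)
    if total_area <= 0:
        raise ValueError("total polygon area must be positive")
    weighted = sum(p * a for p, a in zip(rainfall_mm, polygon_area_km2))
    return weighted / total_area


# Hypothetical subbasin covered by three gauges.
mean_rain = thiessen_mean_rainfall([30.0, 45.0, 60.0], [20.0, 50.0, 30.0])
```

Here the subbasin mean is 46.5 mm: (30×20 + 45×50 + 60×30) / 100.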

Estimation of the Spillovers during the Global Financial Crisis (글로벌 금융위기 동안 전이효과에 대한 추정)

  • Lee, Kyung-Hee;Kim, Kyung-Soo
    • Management & Information Systems Review / v.39 no.2 / pp.17-37 / 2020
  • The purpose of this study is to investigate global spillover effects through the existence of linear and nonlinear causal relationships among the US, European and BRIC financial markets over a period covering the introduction of the euro, the financial crisis, and the subsequent EU debt crisis in 2007-2010. Although the global spillover effects of the financial crisis are well described, the nature of the volatility effects and the transmission mechanisms among the US, European and BRIC stock markets have not been systematically examined. A stepwise filtering methodology, including a vector autoregressive (VAR) model and a multivariate GARCH model, was introduced to investigate dynamic linear and nonlinear causality. The sample includes the post-euro period as well as the financial crisis and the Eurozone financial and sovereign debt crisis. The empirical results have several implications for the efficiency of BRIC stock markets. They bear not only on the predictability of these markets but can also help future research quantify the process of financial integration. The interdependence among the United States, Europe and the BRIC countries has significant implications for financial market regulation, hedging, and trading strategies. The findings show that the BRIC markets have become internationally integrated since the subprime and financial crisis erupted in the United States, and that spillover effects have become more specific and pronounced. Furthermore, there is no consistent evidence supporting the decoupling phenomenon. Some nonlinear causality persists even after filtering during the investigation period. Although tail dependence and higher moments may be significant factors in the remaining interdependencies, these can largely be explained by simple volatility spillover effects in nonlinear causality.
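
As a rough illustration of the linear causality check that precedes the nonlinear stage, here is a lag-1 Granger causality F-test written with only the standard library. It is a sketch under simplifying assumptions: one lag, ordinary least squares, and simulated data. The VAR filtering, multivariate GARCH step, and nonlinear causality test used in the study are not shown, and `ols_rss` and `granger_f` are hypothetical helpers.

```python
import random


def ols_rss(y, X):
    """Residual sum of squares of an OLS fit of y on the columns of X.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting, which is adequate for the 2-3 regressors used here.
    """
    k = len(X[0])
    aug = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, k + 1):
                    aug[r][c] -= factor * aug[col][c]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(cause, effect):
    """F statistic for H0: cause[t-1] adds nothing to an AR(1) model of effect[t]."""
    y = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(y, restricted)
    rss_u = ols_rss(y, unrestricted)
    n_obs, n_params, n_restrictions = len(y), 3, 1
    return ((rss_r - rss_u) / n_restrictions) / (rss_u / (n_obs - n_params))


# Hypothetical simulated returns: x drives y with a one-period lag, not vice versa.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0]
for t in range(1, 500):
    y.append(0.2 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0.0, 1.0))

f_x_to_y = granger_f(x, y)  # large: lagged x explains y
f_y_to_x = granger_f(y, x)  # small: lagged y does not explain x
```

Each statistic would be compared with the F(1, n − 3) critical value, about 3.86 at the 5% level for this sample size.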

A study on Property and CO2 Emission Factor of Domestic Transportation Fuel (국내 수송용 연료의 물성 및 CO2 배출계수 산정연구)

  • Kang, Hyungkyu;Doe, Jinwoo;Ha, Jonghan;Na, Byungki
    • Journal of Energy Engineering / v.23 no.3 / pp.72-81 / 2014
  • The Intergovernmental Panel on Climate Change (IPCC) has proposed three methodological tiers (Tier 1/2/3), which differ in accuracy and difficulty, for compiling greenhouse gas emission statistics, as set out in its reports adopted as the international standard. In Korea, the existing inventory has been built with a top-down approach that applies transportation emission factors to total energy consumption, and these emission factors were not investigated under domestic traffic conditions reflecting the continuing increase in vehicles and changes in road sections. Because the IPCC suggests that $CO_2$ emissions can be estimated more accurately from the carbon content of each fuel, monthly or seasonal analysis of the carbon content of transportation fuels can improve the accuracy of national statistics and support the establishment of measures against climate change based on the latest greenhouse gas emission statistics.
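
The carbon-content approach can be sketched as a simple mass balance: each kilogram of fuel carbon that is oxidized yields 44.01/12.011 kg of $CO_2$. This is a minimal illustration, assuming mass-based fuel data and complete oxidation by default. IPCC inventories normally work per unit of energy (kg $CO_2$/TJ), and the period-by-period fuel use and carbon fractions below are hypothetical.

```python
# Ratio of molar masses: 44.01 g/mol (CO2) / 12.011 g/mol (C).
CO2_PER_CARBON = 44.01 / 12.011


def co2_factor_kg_per_kg(carbon_mass_fraction, oxidation_fraction=1.0):
    """kg CO2 emitted per kg of fuel burned, from the fuel's carbon mass fraction."""
    return carbon_mass_fraction * oxidation_fraction * CO2_PER_CARBON


def period_co2_tonnes(fuel_use_tonnes, carbon_fractions):
    """Total CO2 (t) when each period (month or season) has its own carbon content."""
    return sum(fuel * co2_factor_kg_per_kg(frac)
               for fuel, frac in zip(fuel_use_tonnes, carbon_fractions))


# Hypothetical seasonal gasoline use (t) and measured carbon mass fractions.
fuel = [1000.0, 1100.0, 1050.0, 980.0]
carbon = [0.862, 0.858, 0.855, 0.860]
total = period_co2_tonnes(fuel, carbon)
```

Using a measured fraction for each period, rather than one annual default, is what lets the seasonal variation in fuel properties reach the inventory.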

Level Shifts and Long-term Memory in Stock Distribution Markets (주식유통시장의 층위이동과 장기기억과정)

  • Chung, Jin-Taek
    • Journal of Distribution Science / v.14 no.1 / pp.93-102 / 2016
  • Purpose - The purpose of this paper is to study the static and dynamic aspects of the long-term memory properties of the Korea Composite Stock Price Index (KOSPI) and to increase the explanatory power of the long-term memory process by examining these properties. The GPH statistic is used to derive a modified statistic for Korea's stock market and to examine its long-term memory process. Research design, data, and methodology - Level shifts were analyzed empirically by applying the GPH method, modified to take into account the daily log returns of the KOSPI. To analyze whether the stock market behaves as a long-term memory process, the data consist of the daily KOSPI and its log returns. Long-term memory estimators were derived with semiparametric methods. Chapter 2 reviews previous research, and Chapter 3 describes long-term memory processes and estimation methods. Modified GPH statistics are derived and the Whittle statistic is discussed. Chapter 4 uses the KOSPI to estimate the parameters of the long-term memory process. Chapter 6 presents the conclusions and implications. Results - If a price series is generated by a nonstationary process, it may appear to have long-term memory. However, tests using the modified GPH method show that the series does not follow a long-term memory process or a fractionally differenced process. When the time series contains level shifts, existing tests for long-term memory are considerably biased, and there is a structural change in the stock secondary market that takes the form of level shifts.
Level shifts that the tests do not account for remain as a bias in the stock secondary market and appear in the test statistics of processes without long-term memory, producing spurious evidence of long-term memory that can lead to false conclusions. Conclusions - The changes in long-term memory characteristics associated with level shifts lead to two suggestions. First, when an external shock persists for a long time, the long-term memory process gradually reverts to the average return. Investors should therefore take long-term memory characteristics into account when making investment decisions, since these characteristics can increase uncertainty. Second, these characteristics must be considered separately for each time series. Research on price-earnings ratios and investment risk should incorporate long-term memory characteristics, which would improve predictability.
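
The basic GPH (Geweke and Porter-Hudak) estimator regresses the log periodogram at the lowest Fourier frequencies on log(4 sin²(λ/2)); the memory parameter d is minus the slope. The sketch below implements only that unmodified estimator. It assumes the common bandwidth m = n^0.5, uses simulated series, and does not cover the modified GPH or Whittle statistics discussed in the paper.

```python
import cmath
import math
import random


def periodogram(x, j):
    """Periodogram ordinate I(lambda_j) at Fourier frequency lambda_j = 2*pi*j/n."""
    n = len(x)
    lam = 2.0 * math.pi * j / n
    dft = sum(v * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
    return abs(dft) ** 2 / (2.0 * math.pi * n)


def gph_d(x, bandwidth_power=0.5):
    """GPH estimate of d using the first m = n**bandwidth_power frequencies.

    Regresses log I(lambda_j) on log(4 sin^2(lambda_j / 2)); d is minus the slope.
    """
    n = len(x)
    m = int(n ** bandwidth_power)
    regressor, response = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        regressor.append(math.log(4.0 * math.sin(lam / 2.0) ** 2))
        response.append(math.log(periodogram(x, j)))
    mx = sum(regressor) / m
    my = sum(response) / m
    slope = (sum((a - mx) * (b - my) for a, b in zip(regressor, response))
             / sum((a - mx) ** 2 for a in regressor))
    return -slope


# Hypothetical series: white noise (d = 0) and its cumulative sum (random walk, d = 1).
rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk = []
level = 0.0
for e in noise:
    level += e
    walk.append(level)

d_noise = gph_d(noise)
d_walk = gph_d(walk)
```

The white-noise estimate should fall near 0 and the random-walk estimate near 1. The abstract's concern is that unmodelled level shifts can also push such estimates upward and produce spurious long memory; adding a step to `noise` is one way to explore that effect.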

Analytical Approach for the Noise Properties and Geometric Scheme of Industrial CR Images according to Radiation Intensity (산업용 CR영상의 방사선 강도에 따른 잡음특성과 기하학적 구도형성의 해석적 접근)

  • Hwang, Jung-Won;Hwang, Jae-Ho;Park, Sang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.1 / pp.56-62 / 2009
  • In this paper, we investigate an analytical approach to the noise properties and geometric structure of Computed Radiography (CR) images of industrial steel tubes. More than thirty radiographic images were sampled from industrial radiography measurements at different radiation intensities. Each image consists of three regions: background, thickness, and inner tube. Of these, the inner-tube region is selected for analysis. The geometric structure, including noise generation, is analyzed with statistical and functional methods, both spatially and line by line. The analysis examines the geometric deformation of the steel tube's circular configuration and the variation of noise. The estimated fitting function and its error serve as the geometric factors, and statistics such as the standard deviation, mean, and signal-to-noise ratio serve as noise parameters for discrimination. These factors are examined as the radiation intensity, i.e., the penetrating strength of the radiation, varies. The results show that the original circular geometry is preserved in the form of an ellipse or a circle with distinct short and long diameters, and that the noise deviation increases in inverse proportion to the radiation intensity.
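
The line-by-line noise parameters can be sketched with the standard library. This is a minimal illustration, assuming SNR is defined as mean divided by sample standard deviation (the abstract does not give the paper's exact definition) and using a hypothetical two-row excerpt of an inner-tube region.

```python
import statistics


def line_noise_stats(profile):
    """Mean, sample standard deviation and SNR (mean / SD) of one image line."""
    mean = statistics.mean(profile)
    sd = statistics.stdev(profile)
    return {"mean": mean, "sd": sd, "snr": mean / sd}


def image_noise_stats(region):
    """Apply line_noise_stats to every row of a 2-D region (list of pixel rows)."""
    return [line_noise_stats(row) for row in region]


# Hypothetical grey levels from two lines of an inner-tube region.
region = [
    [100, 102, 98, 101, 99],
    [80, 86, 74, 83, 77],
]
stats_per_line = image_noise_stats(region)
```

The noisier second line has the lower SNR (about 16.9 versus 63.2). Statistics like these, tracked across exposures, are what the paper relates to radiation intensity.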

Estimation on Greenhouse Gases(GHGs) Emission of Large Forest Fire Area in 2013 (RapidEye 영상을 활용한 대형산불피해지의 온실가스 배출량 추정)

  • Won, Myoung-Soo;Kim, You-Seung;Kim, Kyong-Ha
    • Journal of the Korean Association of Geographic Information Studies / v.17 no.3 / pp.54-67 / 2014
  • This study was performed to estimate greenhouse gas (GHG) emissions from biomass burning in the large forest fires of 2013 (Ulju, Pohang and Bonghwa). The methodology for estimating GHGs extended the equation of the IPCC (Intergovernmental Panel on Climate Change) Guidelines (2006). To classify the fire-damaged areas and analyze burn severity in the three large-fire areas, this study applied Maximum Likelihood Classification (MLC) to post-fire RapidEye imagery. The accuracy assessment of burn severity from the imagery showed an average overall accuracy of 75.93% and a Kappa coefficient of 0.67. Finally, GHG emissions from biomass burning in the three large-fire areas in 2013 were estimated as follows: Ulju $CO_2$ 63,260, CO 5.207, $CH_4$ 360, $N_2O$ 28.0 and $NO_x$ $4.4g/kg^{-1}{\cdot}ha^{-1}$; Pohang $CO_2$ 28,675, CO 2.359, $CH_4$ 163, $N_2O$ 12.7 and $NO_x$ $1.9g/kg^{-1}{\cdot}ha^{-1}$; and Bonghwa $CO_2$ 53,086, CO 1,655, $CH_4$ 114, $N_2O$ 23.5 and $NO_x$ $3.6g/kg^{-1}{\cdot}ha^{-1}$.
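
The 2006 IPCC Guidelines express fire emissions of each gas as L_fire = A · M_B · C_f · G_ef · 10⁻³ tonnes (Volume 4, Chapter 2, Equation 2.27). The abstract says the study extended this equation; the extension itself is not shown here. The sketch pairs the base equation with Cohen's kappa, the agreement statistic reported for the burn-severity classification. All inputs are hypothetical and are not the study's data.

```python
def fire_emissions_t(area_ha, fuel_t_per_ha, combustion_factor, ef_g_per_kg):
    """Emission of one gas in tonnes: L_fire = A * M_B * C_f * G_ef * 1e-3.

    area_ha: burned area A (ha); fuel_t_per_ha: fuel available M_B (t dry matter/ha);
    combustion_factor: C_f (fraction burned); ef_g_per_kg: G_ef (g gas per kg dry matter).
    """
    return area_ha * fuel_t_per_ha * combustion_factor * ef_g_per_kg * 1e-3


def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, cols: classified)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    expected = sum(sum(confusion[i]) * sum(row[i] for row in confusion)
                   for i in range(k)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical inputs: 100 ha burned, 150 t/ha fuel, 45% combusted, EF 1,569 g/kg.
co2_t = fire_emissions_t(100.0, 150.0, 0.45, 1569.0)
kappa = cohen_kappa([[40, 10], [10, 40]])
```

With these inputs the equation gives about 10,591 t of the gas. The 2×2 matrix has 80% observed agreement against 50% expected by chance, which yields a kappa of 0.6.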