• Title/Summary/Keyword: Classification Variables

Factors Affecting International Transfer Pricing of Multinational Enterprises in Korea (외국인투자기업의 국제이전가격 결정에 영향을 미치는 환경 및 기업요인)

  • Jun, Tae-Young;Byun, Yong-Hwan
    • Korean small business review / v.31 no.2 / pp.85-102 / 2009
  • With the continued globalization of world markets, transfer pricing has become one of the dominant sources of controversy in international taxation. Transfer pricing is the process by which a multinational corporation calculates a price for goods and services that are transferred to affiliated entities. Consider a Korean electronics enterprise that buys supplies from its own subsidiary located in China. How much the Korean parent company pays its subsidiary determines how much profit the Chinese unit reports for local tax purposes. If the parent company pays above normal market prices, it may appear to earn a poor profit, even if the group as a whole shows a respectable profit margin. In this way, transfer prices affect the taxable income reported in each country in which the multinational enterprise operates. Its importance lies in the fact that, according to the OECD, around 60% of international trade involves transactions between two related parts of multinationals. Multinational enterprises (hereafter MEs) put much effort into using their organizational advantages to make global investments. MEs wish to minimize their tax burden, so they spend a fortune on economists and accountants to justify transfer prices that suit their tax needs. In contrast, local governments are not prepared to cope with MEs' powerful financial instruments. Tax authorities in each country wish to ensure that the tax base of any ME is divided fairly. Thus, both tax authorities and MEs have a vested interest in the way a transfer price is determined, and this is why MEs' international transfer prices are at the center of taxation disputes. Transfer pricing issues and practices are sometimes difficult for regulators to control because the tax administration does not have enough staff with the knowledge and resources necessary to understand them. 
The authors examine transfer pricing practices to provide policy makers with resources useful in designing tax incentives and regulation schemes. This study focuses on identifying the business and environmental factors that could influence the international transfer pricing of MEs. From this perspective, we empirically investigate how management's perception of the related variables influences the choice of international transfer pricing method. We believe that this research is particularly useful in the design of tax policy, because it allows a tax administration with a limited budget to concentrate on a few selected factors. The data consist of questionnaire responses from foreign firms in Korea with investment balances exceeding one million dollars at the end of 2004. We mailed questionnaires to 861 managers in charge of the accounting departments of each company, resulting in 121 valid responses. Seventy-six percent of the sample firms are classified as small and medium sized enterprises with assets below 100 billion Korean won. Among transfer pricing methods, cost-based transfer pricing is the most popular, adopted by 60 firms. The market-based method is used by 31 firms, and 13 firms report using the resale-pricing method. Regarding the nationalities of foreign investors, Japanese and American investors constitute most of the sample. Logistic regressions were performed for statistical analysis. The dependent variable is binary, indicating whether the international transfer pricing method is market-based or cost-based. This binary classification is founded on the belief that the market-based method is a relatively objective way of pricing compared with cost-based methods. Cost-based pricing is assumed to give managers flexibility in transfer pricing decisions. 
Therefore, local regulatory agencies are thought to prefer market-based pricing over cost-based pricing. The independent variables are eight factors: corporate tax rate, tariffs, relations with local tax authorities, tax audit, equity ratios of local investors, volume of internal trade, sales volume, and product life cycle. The first four variables are included in the model because taxation lies at the center of transfer pricing disputes, so identifying their impact in the Korean business environment is much needed. Equity ratio is included to represent the interest of local partners. Volume of internal trade was sometimes employed in previous research to check the pricing behavior of managers, and we follow that practice in this paper. Product life cycle is used as a surrogate for competition in local markets. The control variables are firm size and nationality of foreign investors. Firm size is controlled with a dummy variable indicating whether the firm is small or medium sized, because some researchers report that big firms behave differently from small and medium sized firms in transfer pricing. The other control variable is a dummy variable indicating whether the foreign investor is American, because some prior studies conclude that the American management style differs in that it limits branch managers' freedom of decision. Reviewing the statistical results, we find that managers prefer the cost-based method over the market-based method as the importance of corporate taxes and tariffs increases. This result means that managers need flexibility to lessen the tax burden when they feel taxes are important. They also prefer the cost-based method as the product life cycle matures, which means that they support subsidiaries in local market competition using cost-based transfer pricing. 
In contrast, as the relationship with local tax authorities becomes more important, managers prefer the market-based method, because market-based pricing is a better way to maintain good relations with tax officials. Other variables such as tax audit, volume of internal transactions, sales volume, and local equity ratio show only insignificant influence. Additionally, we replaced the two tax variables (corporate taxes and tariffs) with the top marginal tax rate and mean tariff rate of each country and performed another regression to see whether the results differ from the first one. In this regression, the mean tariff rate shows only an insignificant influence on the dependent variable. We conjecture that each company in the sample pays tariffs at a company-specific rate, which could be far from the mean tariff rate. We therefore conclude that more detailed data on the tariffs of each company are needed to check the role of this variable. Considering that the present paper relies heavily on questionnaires, an effort to build a reliable database is needed to enhance the reliability of the research.
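
The binary logistic regression described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives neither the estimation software nor the data, so the two predictors, the 1-5 importance scale, the eight response rows, and the helper names `fit_logistic` and `predict_proba` are all hypothetical. The outcome is coded 1 for the market-based method and 0 for the cost-based method.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """P(method = market-based | x); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit the coefficients by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            err = predict_proba(w, x) - target
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Made-up responses on a 1-5 importance scale (NOT the study's data):
# [importance of corporate tax, importance of relations with tax authorities]
X = [[5, 1], [4, 2], [5, 2], [4, 1], [2, 4], [1, 5], [2, 5], [1, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = market-based, 0 = cost-based

w = fit_logistic(X, y)
print("intercept, tax, relations:", [round(c, 2) for c in w])
print("P(market-based | tax=5, relations=1):", round(predict_proba(w, [5, 1]), 3))
```

On this toy data the tax coefficient comes out negative and the relations coefficient positive, which mirrors the direction of the reported findings only because the rows were constructed that way.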

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has recently attracted attention. The deep learning technique applied in the competitions of the International Conference on Image Recognition Technology (ILSVR) and in AlphaGo is the Convolution Neural Network (CNN). A CNN divides the input image into small sections, recognizes their partial features, and combines them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still at an early research stage. If their performance is proven, they can be applied to traditional business problems such as future marketing response prediction, fraudulent transaction detection, and bankruptcy prediction. It is therefore a meaningful experiment to assess whether deep learning technologies can solve business problems, based on the case of online shopping companies, which have big data, make customer behavior relatively easy to identify, and offer high utilization value. In particular, the competitive environment of online shopping companies is changing rapidly and becoming more intense. Therefore, analysis of customer behavior for maximizing profit is becoming more and more important for online shopping companies. In this study, we propose the 'CNN model of Heterogeneous Information Integration' as a way to improve the predictive power of customer behavior in online shopping enterprises. 
The proposed model combines structured and unstructured information and learns through a convolution neural network with a multi-layer perceptron structure. To optimize its performance, we design three architectures, 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', evaluate the performance of each, and settle on the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, and high discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC data from a particular online shopping company in Korea. The data cover 47,947 customers who registered at least one VOC in January 2011 (one month). The customer profiles of these customers, a total of 19 months of trading data from September 2010 to March 2012, and the VOCs posted during that month are used. The experiment is divided into two stages. In the first stage, we evaluate the three architectures that affect the performance of the proposed model and select optimal parameters. In the second stage, we evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines both structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). It is therefore significant that the use of unstructured information contributes to predicting customer behavior, and that CNN can be applied to business problems as well as to image recognition and natural language processing problems. 
The experiments confirm that CNN is more effective in understanding and interpreting the meaning of context in text VOC data. It is also significant that this empirical research, based on the actual data of an e-commerce company, shows that very meaningful information for predicting customer behavior can be extracted from VOC data written in text format directly by customers. Finally, the various experiments suggest that the proposed model provides useful information for future research on parameter selection and performance.

PCA-based Waveform Classification of Rabbit Retinal Ganglion Cell Activity (주성분분석을 이용한 토끼 망막 신경절세포의 활동전위 파형 분류)

  • 진계환;조현숙;이태수;구용숙
    • Progress in Medical Physics / v.14 no.4 / pp.211-217 / 2003
  • Principal component analysis (PCA) is a well-known data analysis method that is useful in linear feature extraction and data compression. PCA is a linear transformation that applies an orthogonal rotation to the original data so as to maximize the retained variance. PCA is a classical technique for obtaining an optimal overall mapping of linearly dependent patterns of correlation between variables (e.g. neurons). PCA provides, in the mean-squared error sense, an optimal linear mapping of the signals that are spread across a group of variables. These signals are concentrated into the first few components, while the noise, i.e. variance that is uncorrelated across variables, is sequestered in the remaining components. PCA has been used extensively to resolve temporal patterns in neurophysiological recordings. Because the retinal signal is a stochastic process, PCA can be used to identify the retinal spikes. The retina was isolated from an excised rabbit eye. A piece of retina was attached with the ganglion cell side to the surface of the microelectrode array (MEA). The MEA consisted of a glass plate with 60 substrate-integrated and insulated gold connection lanes terminating in an $8{\times}8$ array (spacing 200 $\mu$m, electrode diameter 30 $\mu$m) in the center of the plate. The MEA 60 system was used to record retinal ganglion cell activity. The action potentials of each channel were sorted by an off-line analysis tool. Spikes were detected with a threshold criterion and sorted according to their principal component composition. The first (PC1) and second (PC2) principal component values were calculated using all the waveforms of each channel and all n time points in the waveform, and several clusters could be separated clearly in two dimensions. We verified that PCA-based waveform detection was effective as an initial approach to spike sorting.
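
The projection of spike waveforms onto PC1 and PC2 described in this abstract can be sketched in plain Python. The sketch rests on assumptions: the six five-sample waveforms are invented (not recorded data), the helper names are hypothetical, and power iteration with deflation is used only to avoid external libraries. The abstract does not say how the off-line analysis tool computes the components.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract the mean waveform (per time point) from every waveform."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix across time points."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(mat, iters=300):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [float(i + 1) for i in range(len(mat))]  # arbitrary non-zero start vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [dot(row, v) for row in mat]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in mat]), v

def pc_scores(waveforms):
    """Return the (PC1, PC2) score pair of each waveform."""
    centered = center(waveforms)
    cov = covariance(centered)
    lam1, v1 = leading_eigenpair(cov)
    # Deflation removes PC1 so that the next power iteration converges to PC2.
    size = len(cov)
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(size)]
                for i in range(size)]
    _, v2 = leading_eigenpair(deflated)
    return [(dot(r, v1), dot(r, v2)) for r in centered]

# Invented waveforms: the first three mimic one unit, the last three another.
waveforms = [
    [0.0, -4.0, 2.0, 1.0, 0.0], [0.1, -4.2, 2.1, 0.9, 0.0], [-0.1, -3.9, 1.8, 1.1, 0.1],
    [0.0, -1.0, 3.0, 0.5, 0.0], [0.1, -1.1, 3.2, 0.4, -0.1], [-0.1, -0.8, 2.9, 0.6, 0.0],
]
scores = pc_scores(waveforms)
for pc1, pc2 in scores:
    print(round(pc1, 3), round(pc2, 3))
```

In this toy example the two invented units fall on opposite sides of zero along PC1, which is the kind of cluster separation in the PC1-PC2 plane that the abstract reports; real recordings would additionally need the threshold-based spike detection step.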

THEORETICAL STUDY ON OBSERVED COLOR-MAGNITUDE DIAGRAMS

  • Lee, See-Woo
    • Journal of The Korean Astronomical Society / v.12 no.1 / pp.41-70 / 1979
  • From Böhm-Vitense's atmospheric model calculations, the relations [$T_e$, $(B-V)$] and [B.C., $(B-V)$] with respect to heavy element abundance were obtained. Using these relations and the evolutionary model calculations of Rood, and of Sweigart and Gross, analytic expressions for some physical parameters relating to the C-M diagrams of globular clusters were derived, and they were applied to 21 globular clusters with observed transition periods of RR Lyrae variables. More than 20 different parameters were examined for each globular cluster. The derived ranges of some basic parameters are as follows: $Y=0.21{\sim}0.33$, $Z=1.5{\times}10^{-4}{\sim}4.5{\times}10^{-3}$, age $t=(9.5{\sim}19){\times}10^9$ years, mass for red giants $m_{RG}=0.74m_{\odot}{\sim}0.91m_{\odot}$, mass for RR Lyrae stars $m_{RR}=0.59m_{\odot}{\sim}0.75m_{\odot}$, the visual magnitude difference between the turnoff point and the horizontal branch (HB) ${\Delta}V_{to}=3.1{\sim}3.4$ $(<{\Delta}V_{to}>=3.32)$, the color of the blue edge of the RR Lyrae gap $(B-V)_{BE}=0.17{\sim}0.21$ $(<(B-V)_{BE}>=0.18)$, $[\frac{m}{L}]_{RR}=-1.7{\sim}-1.9$, and the mass difference of $m_{RR}$ relative to $m_{RG}$, $(m_{RG}-m_{RR})/m_{RG}=0.0{\sim}0.39$. It was found that the ranges of derived parameters agree reasonably well with the observed ones and those estimated by others. Some important results obtained herein can be summarized as follows: (i) There are considerable variations in the initial helium abundance and in the age of globular clusters. (ii) A radial gradient of heavy element abundance does exist for globular clusters, as shown by Janes for field stars and open clusters. (iii) The helium abundance seems to have increased with age through massive star evolution after a considerable amount (Y>0.2) of helium had been attained by Big-Bang nucleosynthesis, but no radial gradient of helium abundance is seen. 
(iv) A considerable amount of heavy elements ($Z{\sim}10^{-3}$) might have been formed in the inner halo ($r_{GC}$<10 kpc) from the earliest galactic collapse, and then the heavy element abundance has been slowly enriched towards the galactic center and disk, establishing the radial gradient of heavy element abundance. (v) The final galactic disk formation might have taken longer than the halo formation by about half of the galactic age, supporting the slow, inhomogeneous collapse model of Larson. (vi) Of the three principal parameters controlling the morphology of C-M diagrams, it was found that the first parameter is heavy element abundance, the second age, and the third helium abundance. (vii) The globular clusters can be divided into three different groups, AI, BI and CII, according to Z, Y and age as well as Dickens' HB types. BI group clusters of HB types 4 and 5, like M 3 and NGC 7006, are the oldest and have the lowest helium abundance of the three groups. They also appear in the inner halo. On the other hand, the youngest AI clusters have the highest Z and Y, and appear in the innermost halo region and in the disk. (viii) From the clean separation of the clusters into three groups, a three-dimensional classification with the three parameters Z, Y and age is presented. (ix) The anomalous C-M diagrams can be explained in terms of the three principal parameters. That is, the anomaly of NGC 362 and NGC 7006 is accounted for by a smaller age, of the order of $1{\sim}2{\times}10^9$ years, rather than by a helium abundance difference, compared with M 3. (x) The difference between the two Oosterhoff types I and II can be explained in terms of the mean mass difference of RR Lyrae variables rather than in terms of the helium abundance difference suggested by Stobie. The mean mass of the variables in Oosterhoff type I clusters is smaller by $0.074m_{\odot}$, which is exactly consistent with Rood's estimate. 
Since it was found that the mean mass of RR Lyrae stars increases with decreasing Z, the two Oosterhoff types can be explained substantially by the metal abundance difference; the type II has Z<$3.4{\times}10^{-4}$, and the type I has higher Z than the type II.

Estimation of Annual Trends and Environmental Effects on the Racing Records of Jeju Horses (제주마 주파기록에 대한 연도별 추세 및 환경효과 분석)

  • Lee, Jongan;Lee, Soo Hyun;Lee, Jae-Gu;Kim, Nam-Young;Choi, Jae-Young;Shin, Sang-Min;Choi, Jung-Woo;Cho, In-Cheol;Yang, Byoung-Chul
    • Journal of Life Science / v.31 no.9 / pp.840-848 / 2021
  • This study was conducted to estimate annual trends and the environmental effects in the racing records of Jeju horses. The Korean Racing Authority (KRA) collected 48,645 observations for 2,167 Jeju horses from 2002 to 2019. Racing records were preprocessed to eliminate errors that occur during the data collection. Racing times were adjusted for comparison between race distances. A stepwise Akaike information criterion (AIC) variable selection method was applied to select appropriate environment variables affecting racing records. The annual improvement of the race time was -0.242 seconds. The model with the lowest AIC value was established when variables were selected in the following order: year, budam classification, jockey ranking, trainer ranking, track condition, weather, age, and gender. The most suitable model was constructed when the jockey ranking and age variables were considered as random effects. Our findings have potential for application as basic data when building models for evaluating genetic abilities of Jeju horses.

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely used in stock market predictions. In particular, a case-based reasoning method known as k-nearest neighbor is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case. As a result, it may have to take more cases into account even when fewer cases are applicable to the subject. Second, case-based reasoning may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and the predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability through k-nearest neighbor and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close. 
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data are from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the first experiment, when the learning data was small. However, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN in the second experiment, when the learning data was large. These results show that prediction power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that k-nearest neighbor find nearest neighbors using a second-step filtering method that considers fundamental economic variables as well as a sufficient amount of learning data.
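
The comparison described in this abstract, k-NN on the four daily variables against a random walk, scored by MAPE, can be sketched as follows. Everything concrete here is an assumption for illustration: the eight daily bars are invented (not Samsung Electronics prices), Euclidean distance and a plain average of the neighbors' next-day closes are one common k-NN regression variant rather than the paper's confirmed procedure, and the function names are hypothetical.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def knn_forecast(train_X, train_y, query, k):
    """Average the next-day closes of the k training days closest to the query day."""
    def distance(x):
        return math.sqrt(sum((xi - qi) ** 2 for xi, qi in zip(x, query)))
    nearest = sorted(range(len(train_X)), key=lambda i: distance(train_X[i]))[:k]
    return sum(train_y[i] for i in nearest) / k

def random_walk_forecast(query):
    """Random walk: tomorrow's close equals today's close (the 4th variable)."""
    return query[3]

# Invented daily bars (open, high, low, close); NOT real market data.
bars = [
    (100, 103, 99, 102), (102, 104, 100, 101), (101, 105, 100, 104),
    (104, 106, 102, 103), (103, 104, 100, 101), (101, 103, 99, 102),
    (102, 105, 101, 104), (104, 107, 103, 106),
]
split = 5
train_X = bars[:split]                        # learning days
train_y = [b[3] for b in bars[1:split + 1]]   # their next-day closes
test_X = bars[split:-1]                       # test days with a known next day
actual = [b[3] for b in bars[split + 1:]]

knn_pred = [knn_forecast(train_X, train_y, q, k=3) for q in test_X]
rw_pred = [random_walk_forecast(q) for q in test_X]
print("k-NN MAPE:", round(mape(actual, knn_pred), 4))
print("random walk MAPE:", round(mape(actual, rw_pred), 4))
```

Changing `split` changes how much learning data k-NN sees while the random walk forecast is unaffected, which is the lever the study varies between its two experiments.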

Comparison of Inpatient Medical Use between Non-specialty and Specialty Hospitals: A Study Focused on Knee Replacement Arthroplasty (전문병원과 비전문병원 입원환자의 의료이용 비교 분석: 인공관절치환술(슬관절)을 대상으로)

  • Mi-Sung Kim;Hyoung-Sun Jeong;Ki-Bong Yoo;Je-Gu Kang;Han-Sol Jang;Kwang-Soo Lee
    • Health Policy and Management / v.34 no.1 / pp.78-86 / 2024
  • Background: The purpose of this study was to determine the effectiveness of the specialty hospital system by comparing the medical use of inpatients who had artificial joint replacement surgery in specialty hospitals and non-specialty hospitals. Methods: This study utilized 2021-2022 healthcare benefit claims data provided by the Health Insurance Review and Assessment Service. The dependent variable was inpatient medical use, measured in terms of charges per case and length of stay. The independent variable was whether the hospital was designated as a specialty hospital, and the control variables were patient-level variables (age, gender, insurer type, surgery type, and Charlson comorbidity index) and medical institution-level variables (establishment type, classification, location, number of orthopedic surgeons, and number of nurses). Results: Multiple regression analysis showed a statistically significant negative relationship between charges per case and designation as a specialty hospital. This suggests that charges per case are significantly lower at specialty hospitals than at non-specialty hospitals, indicating a difference in inpatient medical use outcomes between the two. Conclusion: The practical implications of this study are as follows. First, the criteria for designating specialty hospitals should be relaxed. Our results show that specialty hospitals have significantly lower per-case costs than non-specialty hospitals. Despite this cost-effectiveness, the high barriers to designation have concentrated specialty hospitals in metropolitan areas and major cities. 
To address the regional imbalance of specialty hospitals, easing the designation criteria in non-metropolitan areas, for example by introducing "semi-specialty hospitals (tentative name)," is expected to reduce both health disparities between regions and medical costs. Second, it is necessary to determine the appropriate size of hospitals' medical staff. The study found that charges per case varied with the number of orthopedic surgeons and nurses. Therefore, appropriately allocating hospital medical staff is expected to maximize the cost-effectiveness of medical services and ultimately reduce medical costs.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important source for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are very expensive, take a long time, and are likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere in a webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords, frequency and intensity by time, day and month, variety of websites visited, text information of web pages visited, etc. The demographic attributes to be predicted also vary from paper to paper, and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, have been used to build prediction models. However, this research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model. 
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction of the clickstream variables to address the curse of dimensionality and the overfitting problem. We use three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data representing 5 demographics and 16,962,705 online activities of 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross validation was conducted to enhance the reliability of the experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best with decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when SVM is applied without dimension reduction. 
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can therefore be applied to digital marketing.

A Study on Web-based Technology Valuation System (웹기반 지능형 기술가치평가 시스템에 관한 연구)

  • Sung, Tae-Eung;Jun, Seung-Pyo;Kim, Sang-Gook;Park, Hyun-Woo
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.23-46 / 2017
  • Although cases of valuing specific companies or projects have, since the early 2000s, centered on developed countries in North America and Europe, systems and methodologies for estimating the economic value of individual technologies or patents have steadily become more active. Several online systems qualitatively evaluate a technology's grade or patent rating, such as 'KTRS' of KIBO and 'SMART 3.1' of the Korea Invention Promotion Association. However, a web-based technology valuation system called the 'STAR-Value system', which calculates quantitative values of the subject technology for purposes such as business feasibility analysis, investment attraction, and tax or litigation, has been officially opened and has recently been spreading. In this study, we introduce the types of methodology and evaluation models, the reference information supporting them, and how the associated databases are utilized, focusing on the various modules and frameworks embedded in the STAR-Value system. In particular, there are six valuation methods, including the discounted cash flow (DCF) method, a representative income-approach method that values anticipated future economic income at the present, and the relief-from-royalty method, which calculates the present value of royalties, treating the contribution of the subject technology to the business value created as the royalty rate. We look at how the models and related support information (technology life, corporate (business) financial information, discount rate, industrial technology factors, etc.) can be used and linked in an intelligent manner. 
Based on classification information for the technology to be evaluated, such as the International Patent Classification (IPC) or the Korea Standard Industry Classification (KSIC), the STAR-Value system automatically returns metadata such as technology cycle time (TCT), the sales growth rate and profitability data of similar companies or industry sectors, the weighted average cost of capital (WACC), and indices of industrial technology factors, and applies adjustment factors to them so that the calculated technology value has high reliability and objectivity. Furthermore, if the potential market size of the target technology and the market share of the commercializing entity are derived from data, or if the estimated value range of similar technologies by industry sector is provided from completed evaluation cases accumulated in the database, the STAR-Value system is expected to present a highly accurate value range in real time by intelligently linking its support modules. Along with the valuation models and primary variables explained in this paper, the STAR-Value system is intended to be used more systematically and in a data-driven way through an optimal model selection guideline module, an intelligent technology value range reasoning module, a market share prediction module based on similar company selection, and so on. In addition, research on the development and intelligence of the web-based STAR-Value system is significant in that it helps spread a web-based system that can be used to validate the theoretical feasibility of technology valuation and apply it in practice, and the system is expected to be utilized in various fields of technology commercialization.
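
The two income-approach methods named in the abstract can be illustrated with a minimal sketch. It assumes year-end cash flows, a single constant discount rate (for example a WACC), and a forecast horizon equal to the technology's economic life. The function names, the optional `tax_rate` parameter, and all figures are hypothetical and are not taken from the STAR-Value system.

```python
def present_value(cash_flows, discount_rate):
    """Discount year-end cash flows (year 1, 2, ...) back to the present."""
    return sum(cf / (1.0 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def relief_from_royalty(sales, royalty_rate, discount_rate, tax_rate=0.0):
    """Present value of the royalties the owner is 'relieved' from paying.

    sales         -- projected yearly sales over the technology's life
    royalty_rate  -- assumed share of sales attributable to the technology
    tax_rate      -- optional tax adjustment (an assumption of this sketch)
    """
    royalties = [s * royalty_rate * (1.0 - tax_rate) for s in sales]
    return present_value(royalties, discount_rate)
```

For instance, `present_value([110.0], 0.10)` is about 100: a cash flow of 110 received one year from now is worth roughly 100 today at a 10% discount rate.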

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes applying CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN is strong at interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in an image of $40 \times 40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices that express the color values on the R (red), G (green), and B (blue) scales. In step 4, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer. In the pooling layer, a $2 \times 2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation function for the convolution layer and the hidden layers was set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset were twelve technical indicators that have been widely used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
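
The preprocessing steps described in the abstract (5-day intervals, $40 \times 40$ graph images stored as R, G, and B matrices, and an 80/20 split) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say whether the 5-day intervals overlap, so overlapping windows are assumed here. The per-window min-max scaling, the linear interpolation between days, and every function name are also hypothetical. CNN training itself (step 5) is omitted.

```python
import random

WINDOW, SIZE = 5, 40  # 5-day intervals and 40x40 images, as in the abstract


def sliding_windows(rows, window=WINDOW):
    """Step 1 (overlap is an assumption): consecutive 5-day slices."""
    return [rows[i:i + window] for i in range(len(rows) - window + 1)]


def draw_graph(window_rows, colors):
    """Steps 2-3: draw one line per indicator into R, G, B matrices.

    window_rows -- list of days, each a list of indicator values
    colors      -- one (r, g, b) tuple per indicator
    Returns image[channel][y][x] with values in 0..255.
    """
    image = [[[0] * SIZE for _ in range(SIZE)] for _ in range(3)]
    for v, color in enumerate(colors):
        values = [day[v] for day in window_rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for flat lines
        for x in range(SIZE):
            # Linearly interpolate between the daily points.
            pos = x * (len(values) - 1) / (SIZE - 1)
            i = min(int(pos), len(values) - 2)
            val = values[i] + (pos - i) * (values[i + 1] - values[i])
            # Row 0 is the top of the image, so larger values sit higher.
            y = SIZE - 1 - round((val - lo) / span * (SIZE - 1))
            y = min(SIZE - 1, max(0, y))
            for c in range(3):
                image[c][y][x] = color[c]
    return image


def split_80_20(samples, seed=0):
    """Step 4: shuffle, then use 80% for training and 20% for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]
```

With 1,950 samples, `split_80_20` yields 1,560 training and 390 validation items, matching the counts reported in the abstract.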