• Title/Summary/Keyword: multiple-decision method

Search Result 459, Processing Time 0.032 seconds

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring financial risk of companies and for determining the investment returns of investors. As a result, it has been a popular research topic for researchers to predict companies' credit ratings by applying statistical and machine learning techniques. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have been traditionally used in bond rating. However, one major drawback is that it should be based on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variablesand the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). Especially, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematical, and leads to high performance in practical applications. SVM implements the structuralrisk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require too many data sample for training since it builds prediction models by only using some representative sample near the boundaries called support vectors. A number of experimental researches have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. First, SVM is originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification such as One-Against-One, One-Against-All have been proposed, but they do not improve the performance in multi-class classification problem as much as SVM for binary-class classification. Second, approximation algorithms (e.g. decomposition methods, sequential minimal optimization algorithm) could be used for effective multi-class computation to reduce computation time, but it could deteriorate classification performance. Third, the difficulty in multi-class prediction problems is in data imbalance problem that can occur when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed boundary and thus the reduction in the classification accuracy of such a classifier. SVM ensemble learning is one of machine learning methods to cope with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve multiclass prediction problem. Since MGM-Boost introduces the notion of geometric mean into AdaBoost, it can perform learning process considering the geometric mean-based accuracy and errors of multiclass. This study applies MGM-Boost to the real-world bond rating case for Korean companies to examine the feasibility of MGM-Boost. 10-fold cross validations for threetimes with different random seeds are performed in order to ensure that the comparison among three different classifiers does not happen by chance. For each of 10-fold cross validation, the entire data set is first partitioned into tenequal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained the results for classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows the higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%)in terms of geometric mean-based prediction accuracy. T-test is used to examine whether the performance of each classifiers for 30 folds is significantly different. The results indicate that performance of MGM-Boost is significantly different from AdaBoost and SVM classifiers at 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-classproblems such as bond rating.

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.33-56
    • /
    • 2016
  • Many researchers have focused on developing bankruptcy prediction models using modeling techniques, such as statistical methods including multiple discriminant analysis (MDA) and logit analysis or artificial intelligence techniques containing artificial neural networks (ANN), decision trees, and support vector machines (SVM), to secure enhanced performance. Most of the bankruptcy prediction models in academic studies have used financial ratios as main input variables. The bankruptcy of firms is associated with firm's financial states and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed despite the fact that exploiting only financial ratios has some drawbacks. Accounting information, such as financial ratios, is based on past data, and it is usually determined one year before bankruptcy. Thus, a time lag exists between the point of closing financial statements and the point of credit evaluation. In addition, financial ratios do not contain environmental factors, such as external economic situations. Therefore, using only financial ratios may be insufficient in constructing a bankruptcy prediction model, because they essentially reflect past corporate internal accounting information while neglecting recent information. Thus, qualitative information must be added to the conventional bankruptcy prediction model to supplement accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various information sources, previous studies have only used qualitative information. However, recently, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry, with an increasing amount of unstructured text data available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling. Nevertheless, the use of qualitative information on the web for business prediction modeling is still deemed to be in the primary stage, restricted to limited applications, such as stock prediction and movie revenue prediction applications. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented in unstructured text form due to the complexity of managing and processing unstructured text data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well information types are transformed from qualitative into quantitative information that is suitable for incorporating into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. The sentiment index is provided at the industry level by extracting from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between the occurring term and the actual situation with respect to the economic condition of the industry rather than the inherent semantics of the term. The experimental results proved that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing the predictive performance. The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction because the corporate bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field, in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.

A New Exploratory Research on Franchisor's Provision of Exclusive Territories (가맹본부의 배타적 영업지역보호에 대한 탐색적 연구)

  • Lim, Young-Kyun;Lee, Su-Dong;Kim, Ju-Young
    • Journal of Distribution Research
    • /
    • v.17 no.1
    • /
    • pp.37-63
    • /
    • 2012
  • In franchise business, exclusive sales territory (sometimes EST in table) protection is a very important issue from an economic, social and political point of view. It affects the growth and survival of both franchisor and franchisee and often raises issues of social and political conflicts. When franchisee is not familiar with related laws and regulations, franchisor has high chance to utilize it. Exclusive sales territory protection by the manufacturer and distributors (wholesalers or retailers) means sales area restriction by which only certain distributors have right to sell products or services. The distributor, who has been granted exclusive sales territories, can protect its own territory, whereas he may be prohibited from entering in other regions. Even though exclusive sales territory is a quite critical problem in franchise business, there is not much rigorous research about the reason, results, evaluation, and future direction based on empirical data. This paper tries to address this problem not only from logical and nomological validity, but from empirical validation. While we purse an empirical analysis, we take into account the difficulties of real data collection and statistical analysis techniques. We use a set of disclosure document data collected by Korea Fair Trade Commission, instead of conventional survey method which is usually criticized for its measurement error. Existing theories about exclusive sales territory can be summarized into two groups as shown in the table below. The first one is about the effectiveness of exclusive sales territory from both franchisor and franchisee point of view. In fact, output of exclusive sales territory can be positive for franchisors but negative for franchisees. Also, it can be positive in terms of sales but negative in terms of profit. Therefore, variables and viewpoints should be set properly. The other one is about the motive or reason why exclusive sales territory is protected. The reasons can be classified into four groups - industry characteristics, franchise systems characteristics, capability to maintain exclusive sales territory, and strategic decision. Within four groups of reasons, there are more specific variables and theories as below. Based on these theories, we develop nine hypotheses which are briefly shown in the last table below with the results. In order to validate the hypothesis, data is collected from government (FTC) homepage which is open source. The sample consists of 1,896 franchisors and it contains about three year operation data, from 2006 to 2008. Within the samples, 627 have exclusive sales territory protection policy and the one with exclusive sales territory policy is not evenly distributed over 19 representative industries. Additional data are also collected from another government agency homepage, like Statistics Korea. Also, we combine data from various secondary sources to create meaningful variables as shown in the table below. All variables are dichotomized by mean or median split if they are not inherently dichotomized by its definition, since each hypothesis is composed by multiple variables and there is no solid statistical technique to incorporate all these conditions to test the hypotheses. This paper uses a simple chi-square test because hypotheses and theories are built upon quite specific conditions such as industry type, economic condition, company history and various strategic purposes. It is almost impossible to find all those samples to satisfy them and it can't be manipulated in experimental settings. However, more advanced statistical techniques are very good on clean data without exogenous variables, but not good with real complex data. The chi-square test is applied in a way that samples are grouped into four with two criteria, whether they use exclusive sales territory protection or not, and whether they satisfy conditions of each hypothesis. So the proportion of sample franchisors which satisfy conditions and protect exclusive sales territory, does significantly exceed the proportion of samples that satisfy condition and do not protect. In fact, chi-square test is equivalent with the Poisson regression which allows more flexible application. As results, only three hypotheses are accepted. When attitude toward the risk is high so loyalty fee is determined according to sales performance, EST protection makes poor results as expected. And when franchisor protects EST in order to recruit franchisee easily, EST protection makes better results. Also, when EST protection is to improve the efficiency of franchise system as a whole, it shows better performances. High efficiency is achieved as EST prohibits the free riding of franchisee who exploits other's marketing efforts, and it encourages proper investments and distributes franchisee into multiple regions evenly. Other hypotheses are not supported in the results of significance testing. Exclusive sales territory should be protected from proper motives and administered for mutual benefits. Legal restrictions driven by the government agency like FTC could be misused and cause mis-understandings. So there need more careful monitoring on real practices and more rigorous studies by both academicians and practitioners.

  • PDF

Genomic selection through single-step genomic best linear unbiased prediction improves the accuracy of evaluation in Hanwoo cattle

  • Park, Mi Na;Alam, Mahboob;Kim, Sidong;Park, Byoungho;Lee, Seung Hwan;Lee, Sung Soo
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.33 no.10
    • /
    • pp.1544-1557
    • /
    • 2020
  • Objective: Genomic selection (GS) is becoming popular in animals' genetic development. We, therefore, investigated the single-step genomic best linear unbiased prediction (ssGBLUP) as tool for GS, and compared its efficacy with the traditional pedigree BLUP (pedBLUP) method. Methods: A total of 9,952 males born between 1997 and 2018 under Hanwoo proven-bull selection program was studied. We analyzed body weight at 12 months and carcass weight (kg), backfat thickness, eye muscle area, and marbling score traits. About 7,387 bulls were genotyped using Illumina 50K BeadChip Arrays. Multiple-trait animal model analyses were performed using BLUPF90 software programs. Breeding value accuracy was calculated using two methods: i) Pearson's correlation of genomic estimated breeding value (GEBV) with EBV of all animals (rM1) and ii) correlation using inverse of coefficient matrix from the mixed-model equations (rM2). Then, we compared these accuracies by overall population, info-type (PHEN, phenotyped-only; GEN, genotyped-only; and PH+GEN, phenotyped and genotyped), and bull-types (YBULL, young male calves; CBULL, young candidate bulls; and PBULL, proven bulls). Results: The rM1 estimates in the study were between 0.90 and 0.96 among five traits. The rM1 estimates varied slightly by population and info-type, but noticeably by bull-type for traits. Generally average rM2 estimates were much smaller than rM1 (pedBLUP, 0.40 to0.44; ssGBLUP, 0.41 to 0.45) at population level. However, rM2 from both BLUP models varied noticeably across info-types and bull-types. The ssGBLUP estimates of rM2 in PHEN, GEN, and PH+ GEN ranged between 0.51 and 0.63, 0.66 and 0.70, and 0.68 and 0.73, respectively. In YBULL, CBULL, and PBULL, the rM2 estimates ranged between 0.54 and 0.57, 0.55 and 0.62, and 0.70 and 0.74, respectively. The pedBLUP based rM2 estimates were also relatively lower than ssGBLUP estimates. At the population level, we found an increase in accuracy by 2.0% to 4.5% among traits. Traits in PHEN were least influenced by ssGBLUP (0% to 2.0%), whereas the highest positive changes were in GEN (8.1% to 10.7%). PH+GEN also showed 6.5% to 8.5% increase in accuracy by ssGBLUP. However, the highest improvements were found in bull-types (YBULL, 21% to 35.7%; CBULL, 3.3% to 9.3%; PBULL, 2.8% to 6.1%). Conclusion: A noticeable improvement by ssGBLUP was observed in this study. Findings of differential responses to ssGBLUP by various bulls could assist in better selection decision making as well. We, therefore, suggest that ssGBLUP could be used for GS in Hanwoo proven-bull evaluation program.

Efficiency and Productivity of Seven Large-sized Shipbuilding Firms in Korea (국내 대형조선업계의 효율성 및 생산성 분석)

  • Park, Seok-Ho
    • Journal of Korea Port Economic Association
    • /
    • v.26 no.4
    • /
    • pp.188-206
    • /
    • 2010
  • Data Envelopment Analysis(DEA) is an operations research-based method for measuring the performance efficiency of decision units that are characterized by multiple inputs and outputs. DEA has been applied successfully as a performance evaluation tool in many fields. However, it has not been extensively applied in the shipbuilding industry. This paper applied the input-oriented DEA model, and Malmquist indices to the 7 shipbuilding firms to measure the efficiency and productivity changes during the period of 2004 to 2009. The Malmquist indices will be decomposed into three components such as pure efficiency change, scale efficiency change, and technical change. The empirical results show the following findings. First, the DEA findings indicate that main source of inefficiency is scale rather than pure technical. Second, the Malmquist indices show that an overall decrease in productivity.

Game Theoretic Optimization of Investment Portfolio Considering the Performance of Information Security Countermeasure (정보보호 대책의 성능을 고려한 투자 포트폴리오의 게임 이론적 최적화)

  • Lee, Sang-Hoon;Kim, Tae-Sung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.37-50
    • /
    • 2020
  • Information security has become an important issue in the world. Various information and communication technologies, such as the Internet of Things, big data, cloud, and artificial intelligence, are developing, and the need for information security is increasing. Although the necessity of information security is expanding according to the development of information and communication technology, interest in information security investment is insufficient. In general, measuring the effect of information security investment is difficult, so appropriate investment is not being practice, and organizations are decreasing their information security investment. In addition, since the types and specification of information security measures are diverse, it is difficult to compare and evaluate the information security countermeasures objectively, and there is a lack of decision-making methods about information security investment. To develop the organization, policies and decisions related to information security are essential, and measuring the effect of information security investment is necessary. Therefore, this study proposes a method of constructing an investment portfolio for information security measures using game theory and derives an optimal defence probability. Using the two-person game model, the information security manager and the attacker are assumed to be the game players, and the information security countermeasures and information security threats are assumed as the strategy of the players, respectively. A zero-sum game that the sum of the players' payoffs is zero is assumed, and we derive a solution of a mixed strategy game in which a strategy is selected according to probability distribution among strategies. In the real world, there are various types of information security threats exist, so multiple information security measures should be considered to maintain the appropriate information security level of information systems. We assume that the defence ratio of the information security countermeasures is known, and we derive the optimal solution of the mixed strategy game using linear programming. The contributions of this study are as follows. First, we conduct analysis using real performance data of information security measures. Information security managers of organizations can use the methodology suggested in this study to make practical decisions when establishing investment portfolio for information security countermeasures. Second, the investment weight of information security countermeasures is derived. Since we derive the weight of each information security measure, not just whether or not information security measures have been invested, it is easy to construct an information security investment portfolio in a situation where investment decisions need to be made in consideration of a number of information security countermeasures. Finally, it is possible to find the optimal defence probability after constructing an investment portfolio of information security countermeasures. The information security managers of organizations can measure the specific investment effect by drawing out information security countermeasures that fit the organization's information security investment budget. Also, numerical examples are presented and computational results are analyzed. Based on the performance of various information security countermeasures: Firewall, IPS, and Antivirus, data related to information security measures are collected to construct a portfolio of information security countermeasures. The defence ratio of the information security countermeasures is created using a uniform distribution, and a coverage of performance is derived based on the report of each information security countermeasure. According to numerical examples that considered Firewall, IPS, and Antivirus as information security countermeasures, the investment weights of Firewall, IPS, and Antivirus are optimized to 60.74%, 39.26%, and 0%, respectively. The result shows that the defence probability of the organization is maximized to 83.87%. When the methodology and examples of this study are used in practice, information security managers can consider various types of information security measures, and the appropriate investment level of each measure can be reflected in the organization's budget.

Synthetic Data Generation with Unity 3D and Unreal Engine for Construction Hazard Scenarios: A Comparative Analysis

  • Aqsa Sabir;Rahat Hussain;Akeem Pedro;Mehrtash Soltani;Dongmin Lee;Chansik Park;Jae- Ho Pyeon
    • International conference on construction engineering and project management
    • /
    • 2024.07a
    • /
    • pp.1286-1288
    • /
    • 2024
  • The construction industry, known for its inherent risks and multiple hazards, necessitates effective solutions for hazard identification and mitigation [1]. To address this need, the implementation of machine learning models specializing in object detection has become increasingly important because this technological approach plays a crucial role in augmenting worker safety by proactively recognizing potential dangers on construction sites [2], [3]. However, the challenge in training these models lies in obtaining accurately labeled datasets, as conventional methods require labor-intensive labeling or costly measurements [4]. To circumvent these challenges, synthetic data generation (SDG) has emerged as a key method for creating realistic and diverse training scenarios [5], [6]. The paper reviews the evolution of synthetic data generation tools, highlighting the shift from earlier solutions like Synthpop and Data Synthesizer to advanced game engines[7]. Among the various gaming platforms, Unity 3D and Unreal Engine stand out due to their advanced capabilities in replicating realistic construction hazard environments [8], [9]. Comparing Unity 3D and Unreal Engine is crucial for evaluating their effectiveness in SDG, aiding developers in selecting the appropriate platform for their needs. For this purpose, this paper conducts a comparative analysis of both engines assessing their ability to create high-fidelity interactive environments. To thoroughly evaluate the suitability of these engines for generating synthetic data in construction site simulations, the focus relies on graphical realism, developer-friendliness, and user interaction capabilities. This evaluation considers these key aspects as they are essential for replicating realistic construction sites, ensuring both high visual fidelity and ease of use for developers. Firstly, graphical realism is crucial for training ML models to recognize the nuanced nature of construction environments. In this aspect, Unreal Engine stands out with its superior graphics quality compared to Unity 3D which typically considered to have less graphical prowess [10]. Secondly, developer-friendliness is vital for those generating synthetic data. Research indicates that Unity 3D is praised for its user-friendly interface and the use of C# scripting, which is widely used in educational settings, making it a popular choice for those new to game development or synthetic data generation. Whereas Unreal Engine, while offering powerful capabilities in terms of realistic graphics, is often viewed as more complex due to its use of C++ scripting and the blueprint system. While the blueprint system is a visual scripting tool that does not require traditional coding, it can be intricate and may present a steeper learning curve, especially for those without prior experience in game development [11]. Lastly, regarding user interaction capabilities, Unity 3D is known for its intuitive interface and versatility, particularly in VR/AR development for various skill levels. In contrast, Unreal Engine, with its advanced graphics and blueprint scripting, is better suited for creating high-end, immersive experiences [12]. Based on current insights, this comparative analysis underscores the user-friendly interface and adaptability of Unity 3D, featuring a built-in perception package that facilitates automatic labeling for SDG [13]. This functionality enhances accessibility and simplifies the SDG process for users. Conversely, Unreal Engine is distinguished by its advanced graphics and realistic rendering capabilities. It offers plugins like EasySynth (which does not provide automatic labeling) and NDDS for SDG [14], [15]. The development complexity associated with Unreal Engine presents challenges for novice users, whereas the more approachable platform of Unity 3D is advantageous for beginners. This research provides an in-depth review of the latest advancements in SDG, shedding light on potential future research and development directions. The study concludes that the integration of such game engines in ML model training markedly enhances hazard recognition and decision-making skills among construction professionals, thereby significantly advancing data acquisition for machine learning in construction safety monitoring.

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.

DEVELOPMENT OF STATEWIDE TRUCK TRAFFIC FORECASTING METHOD BY USING LIMITED O-D SURVEY DATA (한정된 O-D조사자료를 이용한 주 전체의 트럭교통예측방법 개발)

  • 박만배
    • Proceedings of the KOR-KST Conference
    • /
    • 1995.02a
    • /
    • pp.101-113
    • /
    • 1995
  • The objective of this research is to test the feasibility of developing a statewide truck traffic forecasting methodology for Wisconsin by using Origin-Destination surveys, traffic counts, classification counts, and other data that are routinely collected by the Wisconsin Department of Transportation (WisDOT). Development of a feasible model will permit estimation of future truck traffic for every major link in the network. This will provide the basis for improved estimation of future pavement deterioration. Pavement damage rises exponentially as axle weight increases, and trucks are responsible for most of the traffic-induced damage to pavement. Consequently, forecasts of truck traffic are critical to pavement management systems. The pavement Management Decision Supporting System (PMDSS) prepared by WisDOT in May 1990 combines pavement inventory and performance data with a knowledge base consisting of rules for evaluation, problem identification and rehabilitation recommendation. Without a r.easonable truck traffic forecasting methodology, PMDSS is not able to project pavement performance trends in order to make assessment and recommendations in the future years. However, none of WisDOT's existing forecasting methodologies has been designed specifically for predicting truck movements on a statewide highway network. For this research, the Origin-Destination survey data avaiiable from WisDOT, including two stateline areas, one county, and five cities, are analyzed and the zone-to'||'&'||'not;zone truck trip tables are developed. The resulting Origin-Destination Trip Length Frequency (00 TLF) distributions by trip type are applied to the Gravity Model (GM) for comparison with comparable TLFs from the GM. The gravity model is calibrated to obtain friction factor curves for the three trip types, Internal-Internal (I-I), Internal-External (I-E), and External-External (E-E). ~oth "macro-scale" calibration and "micro-scale" calibration are performed. The comparison of the statewide GM TLF with the 00 TLF for the macro-scale calibration does not provide suitable results because the available 00 survey data do not represent an unbiased sample of statewide truck trips. For the "micro-scale" calibration, "partial" GM trip tables that correspond to the 00 survey trip tables are extracted from the full statewide GM trip table. These "partial" GM trip tables are then merged and a partial GM TLF is created. The GM friction factor curves are adjusted until the partial GM TLF matches the 00 TLF. Three friction factor curves, one for each trip type, resulting from the micro-scale calibration produce a reasonable GM truck trip model. A key methodological issue for GM. calibration involves the use of multiple friction factor curves versus a single friction factor curve for each trip type in order to estimate truck trips with reasonable accuracy. A single friction factor curve for each of the three trip types was found to reproduce the 00 TLFs from the calibration data base. Given the very limited trip generation data available for this research, additional refinement of the gravity model using multiple mction factor curves for each trip type was not warranted. In the traditional urban transportation planning studies, the zonal trip productions and attractions and region-wide OD TLFs are available. However, for this research, the information available for the development .of the GM model is limited to Ground Counts (GC) and a limited set ofOD TLFs. The GM is calibrated using the limited OD data, but the OD data are not adequate to obtain good estimates of truck trip productions and attractions .. Consequently, zonal productions and attractions are estimated using zonal population as a first approximation. Then, Selected Link based (SELINK) analyses are used to adjust the productions and attractions and possibly recalibrate the GM. The SELINK adjustment process involves identifying the origins and destinations of all truck trips that are assigned to a specified "selected link" as the result of a standard traffic assignment. A link adjustment factor is computed as the ratio of the actual volume for the link (ground count) to the total assigned volume. This link adjustment factor is then applied to all of the origin and destination zones of the trips using that "selected link". Selected link based analyses are conducted by using both 16 selected links and 32 selected links. The result of SELINK analysis by u~ing 32 selected links provides the least %RMSE in the screenline volume analysis. In addition, the stability of the GM truck estimating model is preserved by using 32 selected links with three SELINK adjustments, that is, the GM remains calibrated despite substantial changes in the input productions and attractions. The coverage of zones provided by 32 selected links is satisfactory. Increasing the number of repetitions beyond four is not reasonable because the stability of GM model in reproducing the OD TLF reaches its limits. The total volume of truck traffic captured by 32 selected links is 107% of total trip productions. But more importantly, ~ELINK adjustment factors for all of the zones can be computed. Evaluation of the travel demand model resulting from the SELINK adjustments is conducted by using screenline volume analysis, functional class and route specific volume analysis, area specific volume analysis, production and attraction analysis, and Vehicle Miles of Travel (VMT) analysis. Screenline volume analysis by using four screenlines with 28 check points are used for evaluation of the adequacy of the overall model. The total trucks crossing the screenlines are compared to the ground count totals. L V/GC ratios of 0.958 by using 32 selected links and 1.001 by using 16 selected links are obtained. The %RM:SE for the four screenlines is inversely proportional to the average ground count totals by screenline .. The magnitude of %RM:SE for the four screenlines resulting from the fourth and last GM run by using 32 and 16 selected links is 22% and 31 % respectively. These results are similar to the overall %RMSE achieved for the 32 and 16 selected links themselves of 19% and 33% respectively. This implies that the SELINICanalysis results are reasonable for all sections of the state.Functional class and route specific volume analysis is possible by using the available 154 classification count check points. The truck traffic crossing the Interstate highways (ISH) with 37 check points, the US highways (USH) with 50 check points, and the State highways (STH) with 67 check points is compared to the actual ground count totals. The magnitude of the overall link volume to ground count ratio by route does not provide any specific pattern of over or underestimate. However, the %R11SE for the ISH shows the least value while that for the STH shows the largest value. This pattern is consistent with the screenline analysis and the overall relationship between %RMSE and ground count volume groups. Area specific volume analysis provides another broad statewide measure of the performance of the overall model. The truck traffic in the North area with 26 check points, the West area with 36 check points, the East area with 29 check points, and the South area with 64 check points are compared to the actual ground count totals. The four areas show similar results. No specific patterns in the L V/GC ratio by area are found. In addition, the %RMSE is computed for each of the four areas. The %RMSEs for the North, West, East, and South areas are 92%, 49%, 27%, and 35% respectively, whereas, the average ground counts are 481, 1383, 1532, and 3154 respectively. As for the screenline and volume range analyses, the %RMSE is inversely related to average link volume. 'The SELINK adjustments of productions and attractions resulted in a very substantial reduction in the total in-state zonal productions and attractions. The initial in-state zonal trip generation model can now be revised with a new trip production's trip rate (total adjusted productions/total population) and a new trip attraction's trip rate. Revised zonal production and attraction adjustment factors can then be developed that only reflect the impact of the SELINK adjustments that cause mcreases or , decreases from the revised zonal estimate of productions and attractions. Analysis of the revised production adjustment factors is conducted by plotting the factors on the state map. The east area of the state including the counties of Brown, Outagamie, Shawano, Wmnebago, Fond du Lac, Marathon shows comparatively large values of the revised adjustment factors. Overall, both small and large values of the revised adjustment factors are scattered around Wisconsin. This suggests that more independent variables beyond just 226; population are needed for the development of the heavy truck trip generation model. More independent variables including zonal employment data (office employees and manufacturing employees) by industry type, zonal private trucks 226; owned and zonal income data which are not available currently should be considered. A plot of frequency distribution of the in-state zones as a function of the revised production and attraction adjustment factors shows the overall " adjustment resulting from the SELINK analysis process. Overall, the revised SELINK adjustments show that the productions for many zones are reduced by, a factor of 0.5 to 0.8 while the productions for ~ relatively few zones are increased by factors from 1.1 to 4 with most of the factors in the 3.0 range. No obvious explanation for the frequency distribution could be found. The revised SELINK adjustments overall appear to be reasonable. The heavy truck VMT analysis is conducted by comparing the 1990 heavy truck VMT that is forecasted by the GM truck forecasting model, 2.975 billions, with the WisDOT computed data. This gives an estimate that is 18.3% less than the WisDOT computation of 3.642 billions of VMT. The WisDOT estimates are based on the sampling the link volumes for USH, 8TH, and CTH. This implies potential error in sampling the average link volume. The WisDOT estimate of heavy truck VMT cannot be tabulated by the three trip types, I-I, I-E ('||'&'||'pound;-I), and E-E. In contrast, the GM forecasting model shows that the proportion ofE-E VMT out of total VMT is 21.24%. In addition, tabulation of heavy truck VMT by route functional class shows that the proportion of truck traffic traversing the freeways and expressways is 76.5%. Only 14.1% of total freeway truck traffic is I-I trips, while 80% of total collector truck traffic is I-I trips. This implies that freeways are traversed mainly by I-E and E-E truck traffic while collectors are used mainly by I-I truck traffic. Other tabulations such as average heavy truck speed by trip type, average travel distance by trip type and the VMT distribution by trip type, route functional class and travel speed are useful information for highway planners to understand the characteristics of statewide heavy truck trip patternS. Heavy truck volumes for the target year 2010 are forecasted by using the GM truck forecasting model. Four scenarios are used. Fo~ better forecasting, ground count- based segment adjustment factors are developed and applied. ISH 90 '||'&'||' 94 and USH 41 are used as example routes. The forecasting results by using the ground count-based segment adjustment factors are satisfactory for long range planning purposes, but additional ground counts would be useful for USH 41. Sensitivity analysis provides estimates of the impacts of the alternative growth rates including information about changes in the trip types using key routes. The network'||'&'||'not;based GMcan easily model scenarios with different rates of growth in rural versus . . urban areas, small versus large cities, and in-state zones versus external stations. cities, and in-state zones versus external stations.

  • PDF