• Title/Abstract/Keyword: Information flow system

An Ontology Model for Public Service Export Platform (공공 서비스 수출 플랫폼을 위한 온톨로지 모형)

  • Lee, Gang-Won;Park, Sei-Kwon;Ryu, Seung-Wan;Shin, Dong-Cheon
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.149-161 / 2014
  • The export of domestic public services to overseas markets contains many potential obstacles, stemming from different export procedures, the target services, and socio-economic environments. In order to alleviate these problems, the business incubation platform as an open business ecosystem can be a powerful instrument to support the decisions taken by participants and stakeholders. In this paper, we propose an ontology model and its implementation processes for the business incubation platform with an open and pervasive architecture to support public service exports. For the conceptual model of platform ontology, export case studies are used for requirements analysis. The conceptual model shows the basic structure, with vocabulary and its meaning, the relationship between ontologies, and key attributes. For the implementation and test of the ontology model, the logical structure is edited using the Protégé editor. The core engine of the business incubation platform is the simulator module, where the various contexts of export businesses should be captured, defined, and shared with other modules through ontologies. It is well known that an ontology, with which concepts and their relationships are represented using a shared vocabulary, is an efficient and effective tool for organizing meta-information to develop structural frameworks in a particular domain. The proposed model consists of five ontologies derived from a requirements survey of major stakeholders and their operational scenarios: service, requirements, environment, enterprise, and country. The service ontology contains several components that can find and categorize public services through a case analysis of the public service export. Key attributes of the service ontology are composed of categories including objective, requirements, activity, and service.
The objective category, which has sub-attributes including operational body (organization) and user, acts as a reference to search and classify public services. The requirements category relates to the functional needs at a particular phase of system (service) design or operation. Sub-attributes of requirements are user, application, platform, architecture, and social overhead. The activity category represents business processes during the operation and maintenance phase. The activity category also has sub-attributes including facility, software, and project unit. The service category, with sub-attributes such as target, time, and place, acts as a reference to sort and classify the public services. The requirements ontology is derived from the basic and common components of public services and target countries. The key attributes of the requirements ontology are business, technology, and constraints. Business requirements represent the needs of processes and activities for public service export; technology represents the technological requirements for the operation of public services; and constraints represent the business law, regulations, or cultural characteristics of the target country. The environment ontology is derived from case studies of target countries for public service operation. Key attributes of the environment ontology are user, requirements, and activity. A user includes stakeholders in public services, from citizens to operators and managers; the requirements attribute represents the managerial and physical needs during operation; the activity attribute represents business processes in detail. The enterprise ontology is introduced from a previous study, and its attributes are activity, organization, strategy, marketing, and time. 
The country ontology is derived from the demographic and geopolitical analysis of the target country, and its key attributes are economy, social infrastructure, law, regulation, customs, population, location, and development strategies. A matching algorithm generates the priority list of target services for a given country and/or the priority list of target countries for a given public service. These lists are used as input seeds to simulate consortium partners and government policies and programs. In the simulation, the environmental differences between Korea and the target country can be customized through a gap analysis and work-flow optimization process. When the process gap between Korea and the target country is too large for a single corporation to cover, a consortium is considered an alternative choice, and various alternatives are derived from the capability index of enterprises. For financial packages, a mix of various foreign aid funds can be simulated during this stage. It is expected that the proposed ontology model and the business incubation platform can be used by various participants in the public service export market. It could be especially beneficial to small and medium businesses that have relatively fewer resources and less experience with public service export. We also expect that the open and pervasive service architecture in a digital business ecosystem will help stakeholders find new opportunities through information sharing and collaboration on business processes.

Soil Loss and Pollutant Load Estimation in Sacheon River Watershed using a Geographic Information System (GIS를 이용한 동해안 하천유역의 토양유실량과 오염부하량 평가 -사천천을 중심으로-)

  • Cho, Jae-Heon;Yeon, Je-Chul
    • Journal of Korean Society of Environmental Engineers / v.22 no.7 / pp.1331-1343 / 2000
  • By integrating the Universal Soil Loss Equation (USLE) with GIS, a methodology for estimating soil loss was developed and applied to the Sacheon river in Gangrung. Using GIS, spatial analyses such as watershed boundary determination, flow routing, and slope steepness calculation were carried out. Spatial information from the GIS application was assigned to each grid cell. Using soil and land use maps, information about soil classification and land use was also assigned to each grid cell. Based on these data, thematic maps of the USLE factors were made. We estimated the soil loss by overlaying the thematic maps. In this manner, the degree of soil loss can be assessed for each grid cell using GIS. The annual average soil loss of the Sacheon river watershed is 1.36 ton/ha/yr. Soil loss in forest, dry field, and paddy field is 0.15 ton/ha/yr, 27.04 ton/ha/yr, and 0.78 ton/ha/yr, respectively. The area of dry field is $2.4\,\mathrm{km}^2$, which is 4% of the total area. However, the total soil loss from dry field is 6561 ton/yr, which accounts for 84.9% of the total soil loss in the Sacheon river watershed. Compared with the average soil loss tolerance for cropland of 11.2 ton/ha/yr, measures against soil loss in dry fields are necessary. Run-off and water quality of the Sacheon river were measured twice in the flood season: from July 24 to July 28, 1998, and from September 29 to October 1. As the run-off of the river increased, SS, TN, and TP concentrations and pollutant loadings increased. The SS, TN, and TP loads discharged from the Sacheon river during the 2 heavy rains were 21%, 39%, and 19% of the total pollutant loadings generated in the Sacheon river watershed in one year. This shows that a large share of pollutants is discharged within the short flood season.
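
The grid overlay described in this abstract can be sketched as a cell-by-cell product of the USLE factor maps, A = R * K * LS * C * P. The sketch below is only an illustration under stated assumptions: the 2 x 2 grids, the factor values, and the function names `usle_soil_loss` and `mean_loss` are hypothetical and are not taken from the study.

```python
# Minimal USLE overlay sketch: A = R * K * LS * C * P, evaluated per grid cell.
# All factor values are made-up placeholders, not data from the study.

def usle_soil_loss(r, k, ls, c, p):
    """Return a grid of soil loss values from five equally shaped factor grids."""
    return [
        [rv * kv * lsv * cv * pv
         for rv, kv, lsv, cv, pv in zip(r_row, k_row, ls_row, c_row, p_row)]
        for r_row, k_row, ls_row, c_row, p_row in zip(r, k, ls, c, p)
    ]


def mean_loss(grid):
    """Average soil loss over all cells (assumes equal cell areas)."""
    cells = [value for row in grid for value in row]
    return sum(cells) / len(cells)


# Hypothetical thematic maps on a 2 x 2 grid.
R = [[400.0, 400.0], [400.0, 400.0]]   # rainfall erosivity
K = [[0.3, 0.3], [0.2, 0.2]]           # soil erodibility
LS = [[1.0, 2.0], [0.5, 1.0]]          # slope length and steepness
C = [[0.01, 0.3], [0.01, 0.01]]        # cover management (0.3 mimics a dry field)
P = [[1.0, 0.5], [1.0, 1.0]]           # support practice

loss = usle_soil_loss(R, K, LS, C, P)
print(loss)
print(mean_loss(loss))
```

In this toy grid the single cell with a high cover factor dominates the total, which mirrors the abstract's observation that a small dry-field area can account for most of the watershed's soil loss.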

Knowledge Management Strategy of a Franchise Business : The Case of a Paris Baguette Bakery (프랜차이즈 기업의 지식경영 전략 : 파리바게뜨 사례를 중심으로)

  • Cho, Joon-Sang;Kim, Bo-Yong
    • Journal of Distribution Science / v.10 no.6 / pp.39-53 / 2012
  • It is widely known that knowledge management plays a facilitating role that contributes to upgrading organizational performance. Knowledge management systems (KMS), especially, support the knowledge management process including the sharing, creating, and using of knowledge within a company, and maximize the value of knowledge resources within an organization. Despite this widely held belief, there are few studies that describe how companies actually develop, share, and practice their knowledge. Companies in the domestic small franchise sector, which are in the early stages in terms of knowledge management, need to improve their KMS to manage their franchisees effectively. From this perspective, this study uses a qualitative approach to explore the actual process of knowledge management implementation. This article presents a case study of PB (Paris Baguette) company, which is the first to build a KMS in the franchise industry. The study was able to confirm the following facts through the analysis of target companies. First, the chief executive's support is a critical success factor and this support can increase the participation of organization members. Second, it is important to build a process and culture that actively creates and leverages information in knowledge management activities. The organizational learning culture should be one where the creation, learning, and sharing of new knowledge is developed continuously. Third, a horizontal network organization is needed in order to make relationships within the organization more close-knit. Fourth, in order to connect the diverse processes such as knowledge acquisition, storage, and utilization of knowledge management activities, information technology (IT) capabilities are essential. Indeed, IT can be a powerful tool for improving the quality of work and maximizing the spread and use of knowledge. 
However, during the construction of an intranet-based KMS, research is required to ensure that the most efficient system is implemented. Finally, proper evaluation and compensation are important success factors. In order to develop knowledge workers, an appropriate program of promotion and compensation should be established. Also, building members' confidence in the benefits of knowledge management should be an ongoing activity. The company developed its original KMS to achieve a flexible and proactive organization, and a new KMS to improve organizational and personal capabilities. The PB case shows that there are differences between participants' perceptions and actual performance in managing knowledge; that knowledge management is not a matter of formality but a paradigm that assures the sharing of knowledge; and that IT boosts communication skills, thus creating a mutual relationship to enhance the flow of knowledge and information between people. Knowledge management for building organizational capabilities can be successful when considering its focus and ways to increase its acceptance. This study suggests guidelines for major factors that corporate executives of domestic franchises should consider to improve knowledge management and the higher operating activities that can be used.

A Study on the Influence of Workers' Aspiration for Academic Needs on Participation in University Education (근로자의 학업욕구 열망이 대학교육 참여에 미치는 영향에 관한 연구)

  • Lee, Ji-Hun;Mun, Bok-Hyun
    • Journal of Korea Entertainment Industry Association / v.15 no.3 / pp.231-241 / 2021
  • This study aims to offer university officials strategies and implications for attracting new students and providing customized education, based on research into how workers' academic aspirations affect their participation in university education. To this end, variables were derived by analyzing prior data, and the causal relationships between variables and the questionnaire were developed. Survey data were collected from 331 workers interested in participating in university education through face-to-face interviews. The collected data were coded, and reliability and validity tests and a frequency analysis were conducted. Finally, the fit of the structural equation model and the causal relationships between concepts were verified. The results of the verification suggest the following implications. First, university officials should motivate workers through a mentor-mentee system with experienced people who have moved to a suitable occupational group through university education. It will also be necessary to develop and disseminate programs so that workers can continue to develop themselves for the future. To this end, it will be necessary to help them understand their aptitude and strengths through consultation with experts. Second, university officials should strengthen public relations so that prospective students can learn, through recommendations, about cases of job change among workers who have been admitted. It will also be necessary to develop university education programs that support self-development, accept various ideas through a "public contest", and provide workers with accurate, re-processed information about university education. Third, university officials should provide workers with a program that lets them achieve two goals at once: job change and self-improvement through university education.
In other words, it is necessary to stimulate the motivation of workers by providing various information, such as visits to advanced overseas companies, acquisition of various certificates, movement between blue-collar and white-collar departments, and transfer opportunities. Fourth, university officials should actively promote the related university education programs, emphasizing participation in university education, systematic education, and trends in the social environment. Finally, university officials will need to counsel and encourage workers so that they can develop themselves when they participate in college education, and they will have to identify what workers need for self-development through demand surveys and analysis.

Analysis of Munitions Contract Work Using Process Mining (프로세스 마이닝을 이용한 군수품 계약업무 분석 : 공군 군수사 계약업무를 중심으로)

  • Joo, Yong Seon;Kim, Su Hwan
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.41-59 / 2022
  • The timely procurement of military supplies is essential to maintain the military's operational capabilities, and contract work is the first step toward timely procurement. In addition, signing a contract quickly allows a more generous delivery date to be set and increases the likelihood of budget execution, so it is essential to improve the contract process in order to support early budget execution and prevent budget transfer or disuse. Recently, research using big data has been actively conducted in various fields, and process mining, a technique for analyzing and improving processes using big data, is also widely used in the private sector. However, the analysis of contract work in the military has been limited to case-by-case analysis, such as identifying the cause of each problematic budget transfer or disuse contract using the experience and fragmentary information of the person in charge. In order to improve the contract process, this study applied process mining to data on a total of 560 contract tasks directly contracted by the Department of Finance of the Air Force Logistics Command over about one year from November 2019. Process maps were derived by synthesizing distributed data, and process flow analysis, execution time analysis, bottleneck analysis, and additional detailed analysis were conducted. The analysis found that review/modification occurred repeatedly after the request in a number of contracts. Repeated reviews/modifications significantly delay the number of days needed to complete the cost calculation, which was also clearly revealed through bottleneck visualization. Review/modification occurs in more than 60% of the contracts of the top 5 departments by number of contract requests, and it usually occurs in the first half of the year, when requests are concentrated, which means that the requesting departments need to review contracts thoroughly before requesting them.
In addition, the contract work of the Department of Finance was carried out in accordance with the procedures set by laws and regulations, but it was found that the order of some tasks needed to be adjusted. This study is the first case of using process mining for the analysis of contract work in the military. Based on this, if further research is conducted to apply process mining to various tasks in the military, it is expected that the efficiency of those tasks can be improved.
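
A process map of the kind mentioned in this abstract is commonly built from a directly-follows graph: for every case, count how often one activity is immediately followed by another and how long the hand-over takes. The sketch below assumes a hypothetical event log; the case IDs, activity names, dates, and the function name `directly_follows` are illustrative and do not come from the study's data or tooling.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log rows: (case_id, activity, date). Not data from the study.
EVENT_LOG = [
    ("C1", "request", "2019-11-04"),
    ("C1", "review/modification", "2019-11-06"),
    ("C1", "review/modification", "2019-11-11"),
    ("C1", "cost calculation", "2019-11-20"),
    ("C2", "request", "2019-11-05"),
    ("C2", "cost calculation", "2019-11-08"),
]


def directly_follows(log):
    """Return edge counts and mean waiting days for each activity pair."""
    traces = defaultdict(list)
    for case_id, activity, date_text in log:
        stamp = datetime.strptime(date_text, "%Y-%m-%d")
        traces[case_id].append((stamp, activity))

    counts = Counter()
    waits = defaultdict(list)
    for events in traces.values():
        events.sort()  # chronological order within one case
        for (t0, a0), (t1, a1) in zip(events, events[1:]):
            counts[(a0, a1)] += 1
            waits[(a0, a1)].append((t1 - t0).days)

    mean_wait = {edge: sum(days) / len(days) for edge, days in waits.items()}
    return counts, mean_wait


counts, mean_wait = directly_follows(EVENT_LOG)
for edge, n in counts.items():
    print(edge, n, mean_wait[edge])
```

A self-loop on an activity, such as the repeated review/modification step in case C1, is how repeated rework shows up in such a map, and edges with long mean waits are candidates for bottlenecks.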

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.127-148 / 2020
  • A data center is a physical facility that houses computer systems and related components, and it is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, a proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and prevent failures. If a failure occurs in some element of the facility, it may affect not only the equipment concerned but also other connected equipment, and may cause enormous damage. IT facilities in particular fail irregularly because of their interdependence, which makes the cause difficult to identify. Previous studies on predicting data center failures treated each server as a single, isolated state and did not assume that devices interact. Therefore, in this study, data center failures were classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), and the study focused on analyzing complex failures occurring within the server. Failures external to the server include power, cooling, and user errors. Since such failures can be prevented in the early stages of data center facility construction, various solutions are being developed. On the other hand, the cause of a failure occurring inside the server is difficult to determine, and adequate prevention has not yet been achieved. This is mainly because server failures do not occur in isolation: a failure may cause failures in other servers or be triggered by something received from other servers. In other words, whereas existing studies analyzed failures on the assumption of a single server that does not affect other servers, this study analyzed failures on the assumption that servers affect one another.
In order to define the complex failure situation in the data center, failure history data for each piece of equipment in the data center was used. Four major failures are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures that occur for each device are sorted in chronological order, and when a failure occurs in one piece of equipment and a failure occurs in another piece of equipment within 5 minutes of that time, the failures are defined as occurring simultaneously. After configuring sequences of the devices that failed at the same time, 5 devices that frequently failed simultaneously within the configured sequences were selected, and the cases where the selected devices failed at the same time were confirmed through visualization. Since the server resource information collected for failure analysis is a time series and has flow, we used Long Short-Term Memory (LSTM), a deep learning algorithm that can predict the next state from the previous state. In addition, unlike the single-server case, the Hierarchical Attention Network deep learning model structure was used in consideration of the fact that each server contributes to multiple failures to a different degree. This algorithm increases prediction accuracy by giving a server more weight as its impact on the failure increases. The study began by defining the types of failure and selecting the analysis targets. In the first experiment, the same collected data was analyzed under a single-server assumption and under a multiple-server assumption, and the results were compared. The second experiment improved the prediction accuracy in the multiple-server case by optimizing a threshold for each server.
In the first experiment, under the single-server assumption, three of the five servers were predicted not to have failed even though failures actually occurred. Under the multiple-server assumption, however, all five servers were predicted to have failed. These results support the hypothesis that servers affect one another. The study confirmed that prediction performance was better under the multiple-server assumption than under the single-server assumption. In particular, applying the Hierarchical Attention Network algorithm, which assumes that the effect of each server differs, helped improve the analysis. In addition, applying a different threshold for each server improved the prediction accuracy. This study showed that failures whose cause is difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring in servers in data centers. It is expected that failures can be prevented in advance using the results of this study.
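
The 5-minute rule used in this abstract to define simultaneous failures can be sketched as a simple grouping over chronologically sorted failure records. This is one possible reading of the rule, in which the window is anchored at the first failure of each group; the device names, timestamps, and the function name `group_simultaneous` are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # window length stated in the abstract

# Hypothetical failure history: (device, time of failure). Not data from the study.
failures = [
    ("srv-c", datetime(2020, 1, 1, 10, 30)),
    ("srv-a", datetime(2020, 1, 1, 10, 0)),
    ("db-1", datetime(2020, 1, 1, 10, 4)),
    ("srv-b", datetime(2020, 1, 1, 10, 3)),
]


def group_simultaneous(records, window=WINDOW):
    """Group failures that fall within `window` of the first failure in a group."""
    groups = []
    for device, stamp in sorted(records, key=lambda rec: rec[1]):
        if groups and stamp - groups[-1][0][1] <= window:
            groups[-1].append((device, stamp))    # same incident
        else:
            groups.append([(device, stamp)])      # start a new incident
    return groups


groups = group_simultaneous(failures)
for group in groups:
    print([device for device, _ in group])
```

Counting how often each set of devices appears together across such groups would then give the frequently co-failing devices that the abstract selects for analysis.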

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.1-32 / 2018
  • Corporate defaults have a ripple effect not only on the stakeholders of bankrupt companies, including managers, employees, creditors, and investors, but also on the local and national economy. Before the Asian financial crisis, the Korean government only analyzed SMEs and tried to improve the forecasting power of a single default prediction model, rather than developing various corporate default models. As a result, even large corporations called 'chaebol enterprises' went bankrupt. Even after that, the analysis of past corporate defaults focused on specific variables, and when the government carried out restructuring immediately after the global financial crisis, it focused only on certain main variables such as 'debt ratio'. A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid a sudden total collapse such as the 'Lehman Brothers case' during the global financial crisis. The key variables used in predicting corporate defaults vary over time. This is confirmed by Deakins' (1972) study, which, revisiting the analyses of Beaver (1967, 1968) and Altman (1968), shows that the major factors affecting corporate failure have changed. In Grice's (2001) study, the importance of predictive variables was also found through Zmijewski's (1984) and Ohlson's (1980) models. However, the studies carried out in the past use static models, and most of them do not consider changes that occur over time. Therefore, in order to construct consistent prediction models, it is necessary to compensate for the time-dependent bias by means of a time series analysis algorithm that reflects dynamic change. Centered on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively.
In order to construct a consistent bankruptcy model in the flow of time change, we first train a time series deep learning algorithm model using the data before the financial crisis (2000~2006). The parameter tuning of the existing model and the deep learning time series algorithm is conducted with validation data including the financial crisis period (2007~2008). As a result, we construct a model that shows similar pattern to the results of the learning data and shows excellent prediction power. After that, each bankruptcy prediction model is restructured by integrating the learning data and validation data again (2000 ~ 2008), applying the optimal parameters as in the previous validation. Finally, each corporate default prediction model is evaluated and compared using test data (2009) based on the trained models over nine years. Then, the usefulness of the corporate default prediction model based on the deep learning time series algorithm is proved. In addition, by adding the Lasso regression analysis to the existing methods (multiple discriminant analysis, logit model) which select the variables, it is proved that the deep learning time series algorithm model based on the three bundles of variables is useful for robust corporate default prediction. The definition of bankruptcy used is the same as that of Lee (2015). Independent variables include financial information such as financial ratios used in previous studies. Multivariate discriminant analysis, logit model, and Lasso regression model are used to select the optimal variable group. The influence of the Multivariate discriminant analysis model proposed by Altman (1968), the Logit model proposed by Ohlson (1980), the non-time series machine learning algorithms, and the deep learning time series algorithms are compared. In the case of corporate data, there are limitations of 'nonlinear variables', 'multi-collinearity' of variables, and 'lack of data'. 
While the logit model is nonlinear, the Lasso regression model solves the multi-collinearity problem, and the deep learning time series algorithm using the variable data generation method complements the lack of data. Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis, and finally towards intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis at corporate default prediction modeling. It also has greater predictive power. Through the Fourth Industrial Revolution, the current government and governments overseas are working hard to integrate such systems into the everyday life of their nations and societies. Yet deep learning time series research for the financial industry is still insufficient. This is an initial study on deep learning time series algorithm analysis of corporate defaults. It is therefore hoped that it will serve as comparative analysis material for non-specialists who begin studies combining financial data and deep learning time series algorithms.
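
The chronological split described in this abstract (train on 2000 to 2006, tune on 2007 to 2008, refit on 2000 to 2008, test on 2009) can be sketched as follows. The record layout, the field names, and the function name `split_by_year` are assumptions made for illustration; no model is fitted here.

```python
# Hypothetical firm-year records; field names are illustrative, not from the study.
records = [{"firm": "A", "year": year, "default": 0} for year in range(2000, 2010)]

# Year ranges as stated in the abstract (inclusive bounds).
SPLITS = {
    "train": (2000, 2006),       # before the financial crisis
    "validation": (2007, 2008),  # includes the crisis period, used for tuning
    "test": (2009, 2009),        # held out for the final comparison
}


def split_by_year(rows, splits=SPLITS):
    """Partition rows by year so that no later year leaks into an earlier set."""
    return {
        name: [row for row in rows if first <= row["year"] <= last]
        for name, (first, last) in splits.items()
    }


parts = split_by_year(records)

# After tuning, the abstract refits each model on training plus validation years.
refit_rows = parts["train"] + parts["validation"]

print({name: len(rows) for name, rows in parts.items()})
print(len(refit_rows))
```

Splitting by year rather than at random keeps every evaluation year strictly later than the years used for fitting, which is the point of a time-aware design.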

A Study on the Use of GIS-based Time Series Spatial Data for Streamflow Depletion Assessment (하천 건천화 평가를 위한 GIS 기반의 시계열 공간자료 활용에 관한 연구)

  • YOO, Jae-Hyun;KIM, Kye-Hyun;PARK, Yong-Gil;LEE, Gi-Hun;KIM, Seong-Joon;JUNG, Chung-Gil
    • Journal of the Korean Association of Geographic Information Studies / v.21 no.4 / pp.50-63 / 2018
  • Rapid urbanization has distorted the natural hydrological cycle. The change in the structure of the hydrological cycle is causing streamflow depletion and changing existing patterns of water resource use. To manage these phenomena, a streamflow depletion impact assessment technology that can forecast depletion is required. Building GIS-based spatial data as the fundamental data is indispensable for such technology, but related research is scarce. Therefore, this study examined the use of GIS-based time series spatial data for streamflow depletion assessment. For this study, GIS data covering decades of change on a national scale were constructed for 6 streamflow depletion impact factors (weather, soil depth, forest density, road network, groundwater usage, and land use), and the data were used as the basic data for operating a continuous hydrologic model. Focusing on these impact factors, the causes of streamflow depletion were analyzed over time. Then, using DrySAT, which is based on a distributed continuous hydrologic model, the annual runoff for each streamflow depletion impact factor was measured and a depletion assessment was conducted. As a result, the baseline annual runoff was measured at 977.9 mm under the given weather conditions without considering other factors. When the decrease in soil depth, the increase in forest density, road development, groundwater usage, and the change in land use and development were considered, annual runoff was measured at 1,003.5 mm, 942.1 mm, 961.9 mm, 915.5 mm, and 1,003.7 mm, respectively.
The results showed that the major causes of streamflow depletion were: reduced soil depth, which decreases the infiltration volume and surface runoff and thereby decreases streamflow; increased forest density, which decreases surface runoff; the expanded road network, which decreases sub-surface flow; increased groundwater use due to indiscriminate development, which decreases baseflow; and increased impervious areas, which increase surface runoff. In addition, each standard watershed was classified by depletion grade, based on the definition of streamflow depletion and the range of grades. Considering the weather, the decrease in soil depth, the increase in forest density, road development, groundwater usage, and the change in land use and development, the grades of depletion were 2.1, 2.2, 2.5, 2.3, 2.8, and 2.2, respectively. Among the five streamflow depletion impact factors other than rainfall condition, the change in groundwater usage had the biggest influence on depletion, followed by the change in forest density, road construction, land use, and soil depth. In conclusion, it is anticipated that a national streamflow depletion assessment system to be developed in the future would provide customized depletion management and prevention plans based on the system's assessment results regarding future changes in the data for the six streamflow depletion impact factors and the outlook for depletion.

Comparison of Deep Learning Frameworks: About Theano, Tensorflow, and Cognitive Toolkit (딥러닝 프레임워크의 비교: 티아노, 텐서플로, CNTK를 중심으로)

  • Chung, Yeojin;Ahn, SungMahn;Yang, Jiheon;Lee, Jaejoon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.1-17 / 2017
  • A deep learning framework is software designed to help develop deep learning models. Some of its important functions include "automatic differentiation" and "utilization of GPU". Popular deep learning frameworks include Caffe (BVLC) and Theano (University of Montreal). Recently, Microsoft's deep learning framework, Microsoft Cognitive Toolkit, was released under an open-source license, following Google's Tensorflow a year earlier. The early deep learning frameworks were developed mainly for research at universities. Beginning with the inception of Tensorflow, however, it seems that companies such as Microsoft and Facebook have started to join the competition in framework development. Given this trend, Google and other companies are expected to continue investing in deep learning frameworks to take the initiative in the artificial intelligence business. From this point of view, we think it is a good time to compare some deep learning frameworks. We therefore compare three deep learning frameworks that can be used as Python libraries: Google's Tensorflow, Microsoft's CNTK, and Theano, which is in a sense a predecessor of the other two. The most common and important function of deep learning frameworks is the ability to perform automatic differentiation. Basically, all the mathematical expressions of deep learning models can be represented as computational graphs, which consist of nodes and edges. Partial derivatives on each edge of a computational graph can then be obtained. With the partial derivatives, we can let software compute the derivative of any node with respect to any variable by applying the chain rule of calculus. First of all, the convenience of coding is in the order of CNTK, Tensorflow, and Theano. The criterion is based simply on the length of the code; the learning curve and the ease of coding are not the main concern.
By this criterion, Theano was the most difficult to implement with, and CNTK and TensorFlow were somewhat easier. With TensorFlow, weight variables and biases must be defined explicitly. CNTK and TensorFlow are easier to implement with because they provide more abstraction than Theano. We should note, however, that low-level coding is not always bad: it gives flexibility. With low-level coding such as in Theano, we can implement and test any new deep learning model or any new search method we can think of. Regarding execution speed, we found no meaningful difference among the frameworks. In our experiment, the execution speeds of Theano and TensorFlow were very similar, although the experiment was limited to a CNN model. For CNTK, the experimental environment could not be kept the same: the CNTK code had to be run on a PC without a GPU, where code executes as much as 50 times slower than with a GPU. We nevertheless concluded that the difference in execution speed was within the range of variation caused by the different hardware setup. In this study, we compared three deep learning frameworks: Theano, TensorFlow, and CNTK. According to Wikipedia, there are 12 available deep learning frameworks, distinguished by 15 different attributes. Important attributes include the interface language (Python, C++, Java, etc.) and the availability of libraries for various deep learning models such as CNN, RNN, and DBN. For a user implementing a large-scale deep learning model, support for multiple GPUs or multiple servers is also important. For users who are learning deep learning, the availability of sufficient examples and references is important as well.
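As an illustration of the automatic differentiation described in the abstract above, the following is a minimal sketch of reverse-mode differentiation on a computational graph. It is not code from the paper, nor the API of TensorFlow, CNTK, or Theano; the `Node` class and `backward` function are hypothetical names introduced here, and only addition and multiplication are supported.

```python
class Node:
    """A node of a computational graph.

    `parents` holds the incoming edges as (parent node, local partial
    derivative of this node with respect to that parent) pairs.
    """

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents
        self.grad = 0.0  # d(output)/d(this node), filled in by backward()

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(output):
    """Apply the chain rule from `output` back to every node it depends on."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal that lists parents before their children.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output towards the inputs, multiplying by the partial
    # derivative stored on each edge and summing over all paths.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


x, y = Node(2.0), Node(3.0)
z = x * y + x          # dz/dx = y + 1, dz/dy = x
backward(z)
print(x.grad, y.grad)  # prints 4.0 2.0
```

Storing the local partial derivative on each edge when the graph is built is what lets `backward` stay generic: it never needs to know which operation produced a node.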

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.121-139
    • /
    • 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is of great importance to financial institutions. Many researchers have addressed bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than individual models. Ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the resulting bootstrap samples. Instance selection keeps critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining. However, few studies have dealt with the integration of the two. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) to improve the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than by making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation. The solutions, coded as strings, are evaluated by the fitness function. 
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset, which is then used as the input data of the bagging model. The chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. The crossover rate and mutation rate were set to 0.7 and 0.1, respectively. We used the prediction accuracy of the model as the fitness function of the GA: an SVM model is trained on the selected instance subset of the training data set, and its prediction accuracy on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as the input data of the bagging model. We used an SVM model as the base classifier for the bagging ensemble, with majority voting as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data consist of 1,832 externally non-audited firms, of which 916 filed for bankruptcy and 916 did not. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We split the whole data into three subsets: training, test, and validation data sets. We compared the proposed model with several comparative models, including a simple individual SVM model, a simple bagging model, and an instance-selection-based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
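The two phases described in the abstract above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. To stay dependency-free it uses a nearest-centroid classifier as a stand-in for SVM, and synthetic data instead of the Korean company data. Binary tournament selection, one-point crossover, elitism, per-bit mutation, and the ensemble size of 11 are all assumptions, since the abstract does not specify them. All function names are hypothetical.

```python
import random
from collections import Counter

# Stand-in base classifier (nearest centroid). The paper uses SVM; this
# substitution is an assumption made only to avoid external dependencies.
def fit(X, y):
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict(model, x):
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def accuracy(predict_fn, X, y):
    return sum(predict_fn(x) == t for x, t in zip(X, y)) / len(y)

def fitness(mask, Xtr, ytr, Xte, yte):
    """Test-set accuracy of a model trained only on instances whose bit is 1."""
    X = [x for x, m in zip(Xtr, mask) if m]
    y = [t for t, m in zip(ytr, mask) if m]
    if len(set(y)) < 2:  # both classes must survive the selection
        return 0.0
    model = fit(X, y)
    return accuracy(lambda x: predict(model, x), Xte, yte)

def ga_select(Xtr, ytr, Xte, yte, pop_size=100, generations=150,
              crossover_rate=0.7, mutation_rate=0.1, seed=0):
    """Phase 1: evolve a binary chromosome with one bit per training instance."""
    rng = random.Random(seed)
    n = len(Xtr)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(m, Xtr, ytr, Xte, yte), m) for m in pop]
        best = max(scored, key=lambda s: s[0])[1]
        nxt = [best[:]]  # elitism (assumed)
        while len(nxt) < pop_size:
            # Binary tournament selection (assumed)
            p1 = max(rng.sample(scored, 2), key=lambda s: s[0])[1]
            p2 = max(rng.sample(scored, 2), key=lambda s: s[0])[1]
            child = p1[:]
            if rng.random() < crossover_rate:  # one-point crossover (assumed)
                cut = rng.randrange(1, n)
                child = p1[:cut] + p2[cut:]
            # Per-bit mutation (assumed reading of the mutation rate)
            nxt.append([1 - g if rng.random() < mutation_rate else g for g in child])
        pop = nxt
    return max(pop, key=lambda m: fitness(m, Xtr, ytr, Xte, yte))

def bagging_fit(X, y, n_models=11, seed=0):
    """Phase 2: train base classifiers on bootstrap samples of the selected instances."""
    if len(set(y)) < 2:
        raise ValueError("selected instances must contain both classes")
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # draw with replacement
        ys = [y[i] for i in idx]
        if len(set(ys)) == 2:  # skip degenerate one-class samples
            models.append(fit([X[i] for i in idx], ys))
    return models

def bagging_predict(models, x):
    """Combine the base classifiers by majority voting."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]

def make_data(n, rng):
    """Synthetic two-class data, used only for this demonstration."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(5.0 * label, 1.0), rng.gauss(5.0 * label, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(42)
Xtr, ytr = make_data(40, rng)   # training set
Xte, yte = make_data(40, rng)   # test set, used for GA fitness
Xva, yva = make_data(40, rng)   # validation set, used for final evaluation
# Small population and generation count so the demo runs quickly.
mask = ga_select(Xtr, ytr, Xte, yte, pop_size=20, generations=10)
Xs = [x for x, m in zip(Xtr, mask) if m]
ys = [t for t, m in zip(ytr, mask) if m]
models = bagging_fit(Xs, ys)
val_acc = accuracy(lambda x: bagging_predict(models, x), Xva, yva)
print(f"selected {sum(mask)} of {len(mask)} instances, validation accuracy {val_acc:.2f}")
```

The defaults of `ga_select` mirror the settings reported in the abstract (population 100, 150 generations, crossover rate 0.7, mutation rate 0.1). Keeping a separate validation set matters because the test set has already been used to score chromosomes and is therefore no longer an unbiased estimate of performance.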