• Title/Abstract/Keywords: Research Performance

A Study of Factors Associated with Software Developers Job Turnover (데이터마이닝을 활용한 소프트웨어 개발인력의 업무 지속수행의도 결정요인 분석)

  • Jeon, In-Ho;Park, Sun W.;Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.191-204 / 2015
  • According to the '2013 Performance Assessment Report on the Financial Program' from the National Assembly Budget Office, the unfilled recruitment ratio of software (SW) developers in South Korea was 25% in the 2012 fiscal year. Moreover, the unfilled recruitment ratio of highly qualified SW developers reaches almost 80%. This phenomenon is more severe in small and medium enterprises with fewer than 300 employees. Young job-seekers in South Korea increasingly avoid becoming SW developers, and even current SW developers want to change careers, which hinders the national development of the IT industry. The Korean government has recently recognized the problem and implemented policies to foster young SW developers. As a result, it has become easier to find entry-level SW developers. However, many IT companies still find it hard to recruit highly qualified SW developers, because becoming a SW development expert requires long-term experience. Thus, improving the job continuity intentions of current SW developers is more important than fostering new ones. Therefore, this study surveyed the job continuity intentions of SW developers and analyzed the factors associated with them. We carried out a survey from September 2014 to October 2014, targeting 130 SW developers working in the IT industry in South Korea. We gathered the demographic information and characteristics of the respondents, the work environment of the SW industry, and the social position of SW developers. Afterward, a regression analysis and a decision tree method were applied to the data. These two methods are widely used data mining techniques that are interpretable and mutually complementary. We first performed a linear regression to find the important factors associated with the job continuity intention of SW developers.
The results showed that the 'expected age' to work as a SW developer was the most significant factor associated with the job continuity intention. We suppose that the major cause of this phenomenon is a structural problem of the IT industry in South Korea, which requires SW developers to move from development to management as they are promoted. In addition, the 'motivation' to become a SW developer and the 'personality (introverted tendency)' of a SW developer are highly important factors associated with the job continuity intention. Next, the decision tree method was applied to extract the characteristics of highly motivated developers and less motivated ones. We used the well-known C4.5 algorithm for the decision tree analysis. The results showed that 'motivation', 'personality', and 'expected age' were again important factors influencing job continuity intentions, consistent with the results of the regression analysis. In addition, the 'ability to learn' new technology was a crucial factor in the decision rules for job continuity. In other words, a person with a high ability to learn new technology tends to work as a SW developer for a longer period of time. The decision rules also showed that the 'social position' of SW developers and the 'prospect' of the SW industry were minor factors influencing job continuity intentions. On the other hand, 'type of employment (regular position/non-regular position)' and 'type of company (ordering company/service providing company)' did not affect the job continuity intention in either method. In this research, we examined the job continuity intentions of SW developers who were actually working at IT companies in South Korea and analyzed the factors associated with them. These results can be used for human resource management in IT companies when recruiting or fostering highly qualified SW experts.
They can also help shape policies for fostering SW developers and address the problem of unfilled recruitment of SW developers in South Korea.
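
The abstract names C4.5 as the decision tree algorithm. C4.5 chooses splits by gain ratio, so a minimal sketch of that criterion may help show why an attribute such as 'motivation' can end up near the root while 'type of employment' is ignored. The function names and the six toy records below are hypothetical and are not taken from the survey data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute with respect to the class labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (NOT the survey data): does the developer intend to stay?
motivation = ["high", "high", "low", "low", "high", "low"]
employment = ["regular", "non-regular", "regular", "non-regular", "regular", "non-regular"]
stays = ["yes", "yes", "no", "no", "yes", "no"]

print("motivation:", round(gain_ratio(motivation, stays), 3))
print("employment:", round(gain_ratio(employment, stays), 3))
```

In this toy data 'motivation' separates the two classes perfectly while 'employment' carries almost no information, so a gain-ratio-based tree would split on 'motivation' first.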

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We build the predictive models along two dimensions. The first is the analysis period: we divide it into before and after the IMF financial crisis and examine whether the two periods differ. The second is the prediction time: to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one year, two years and three years later. In total, six prediction models are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is one of the most widely used prediction methods; it builds trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST and C5.0. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the other algorithms. We obtained data on rights issues and financial analysis from TS2000 of the Korea Listed Companies Association. A record of financial analysis data consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices.
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data are selected as the input variables of each model, and the rights issue status (issued or not issued) is defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The results of the experimental analysis show that the prediction accuracies of data after the IMF financial crisis (59.04% to 60.43%) are about 10 percent higher than those before the IMF financial crisis (68.78% to 71.41%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions to conduct rights issues have become more obvious. The experimental results also show that stability-related indices have a major impact on rights issues in the case of short-term prediction. On the other hand, long-term prediction of rights issues is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of the significant variables. This means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a broader range of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare the differences in accuracy obtained with other data mining techniques such as neural networks, logistic regression and SVM.
Second, new prediction models should be developed and evaluated that include variables which capital structure theory identifies as relevant to rights issues.
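
The experimental design described above (two analysis periods crossed with three prediction horizons, each model built on 60% of the data and tested on 40%) can be sketched as follows. This is only an illustration: the names `PERIODS`, `HORIZONS_YEARS`, `model_grid` and `holdout_split` are hypothetical, and the random shuffle is an assumption, since the abstract does not say how the 60/40 partition was drawn.

```python
import random
from itertools import product

PERIODS = ["before IMF crisis", "after IMF crisis"]
HORIZONS_YEARS = [1, 2, 3]


def model_grid():
    """Enumerate the (period, horizon) combinations, one per prediction model."""
    return list(product(PERIODS, HORIZONS_YEARS))


def holdout_split(records, train_fraction=0.6, seed=0):
    """Shuffle a copy of the records and split it into build and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder record identifiers standing in for financial analysis records.
records = list(range(100))
for period, horizon in model_grid():
    build, test = holdout_split(records)
    print(f"{period}, {horizon} year(s) ahead: build={len(build)}, test={len(test)}")
```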

Surrogate Internet Shopping Malls: The Effects of Consumers' Perceived Risk and Product Evaluations on Country-of-Buying-Origin Image (망상대구점(网上代购店): 소비자감지풍험화산품평개대원산국형상적영향(消费者感知风险和产品评价对原产国形象的影响))

  • Lee, Hyun-Joung;Shin, So-Hyoun;Kim, Sang-Uk
    • Journal of Global Scholars of Marketing Science / v.20 no.2 / pp.208-218 / 2010
  • The Internet has grown fast and has become one of the most important retail channels. Various types of Internet retailers, hereafter etailers, have been introduced, and one type, the 'surrogate Internet shopping mall', has prospered and attracted consumers in the domestic market. A surrogate Internet shopping mall is a unique type of etailer that purchases well-known brand goods abroad that are not imported into the market, handles delivery on behalf of individual buyers, and collects fees for these services. Consumers who are interested in high-end and unique brands that are not otherwise available have difficulty purchasing these items directly from retailers or brands in other countries, owing to worries about payment failure and the lack of a delivery address, since such sellers usually deliver only within their own country. In Korea, both the number of surrogate Internet shopping malls and their sales have grown rapidly, to more than 430 active malls and 500 billion Korean won in 2008, as the population of consumers who want this agent shopping service is also expanding. This etail business concept originates from 'surrogate-mediated purchase', and this type of shopping agent has existed in many different forms and in a wide range of contexts for quite a long time. As marketers on many occasions face their individual buyers' representatives instead of the buyers themselves, the impact of surrogate shoppers on consumers' decision making has been enormously important, and many scholars in marketing and psychology have explored the various ways agents affect consumers' purchase decisions. However, little rigorous research has yet been conducted in the context of Internet commerce.
Moreover, since surrogate Internet shopping malls, as shopping agents, specifically connect overseas brands or retailers to domestic consumers, one specific characteristic of the mall, the image of the surrogate buying country where the surrogate purchases are made, may play an important role in forming consumers' attitudes and purchase intentions toward products. Furthermore, it may also affect various dimensions of perceived risk in consumers' information processing. However, although a great deal of research has explored the effects of diverse dimensions of country of origin, related studies in the Internet context have rarely been carried out. Some studies have demonstrated the positive impact of country of origin on consumers' evaluations as an information cue in product manufacture descriptions, yet studies examining the relationship between the country image of the surrogate buying origin and product evaluations have rarely been undertaken for this specific mall type. Thus, the authors found this retail channel well worth investigating, explored the systematic relationships among the focal constructs, and elaborated their different paths. The authors show that the country image of the surrogate buying origin, that is, the country where surrogate malls purchase products and from which they bring them to buyers, not only has a positive effect on consumers' product evaluations, including attitude and purchase intention, but also has a negative effect on all three dimensions of perceived risk: product-related risk, shipping-related risk, and post-purchase risk. Specifically, among the perceived risk dimensions, product-related risk, which arises from high uncertainty about product performance, is most affected ($\beta = -.30$) by the country image of the surrogate buying origin, followed by shipping-related risk ($\beta = -.18$) and post-purchase risk ($\beta = -.15$).
Its direct effects on product attitude ($\beta = .10$) and purchase intention ($\beta = .14$) are also confirmed. Each perceived risk dimension is also shown to have a negative effect on purchase intention through product attitude as a mediator ($\beta = -.57$: product-related risk $\rightarrow$ product attitude; $\beta = -.24$: shipping-related risk $\rightarrow$ product attitude; $\beta = -.44$: post-purchase risk $\rightarrow$ product attitude). The additional analysis shows that the paths of consumers' information processing differ according to their level of product knowledge. While novice consumers with a low level of knowledge consider only perceived risk important, expert consumers with a high level of knowledge take both the country image of where the surrogate services are conducted and perceived risk seriously, building their attitudes and decisions toward products more carefully and systematically, which is in line with previous studies. This study offers several pieces of academic and practical advice. Specifically, the country image of the surrogate buying origin does affect consumers' risk perceptions and behavioral consequences; therefore, careful selection of the surrogate buying origin is recommended. Furthermore, reducing consumers' perceived risk is required for this new type of retail business to flourish, whether its consumers are novices or experts. Additionally, since consumers take different paths in elaborating information depending on their knowledge levels, marketing approaches tailored to each group are required. For novice buyers, strong risk-mitigation devices are needed to induce them to form better attitudes; for experts, selecting better and more advanced countries as surrogate buying origins is advised; and an endorsement strategy for the site might work as a reliable information cue for all consumers, lowering the barriers to purchasing goods online.
The authors have also explained that the study suffers from some limitations, including generalizability. In future studies, tests of and comparisons among different types of etailers with relevant constructs are recommended to broaden the findings.

Effect of Flywheel Weight on the Vibration of Diesel Engine (플라이휠 중량(重量)이 디젤 기관(機關)의 진동(振動)에 미치는 영향(影響))

  • Myung, Byung Soo;Kim, Sung Rai
    • Korean Journal of Agricultural Science / v.20 no.2 / pp.167-180 / 1993
  • Most small diesel engines in the 6.0 kW and 7.5 kW classes are used with flywheels of the same size and weight. This study was conducted to obtain basic data on factors that affect the engine performance of the power tiller, with flywheel weight considered as the major factor. Fuel consumption ratio, motoring loss, torque, vibration and mechanical efficiency of the engine were measured and analyzed at four levels of flywheel weight: 32.2, 29.4, 26.2 and $24.2\,\mathrm{kg_f}$. The results were as follows: 1. The flywheel weights were $23.7\,\mathrm{kg_f}$ from the design program of JSME and $24.5\,\mathrm{kg_f}$ from ASME and SAE design criteria. Therefore, the flywheel weight of $32.2\,\mathrm{kg_f}$ might be reduced by about $8\,\mathrm{kg_f}$ in the 7.5 kW engine. 2. The actual outputs of the engines rated at 6.0 kW and 7.5 kW were 7.43 kW and 7.85 kW, respectively. When the flywheel weight was reduced from $32.2\,\mathrm{kg_f}$ to $24.2\,\mathrm{kg_f}$, output increased from 7.43 kW to 7.70 kW in the 6.0 kW engine and from 7.85 kW to 8.25 kW in the 7.5 kW engine. 3. When the flywheel weight was reduced from $32.2\,\mathrm{kg_f}$ to $24.2\,\mathrm{kg_f}$, the fuel consumption ratio decreased from 300.8 to 296.8 g/kW-hr in the 6.0 kW engine and from 313.6 to 312.8 g/kW-hr in the 7.5 kW engine. 4. When the flywheel weight was reduced from $32.2\,\mathrm{kg_f}$ to $24.2\,\mathrm{kg_f}$, the mechanical efficiency of the engine increased from 76.1% to 76.8% in the 6.0 kW engine and from 76.7% to 77.0% in the 7.5 kW engine. 5. When the flywheel weight was reduced from $32.2\,\mathrm{kg_f}$ to $24.2\,\mathrm{kg_f}$, vibration decreased on the X-axis and Z-axis in the 6.0 kW engine, but increased slightly on the Y-axis in the 6.0 kW engine and on all axes in the 7.5 kW engine. 6. When the flywheel weight was reduced from $32.2\,\mathrm{kg_f}$ to $24.2\,\mathrm{kg_f}$, motoring loss decreased from 2.33 kW to 1.75 kW in the 6.0 kW engine and from 2.46 kW to 1.84 kW in the 7.5 kW engine.
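
To put the reported before/after figures on a common scale, the relative changes can be computed directly from the numbers in the abstract. The sketch below does only that arithmetic; the names `percent_change`, `REPORTED` and `summarize` are hypothetical, and no values beyond those quoted above are used.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (value at the 32.2 kg_f flywheel, value at the lightest flywheel), from the abstract.
REPORTED = {
    "output (kW)": {"6.0 kW": (7.43, 7.70), "7.5 kW": (7.85, 8.25)},
    "fuel consumption (g/kW-hr)": {"6.0 kW": (300.8, 296.8), "7.5 kW": (313.6, 312.8)},
    "motoring loss (kW)": {"6.0 kW": (2.33, 1.75), "7.5 kW": (2.46, 1.84)},
}


def summarize(reported):
    """Percent change for every quantity and engine, rounded to two decimals."""
    return {
        quantity: {
            engine: round(percent_change(before, after), 2)
            for engine, (before, after) in by_engine.items()
        }
        for quantity, by_engine in reported.items()
    }


CHANGES = summarize(REPORTED)
for quantity, by_engine in CHANGES.items():
    for engine, change in by_engine.items():
        print(f"{quantity:28s} {engine}: {change:+.2f}%")
```

Seen this way, the motoring loss falls by roughly a quarter in both engines, while the changes in output and fuel consumption are only a few percent or less.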

An Analysis of the Landscape Cognitive Characteristics of 'Gugok Streams' in the First Half of the 18th Century Based on the Comparison of China's 『Wuyi-Gugok Painting』 (중국 『무이구곡도』 3폭(幅)의 비교 분석을 통해 본 18세기 무이산 구곡계(九曲溪)의 경물 인지특성)

  • Cheng, Zhao-Xia;Rho, Jae-Hyun;Jiang, Cheng
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.37 no.3 / pp.62-82 / 2019
  • Taking as research objects the three Wuyi-Gugok Drawings produced in the mid-Qing Dynasty, 『A Picture Showing the Boundary Between Mountains and Rivers: A』, 『Landscape of the Jiuqu River in the Wuyi Mountain: B』 and 『Eighteen Sceneries of Wuyi Mountain: C』, and after investigating the names recorded in the paintings, this paper analyzes the scenic spots, scene types and images through a literature survey. In addition, based on the number of scenic types and scenic names in each Gok and on the landscape richness (LR) and landscape similarity (LS) of the Gugok scenic spots, the cognitive characteristics of the landscape in the 18th century were examined. The results are as follows. Firstly, according to the descriptive statistics of scenic spot types in the Wuyi Mountain Chronicle, there were 41 descriptions of scenery names in the three paintings, among which rocks, peaks and stones accounted for the majority. Rocks, peaks and stones made up more than half of the Wuyi-Gugok landscape, which reflects geological features such as the Danxia landform. Secondly, the landscape of Gugok Stream(九曲溪) was diverse and rich in images. The 1st Gok Daewangbong(大王峰) and Manjeongbong(幔亭峰), the 2nd Gok Oknyeobong(玉女峰), the 3rd Gok Sojangbong(小藏峰), the 4th Gok Daejangbong(大藏峰), the 5th Gok Daeeunbyeong(大隱屛) and Muijeongsa(武夷精舍), and the 6th Gok Seonjangbong(仙掌峰) and Cheonyubong(天游峰) were each an outstanding landscape in their Gok. However, the landscape features of the 7th~9th Gok were relatively less prominent. Thirdly, according to the landscape image survey of each Gok, the image of the Gugok cultural landscape originates from the distinctive myths and legends related to Wuyi Mountain, and the landscape is very well known. Owing to this distinctiveness, landscape recognition was very high.
In particular, the 1st Gok and the 5th Gok were closely related to the Taoist culture based on Muigun, the Stone Carving culture, and the Boat Tour culture related to the Neo-Confucian culture of Zhu Xi. Fourthly, according to the landscape similarity analysis of the 41 landscape types shown in the figure, the similarity of A and C was very high. Their morphological descriptions and their rendering of distant and near views were very similar. Therefore, it can be judged that one painting clearly influenced the other. As a whole, the names of the scenes depicted in the three paintings had been formed by the first half of the 18th century at the latest, through a long history of inheritance and accumulated myths and legends. The order of the scenery names in the three Drawings differed somewhat. Among the scenery names appearing in all three Drawings, there were 21 stones, 20 rocks and 17 peaks. Stones, rocks and peaks dominated the landscape of the Gugok Streams in Wuyi Mountain. Fifthly, Seonjodae(仙釣臺) in A and C was described in the 4th Gok, but it is noteworthy that it was known in Korea as a scenery name of the 3rd Gok. In addition, Seungjindong(升眞洞) in the 1st Gok and Seokdangsa(石堂寺) in the 7th Gok were not described in Drawings A, B and C. This is a point that needs to be studied in the future.

An Intelligent Decision Support System for Selecting Promising Technologies for R&D based on Time-series Patent Analysis (R&D 기술 선정을 위한 시계열 특허 분석 기반 지능형 의사결정지원시스템)

  • Lee, Choongseok;Lee, Suk Joo;Choi, Byounggu
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.79-96 / 2012
  • As the pace of competition accelerates dramatically and the complexity of change grows, a variety of research has been conducted to improve firms' short-term performance and to enhance firms' long-term survival. In particular, researchers and practitioners have paid attention to identifying promising technologies that give a firm competitive advantage. Discovery of promising technology depends on how a firm evaluates the value of technologies, and thus many evaluation methods have been proposed. Approaches based on experts' opinions have been widely accepted for predicting the value of technologies. Although this approach provides in-depth analysis and ensures the validity of analysis results, it is usually costly and time-consuming and is limited to qualitative evaluation. To overcome the limitations of the expert-opinion approach, a considerable number of studies have attempted to forecast the value of technology by using patent information. Patent-based technology evaluation has served as a valuable approach to technological forecasting because patents contain a full and practical description of technology in a uniform structure. Furthermore, they provide information that is not divulged in any other source. Although the patent-information approach has contributed to our understanding of how to predict promising technologies, it has some limitations, because predictions have been based on past patent information and the interpretations of patent analyses are not consistent. To fill this gap, this study proposes a technology forecasting methodology that integrates the patent-information approach with an artificial intelligence method. The methodology consists of three modules: evaluation of technology promise, implementation of a technology value prediction model, and recommendation of promising technologies. In the first module, the promise of a technology is evaluated from three different and complementary perspectives: impact, fusion, and diffusion.
The impact of technologies refers to their influence on the development and improvement of future technologies, and is also clearly associated with their monetary value. The fusion of technologies denotes the extent to which a technology fuses different technologies, and represents the breadth of search underlying the technology. Fusion can be calculated per technology or per patent, so this study measures two types of fusion index: fusion index per technology and fusion index per patent. Finally, the diffusion of technologies denotes their degree of applicability across scientific and technological fields. In the same vein, diffusion index per technology and diffusion index per patent are both considered. In the second module, the technology value prediction model is implemented using an artificial intelligence method. This study uses the values of five indexes (i.e., impact index, fusion index per technology, fusion index per patent, diffusion index per technology and diffusion index per patent) at different times (e.g., t-n, t-n-1, t-n-2, $\cdots$) as input variables. The output variables are the values of the five indexes at time t, which are used for learning. The learning method adopted in this study is the backpropagation algorithm. In the third module, this study recommends the final promising technologies based on the analytic hierarchy process (AHP). AHP provides the relative importance of each index, leading to a final promising index for each technology. The applicability of the proposed methodology is tested using U.S. patents in international patent class G06F (i.e., electronic digital data processing) from 2000 to 2008. The results show that the mean absolute error of predictions produced by the proposed methodology is lower than that produced by multiple regression analysis in the case of the fusion indexes. However, for the other indexes, the mean absolute error of the proposed methodology is slightly higher than that of multiple regression analysis.
These unexpected results may be explained, in part, by the small number of patents. Since this study uses only patent data in class G06F, the sample is relatively small, leading to incomplete learning for the complex artificial intelligence structure. In addition, fusion index per technology and impact index are found to be important criteria for predicting promising technology. This study attempts to extend existing knowledge by proposing a new methodology for predicting technology value that integrates patent information analysis and an artificial intelligence network. It helps managers who plan technology development and policy makers who implement technology policy by providing a quantitative prediction methodology. In addition, this study could help other researchers by providing a deeper understanding of the complex field of technological forecasting.
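
The comparison between the proposed model and multiple regression rests on mean absolute error, i.e. the average of |actual - predicted| over all predictions. A minimal sketch of that metric follows; the index values and the two sets of predictions are hypothetical placeholders, not results from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical index values at time t and two hypothetical sets of predictions.
actual_index = [0.40, 0.55, 0.30, 0.75]
model_a_pred = [0.45, 0.50, 0.35, 0.70]
model_b_pred = [0.30, 0.65, 0.20, 0.85]

mae_a = mean_absolute_error(actual_index, model_a_pred)
mae_b = mean_absolute_error(actual_index, model_b_pred)
print(f"model A MAE = {mae_a:.3f}, model B MAE = {mae_b:.3f}")
```

A lower value means the predictions sit closer to the observed index values on average, which is the sense in which the abstract ranks the two methods.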

A Study on the Determinants of Patent Citation Relationships among Companies : MR-QAP Analysis (기업 간 특허인용 관계 결정요인에 관한 연구 : MR-QAP분석)

  • Park, Jun Hyung;Kwahk, Kee-Young;Han, Heejun;Kim, Yunjeong
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.21-37 / 2013
  • With the advent of the knowledge-based society, more people are becoming interested in intellectual property. In particular, the ICT companies leading the high-tech industry are working hard to manage intellectual property systematically. Patent information represents the intellectual capital of a company, and quantitative analysis of continuously accumulated patent information has now become possible. Patent information can be analyzed at various levels, ranging from the patent level to the enterprise, industry and country levels. Through patent information, we can identify the status of a technology and analyze the impact of its performance. We can also trace the flow of knowledge through network analysis, which lets us not only identify changes in technology but also predict the direction of future research. Two important analyses utilize patent citation information: citation indicator analysis, based on the frequency of citation, and network analysis, based on citation relationships. This study further analyzes whether company size affects patent citation relationships. For this study, 74 S&P 500 companies that provide IT and communication services were selected. To determine the patent citation relationships between the companies, patent citations in 2009 and 2010 were collected, and sociomatrices showing the patent citation relationships between the companies were created. In addition, the companies' total assets were collected as an index of company size. The distance between companies is defined as the absolute value of the difference between their total assets, and the simple difference is taken to describe the hierarchy between companies.
QAP correlation analysis and MR-QAP analysis are carried out using the distance and hierarchy between companies together with the sociomatrices of patent citations in 2009 and 2010. In the QAP correlation analysis, the 2009 and 2010 patent citation networks show the highest correlation with each other. In addition, a positive correlation is found between the patent citation relationships and the distance between companies, because patent citation increases when companies differ in size. A negative correlation is found between the patent citation relationships and the hierarchy between companies. This suggests that the patents of higher-tier companies are relatively highly valued by lower-tier companies. The MR-QAP analysis is carried out as follows. The sociomatrix generated from the 2010 patent citation relationships is used as the dependent variable, and the 2009 patent citation network and the distance and hierarchy networks between the companies are used as the independent variables. The MR-QAP analysis was performed to find the main factors influencing the patent citation relationships between the companies in 2010. The results show that all independent variables positively influenced the 2010 patent citation relationships. In particular, the 2009 patent citation relationships have the most significant impact on those of 2010, which means that patent citation relationships persist over time. Taken together, the QAP correlation analysis and the MR-QAP analysis indicate that patent citation relationships between companies are affected by the size of the companies.
However, the most significant factor is the patent citation relationships established in the past. Maintaining patent citation relationships between companies matters because examining how companies share intellectual property can be strategically important, and because such relationships can serve as an important aid in identifying partner companies to cooperate with.
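
The two size-based networks described above can be built directly from a list of total assets: the distance matrix holds absolute differences and the hierarchy matrix holds signed differences. The sketch below is an assumption-laden illustration; the function names and asset figures are hypothetical, and the sign convention (row company minus column company) is a guess, since the abstract does not state it.

```python
def distance_matrix(assets):
    """Symmetric matrix of absolute differences in total assets."""
    return [[abs(a - b) for b in assets] for a in assets]


def hierarchy_matrix(assets):
    """Matrix of signed differences (row minus column; convention assumed)."""
    return [[a - b for b in assets] for a in assets]


# Hypothetical total assets for three companies (arbitrary units).
total_assets = [100.0, 40.0, 10.0]

distance = distance_matrix(total_assets)
hierarchy = hierarchy_matrix(total_assets)

for name, matrix in (("distance", distance), ("hierarchy", hierarchy)):
    print(name)
    for row in matrix:
        print("  ", row)
```

The distance matrix is symmetric with a zero diagonal, whereas the hierarchy matrix is antisymmetric, so the two encode different information about the same pair of companies.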

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, proximity sensor, and so on, there has been much research on making use of these sensors to create valuable applications. Human activity recognition is one such application, motivated by various welfare applications such as support for the elderly, measurement of calorie consumption, analysis of lifestyles, analysis of exercise patterns, and so on. One of the challenges faced when using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors used is restricted, it is difficult to realize a highly accurate activity recognizer or classifier, because it is hard to distinguish between subtly different activities relying on only limited information. The difficulty becomes especially severe when the number of different activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier can be built that distinguishes ten different activities by using data from only a single sensor, the smartphone accelerometer. Our approach to this ten-class problem is to use the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all the classes is split into two subsets of classes by using a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by using another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions.
Depending on how the set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than others. However, the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we have used another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each from a bootstrap sample with a different random subset of features. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of the vector magnitude within a time window of the last 2 seconds, among others. To compare the performance of END with that of other methods, accelerometer data were collected every 0.1 seconds for 2 minutes for each activity from 5 volunteers.
Among the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they lack time window data), 4,700 have been used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END has been found to classify all ten activities with a fairly high accuracy of 98.4%. By comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
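The nested-dichotomy construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a 1-nearest-neighbor rule stands in for the random forest base classifier, the trees are combined by a simple majority vote, and the names `NestedDichotomy`, `EnsembleOfNestedDichotomies`, and `nearest_side`, as well as the toy data, are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_side(x, samples):
    """Stand-in binary base learner (assumption): return the side (0 or 1)
    of the nearest training sample. The paper uses a random forest here."""
    return min(samples, key=lambda s: sq_dist(x, s[0]))[1]


class NestedDichotomy:
    """One random binary tree over the class labels."""

    def __init__(self, X, y, classes, rng):
        self.classes = sorted(classes)
        self.children = None
        if len(self.classes) == 1:
            return  # leaf node: a single class remains
        # Randomly split the classes into two non-empty subsets.
        rng.shuffle(self.classes)
        cut = rng.randint(1, len(self.classes) - 1)
        groups = (set(self.classes[:cut]), set(self.classes[cut:]))
        # Two-class training set for this node: side 0 versus side 1.
        self.samples = [(x, 0 if label in groups[0] else 1)
                        for x, label in zip(X, y)]
        self.children = []
        for group in groups:
            keep = [(x, label) for x, label in zip(X, y) if label in group]
            self.children.append(NestedDichotomy(
                [x for x, _ in keep], [label for _, label in keep], group, rng))

    def predict(self, x):
        # Walk from the root to a leaf, one binary decision per node.
        node = self
        while node.children is not None:
            node = node.children[nearest_side(x, node.samples)]
        return node.classes[0]


class EnsembleOfNestedDichotomies:
    """Several random dichotomy trees combined by majority vote."""

    def __init__(self, X, y, n_trees=10, seed=0):
        rng = random.Random(seed)
        self.trees = [NestedDichotomy(X, y, set(y), rng) for _ in range(n_trees)]

    def predict(self, x):
        votes = Counter(tree.predict(x) for tree in self.trees)
        return votes.most_common(1)[0][0]


# Toy data: four well-separated clusters standing in for four activities.
X = [[0, 0], [0, 1], [10, 0], [10, 1], [0, 10], [1, 10], [10, 10], [11, 10]]
y = ["Sitting", "Sitting", "Walking", "Walking",
     "Running", "Running", "Falling", "Falling"]
model = EnsembleOfNestedDichotomies(X, y, n_trees=5, seed=1)
print(model.predict([9.5, 0.4]))
```

Each tree answers a ten-class question (four classes in the toy data) through a chain of two-class decisions, and building several random trees avoids having to pick one class hierarchy by hand.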

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.105-129
    • /
    • 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risks. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most prior studies, which used the default event as the basis for learning about default risk, this study calculated default risk from the market capitalization and stock price volatility of each company based on the Merton model. This solved the problem of data imbalance due to the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and made it possible to reflect the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risks of unlisted companies without stock price information can be appropriately derived. The model can therefore provide stable default risk assessment services to companies whose default risk is difficult to determine with traditional credit rating models, such as small and medium-sized companies and startups. Although predicting corporate default risk using machine learning has been actively studied recently, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods.
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and changes in future market conditions. This study reduced the bias of individual models by using a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information and to maximize the advantages of machine learning-based default risk prediction models, which take less time to calculate. To produce the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which performed best among the single models. Next, to check for statistically significant differences between the forecasts of the Stacking Ensemble model and those of each individual model, pairs of the Stacking Ensemble model and each individual model were constructed. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank-sum test was used to check whether the two model forecasts in each pair differed significantly. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP and CNN models.
In addition, this study provides a methodology that allows existing credit rating agencies to adopt machine learning-based bankruptcy risk prediction, given that traditional credit rating models can also be included as sub-models in calculating the final default probability. The Stacking Ensemble technique proposed in this study can also help in designing models that meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.
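The seven-way split that feeds the Stacking Ensemble model in this abstract can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the two toy sub-models (`fit_mean`, `fit_line`) replace the paper's sub-models, the meta-model is a simple grid-searched blend weight (`fit_blend_weight`) rather than whatever meta-learner the authors used, and all function names and data are hypothetical.

```python
def fit_mean(X, y):
    """Sub-model A (hypothetical): always predict the training mean."""
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_line(X, y):
    """Sub-model B (hypothetical): one-feature least-squares line."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in X)
    slope = sum((x - mx) * (t - my) for x, t in zip(X, y)) / sxx
    return lambda x: my + slope * (x - mx)


def out_of_fold(fit, X, y, k=7):
    """Forecast every row with a model trained on the other k-1 folds."""
    preds = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(X), k):  # rows held out in this fold
            preds[i] = model(X[i])
    return preds


def fit_blend_weight(preds_a, preds_b, y, steps=100):
    """Meta-model (assumption): weight w on sub-model A and 1 - w on B,
    chosen by grid search to minimise the squared error."""
    def sse(w):
        return sum((w * a + (1 - w) * b - t) ** 2
                   for a, b, t in zip(preds_a, preds_b, y))
    return min((i / steps for i in range(steps + 1)), key=sse)


# Toy data: the target is exactly linear in the single feature.
X = [float(i) for i in range(21)]
y = [2.0 * x + 1.0 for x in X]
oof_line = out_of_fold(fit_line, X, y)
oof_mean = out_of_fold(fit_mean, X, y)
w_line = fit_blend_weight(oof_line, oof_mean, y)
print(w_line)
```

Because each forecast comes from a model that never saw the row it predicts, the meta-model is fitted on inputs that resemble what the sub-models produce on unseen data, which is the usual motivation for splitting the training set before stacking.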

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.57-78
    • /
    • 2020
  • With the development of the Internet, consumers can easily check product information through E-Commerce. Product reviews used in the purchasing process are based on user experience, allowing consumers to act as producers of information as well as to consult it. This can increase the efficiency of purchasing decisions from the consumers' perspective, and from the seller's point of view, it can help develop products and strengthen competitiveness. However, it takes a lot of time and effort for consumers to read the vast number of product reviews offered by E-Commerce for the products they want to compare and to grasp both the overall assessment and the assessment dimensions that they consider important. This is because product reviews are unstructured information, so the sentiment and assessment dimension of a review are difficult to read at a glance. For example, consumers who want to purchase a laptop would like to check how the products being compared are assessed on each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we propose a method to automatically generate multi-dimensional product assessment scores from the reviews of the products to be compared. The method presented in this study consists of two phases: the pre-preparation phase and the individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created from reviews of the large category product group. By combining word embedding and association analysis, the dimensioned classification model overcomes a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences.
The sentiment analysis model is a CNN model trained on data tagged as positive or negative at the phrase level for accurate polarity detection. In the individual product scoring phase, the pre-prepared models are then applied to reviews split into phrases. Multi-dimensional assessment scores are obtained by grouping the phrases judged to describe a specific dimension and aggregating them by assessment dimension according to their proportion. In the experiment of this paper, approximately 260,000 reviews of the large category product group were collected to build the dimensioned classification model and the sentiment analysis model. In addition, reviews of laptops from companies S and L sold through E-Commerce were collected and used as experimental data. The dimensioned classification model classified individual product reviews, broken down into phrases, into six assessment dimensions and combined the existing word embedding method with an association analysis indicating the frequency between words and dimensions. Combining word embedding and association analysis increased the accuracy of the model by 13.7%. The sentiment analysis model analyzed the assessments more closely when trained at the phrase level rather than the sentence level; its accuracy was 29.4% higher than that of the sentence-based model. Through this study, both sellers and consumers can expect more efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequencies and morphemes and analyzed together using word embedding and association analysis, improving the objectivity of multi-dimensional analysis.
This analysis model will be attractive not only because it enables more effective service deployment in an evolving E-Commerce market with fierce competition, but also because it satisfies both sellers and consumers.
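The aggregation step of the individual product scoring phase can be sketched as below. The abstract does not give the exact scoring formula, so this sketch assumes that each phrase has already been assigned a dimension and a polarity by the two pre-prepared models, and that a dimension's score is simply the share of its phrases tagged positive. The function name `dimension_scores` and the sample phrases are hypothetical.

```python
from collections import defaultdict


def dimension_scores(tagged_phrases):
    """Aggregate phrase-level results into one score per assessment dimension.

    tagged_phrases: iterable of (dimension, is_positive) pairs, one per phrase.
    Returns {dimension: share of positive phrases} (assumed scoring rule).
    """
    counts = defaultdict(lambda: [0, 0])  # dimension -> [positive, total]
    for dimension, is_positive in tagged_phrases:
        counts[dimension][1] += 1
        if is_positive:
            counts[dimension][0] += 1
    return {dim: pos / total for dim, (pos, total) in counts.items()}


# Hypothetical output of the dimension classifier and sentiment model
# for the phrases of one laptop's reviews.
phrases = [
    ("performance", True),
    ("performance", True),
    ("performance", False),
    ("weight", False),
    ("design", True),
]
scores = dimension_scores(phrases)
print(scores)
```

Scores computed this way for two products can be compared dimension by dimension, which is the kind of multi-dimensional comparison the abstract describes.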