• Title/Summary/Keyword: Research Information Systems


Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.99-112 / 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms: it constructs and combines an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set, to obtain a highly accurate classifier. Ensemble learning has received considerable attention in machine learning and artificial intelligence because of its remarkable performance improvements and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and support vector machines (SVM). Among these studies, DT ensembles have consistently demonstrated impressive improvements in generalization behavior, whereas NN and SVM ensembles have not shown improvements as remarkable as those of DT ensembles. Recently, several works have reported that ensemble performance can be degraded when the classifiers of an ensemble are highly correlated: the resulting multicollinearity leads to performance degradation, and differentiated learning strategies have been proposed to cope with it. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that it contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms but yields no remarkable improvement for stable ones. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers; ensembles of unstable learners can therefore guarantee some diversity among the classifiers. In contrast, stable learning algorithms such as NN and SVM generate similar classifiers in spite of small changes in the training data, so the correlation among the resulting classifiers is very high. This high correlation produces the multicollinearity problem that degrades ensemble performance. Kim's work (2009) compared bankruptcy prediction performance on Korean firms using traditional prediction algorithms such as NN, DT, and SVM. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT, while in ensemble learning the DT ensemble shows greater improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically showed that the ensemble's performance degradation is due to multicollinearity, and that ensemble optimization is needed to cope with the problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve NN ensemble performance. Coverage optimization is a technique of choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of its classifiers. CO-NN uses a genetic algorithm (GA), which has been widely applied to various optimization problems, to handle the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the generally used measures of multicollinearity, is added to ensure the diversity of the classifiers by removing high correlation among them. We use Microsoft Excel and the GA software package Evolver. Experiments on company failure prediction show that CO-NN stably enhances the performance of NN ensembles by choosing classifiers in light of their correlations: classifiers with a potential multicollinearity problem are removed by the coverage optimization process, and CO-NN thereby outperforms a single NN classifier and the NN ensemble at the 1% significance level, and the DT ensemble at the 5% significance level. Further research issues remain. First, a decision optimization process to find the optimal combination function should be considered. Second, various learning strategies to deal with data noise should be introduced in more advanced future work.
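
As a rough illustration of the encoding described above, the sketch below evolves a binary mask over ensemble members with a plain-NumPy genetic algorithm: majority-vote accuracy is the fitness, and sub-ensembles whose maximum VIF exceeds a limit are rejected. The GA operators, the VIF limit of 10, and the `preds` matrix (each member classifier's 0/1 validation predictions stacked column-wise) are illustrative assumptions, not the paper's Excel/Evolver implementation.

```python
# Sketch: GA-based coverage optimization with a VIF diversity constraint.
import numpy as np

rng = np.random.default_rng(0)

def vif_max(X):
    """Largest variance inflation factor among the columns of X."""
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([others, np.ones(len(y))])
        resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        r2 = 1.0 - resid.var() / max(y.var(), 1e-9)
        vifs.append(1.0 / max(1.0 - r2, 1e-9))
    return max(vifs)

def fitness(mask, preds, y_true, vif_limit=10.0):
    """Majority-vote accuracy of the selected sub-ensemble; 0 if infeasible."""
    if mask.sum() < 2:
        return 0.0
    sub = preds[:, mask.astype(bool)]
    acc = np.mean((sub.mean(axis=1) > 0.5).astype(int) == y_true)
    return float(acc) if vif_max(sub) <= vif_limit else 0.0

def ga_select(preds, y_true, pop_size=30, n_gen=50, p_mut=0.05):
    """Evolve a binary mask over ensemble members (1 = keep the classifier)."""
    n = preds.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))
    for _ in range(n_gen):
        fit = np.array([fitness(ind, preds, y_true) for ind in pop])
        parents = pop[np.argsort(fit)[-(pop_size // 2):]]  # keep the fitter half
        kids = []
        for _ in range(pop_size):                          # one-point crossover
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)
            kids.append(np.concatenate([a[:cut], b[cut:]]))
        pop = np.array(kids)
        pop ^= (rng.random(pop.shape) < p_mut).astype(pop.dtype)  # bit-flip mutation
    fit = np.array([fitness(ind, preds, y_true) for ind in pop])
    return pop[fit.argmax()]
```

Rejecting high-VIF sub-ensembles outright is the simplest way to encode the diversity constraint; a soft penalty on the fitness would serve the same purpose.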

A Study on the Acceptance Factors of the Capital Market Sentiment Index (자본시장 심리지수의 수용요인에 관한 연구)

  • Kim, Suk-Hwan;Kang, Hyoung-Goo
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.1-36 / 2020
  • This study reveals the acceptance factors of the Market Sentiment Index (MSI), created by reflecting investor sentiment extracted from unstructured big data. The research model was established by exploring exogenous variables based on rational behavior theory and applying the Technology Acceptance Model (TAM). Acceptance of the MSI provided to investors in the stock market was found to be influenced by the exogenous variables presented in this study. The results of the causal analysis are as follows. First, self-efficacy, investment opportunities, innovativeness, and perceived cost significantly affect perceived ease of use. Second, diversity of services and perceived benefits have a statistically significant impact on perceived usefulness. Third, perceived ease of use and perceived usefulness have a statistically significant effect on attitude toward use. Fourth, attitude toward use significantly influences the intention to use, and investment opportunities, as an independent variable, also affect the intention to use. Fifth, the intention to use significantly affects the final dependent variable, the intention to use continuously. The mediating effects between the independent and dependent variables of the research model are as follows. First, the indirect effect on the causal route from diversity of services to continuous use intention was 0.1491, statistically significant at the 1% level. Second, the indirect effect on the causal route from perceived benefit to continuous use intention was 0.1281, statistically significant at the 1% level. The results of the multi-group analysis are as follows. First, for the groups with and without stock investment experience, multi-group analysis was not possible because measurement invariance between the two groups was not secured. Second, for the male and female groups, where measurement invariance was secured, the effect on the causal route from attitude toward use to use intention was higher for women than for men, while on the causal route from use intention to continuous use intention it was much higher for men, a difference statistically significant at the 5% level.
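
As background on the mediation arithmetic: in a structural model, the indirect effect along a causal route is the product of the path coefficients on that route. The coefficients below are invented placeholders; only the reported indirect effects (0.1491 and 0.1281) come from the study.

```python
# Indirect effect = product of standardized path coefficients along the route
# diversity of services -> perceived usefulness -> attitude -> use intention
# -> continuous use intention. All four coefficients are hypothetical.
coefs = [0.55, 0.60, 0.75, 0.60]

indirect = 1.0
for b in coefs:
    indirect *= b

print(f"indirect effect: {indirect:.4f}")  # compare with the reported 0.1491
```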

The Brand Personality Effect: Communicating Brand Personality on Twitter and its Influence on Online Community Engagement (브랜드 개성 효과: 트위터 상의 브랜드 개성 전달이 온라인 커뮤니티 참여에 미치는 영향)

  • Cruz, Ruth Angelie B.;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.67-101 / 2014
  • The use of new technology greatly shapes the marketing strategies that companies use to engage their consumers. Among these new technologies, social media is used to reach out to an organization's audience online, and one of the most popular social media channels to date is the microblogging platform Twitter. With 500 million tweets sent daily on average, the platform is a rich source of data for researchers and a lucrative marketing medium for companies. Nonetheless, one challenge for companies in developing an effective Twitter campaign is the limited theoretical and empirical evidence on the proper organizational usage of Twitter despite its potential advantages for a firm's external communications. The current study aims to provide empirical evidence on how firms can utilize Twitter effectively in their marketing communications, using the association between brand personality and brand engagement that several branding researchers propose. The study extends Aaker's previous empirical work on brand personality by applying the Brand Personality Scale to explore whether Twitter brand communities convey distinctive brand personalities online and how this influences the communities' level of consumer engagement and sentiment quality. Moreover, the moderating effect of the product involvement construct on consumer engagement is also measured. By collecting data for a period of eight weeks through the publicly available Twitter application programming interface (API) from 23 accounts of Twitter-verified business-to-consumer (B2C) brands, we test the paper's hypotheses using computerized content analysis and opinion mining. The study is the first to compare Twitter marketing across organizations using the brand personality concept. It demonstrates a potential basis for Twitter strategies and discusses their benefits, thus providing a framework of analysis for Twitter practice and strategic direction for companies developing their use of Twitter to communicate with their followers. This study has four specific research objectives. The first is to examine the applicability of brand personality dimensions used in marketing research to online brand communities on Twitter. The second is to establish the role of congruence between offline and online brand personalities in building a successful social media brand community. Third, we test the moderating effect of product involvement on the effect of brand personality on brand community engagement. Lastly, we investigate the sentiment quality of consumer messages to the firms that succeed in communicating their brands' personalities on Twitter.
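
As a loose sketch of this kind of computerized content analysis, one can score a brand's tweets against keyword lexicons for Aaker's five personality dimensions. The word lists and sample tweets below are invented for illustration and are not the study's coding scheme.

```python
# Toy lexicon-based scoring of tweets on Aaker's five brand personality
# dimensions. Lexicons and tweets are illustrative placeholders.
from collections import Counter

LEXICON = {
    "sincerity":      {"honest", "friendly", "genuine", "wholesome"},
    "excitement":     {"exciting", "bold", "trendy", "cool"},
    "competence":     {"reliable", "smart", "efficient", "secure"},
    "sophistication": {"elegant", "premium", "charming"},
    "ruggedness":     {"tough", "rugged", "outdoor"},
}

def personality_profile(tweets):
    """Share of lexicon hits per dimension across a brand's tweets."""
    counts = Counter()
    for tweet in tweets:
        words = set(tweet.lower().split())
        for dim, lex in LEXICON.items():
            counts[dim] += len(words & lex)
    total = sum(counts.values()) or 1
    return {dim: counts[dim] / total for dim in LEXICON}

print(personality_profile([
    "Bold new colors just dropped - stay cool out there",
    "Our most reliable and efficient model yet",
]))
```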

Exploring the 4th Industrial Revolution Technology from the Landscape Industry Perspective (조경산업 관점에서 4차 산업혁명 기술의 탐색)

  • Choi, Ja-Ho;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture / v.47 no.2 / pp.59-75 / 2019
  • This study explored 4th Industrial Revolution technology from the perspective of the landscape industry in order to provide the basic data needed to increase its virtuous-circle value. The characteristics of the 4th Industrial Revolution, the landscape industry, and urban regeneration were reviewed, and a research methodology was established that included a technical classification system suitable for systematic study, which served as the framework. First, 4th Industrial Revolution technologies based on digital data that could be utilized to increase the virtuous-circle value of the landscape industry were selected. At the 'element technology level', core technologies such as the Internet of Things, cloud computing, big data, artificial intelligence, and robots, together with peripheral technologies such as virtual and augmented reality, drones, 3D/4D printing, and 3D scanning, were identified as 4th Industrial Revolution technologies. It was shown that applying them at the 'trend level', particularly in the landscape industry, can increase the virtuous-circle value. The 'system level' was analyzed as a general-purpose layer in which, based on platforms, element-level technologies such as computers and smart devices are systematically interconnected and operate as digital-data-based 4th Industrial Revolution technology. Applying the technologies at the trend level specific to the landscape industry proved effective for increasing virtuous-circle values, and the synergistic effects of the proposed approach can be realized when the element technology level is applied at the trend level. Smart gardens and smart parks were analyzed as the level the industry should pursue. It was judged that smart cities, smart homes, smart farms and precision agriculture, smart tourism, and smart healthcare could be strongly linked through collaboration among technologies in adjacent areas at the trend level. Additionally, various ways of utilizing the related technologies applied at the trend level were identified in the processes of urban regeneration, creation and maintenance of public service spaces, and public service. In other words, with the realization of ubiquitous computing, hyper-connectivity, hyper-reality, hyper-intelligence, and hyper-convergence reflecting the basic characteristics of digital technology can be achieved in the landscape industry. The landscape industry was found to accommodate and coordinate effectively the needs of new roles such as education and consulting, as well as existing tasks, when participating in urban regeneration projects. In particular, the overall landscaping field proved effective in increasing the virtuous-circle value when it systematizes the related technologies at the trend level, linking maintenance as a strategic bridgehead, because its industrial structure is well suited to distributing the data and information produced through various channels. Subsequent research, such as demonstrating the fusion of digital-data-based 4th Industrial Revolution technologies in the creation, maintenance, and servicing of actual landscape spaces, is necessary.

A Study on Smart Accuracy Control System based on Augmented Reality and Portable Measurement Device for Shipbuilding (조선소 블록 정도관리를 위한 경량화 측정 장비 및 증강현실 기반의 스마트 정도관리 시스템 개발)

  • Nam, Byeong-Wook;Lee, Kyung-Ho;Lee, Won-Hyuk;Lee, Jae-Duck;Hwang, Ho-Jin
    • Journal of the Computational Structural Engineering Institute of Korea / v.32 no.1 / pp.65-73 / 2019
  • In order to increase the production efficiency of a ship and shorten its production cycle, it is important to evaluate the accuracy of ship components efficiently during the building process. Block accuracy control is important for shortening the shipbuilding process, reducing costs, and improving the accuracy of the ship. Dedicated systems have been developed and used mainly in large shipyards, but in many cases measurements are still taken and managed using conventional instruments such as tape measures and beams, or with optical equipment. To perform accuracy control, these tools must inevitably be combined with equipment for recording the measurement data and with paper drawings for locating the measurement positions, and the measured results are passed to the accuracy control system through manual input or a recording device. In this setting, the measurement result is influenced by the work environment and the skill level of the worker, and on the management side there are human errors such as missing measurement records and poorly maintained control sheets, so time and cost are lost in terms of efficiency. The purpose of this study is to improve the working environment of the existing accuracy control process by using augmented reality technology to visualize measurement information on the actual block, and to develop a smart accuracy control system based on augmented reality that manages accuracy control data effectively through interworking with a portable measurement device. We confirmed the applicability of the proposed system to accuracy control through a prototype implementation.

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been a subject of constant research by experts since the early 1970s, and it has become more important as crimes committed by recidivists have steadily increased. In particular, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening in the 1990s, research on recidivism prediction became more active, and in the same period empirical studies on recidivism factors began in Korea as well. Although most recidivism prediction studies have so far focused on the factors of recidivism or on prediction accuracy, it is important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of misclassifying people who will not reoffend as likely to reoffend is lower than the cost of misclassifying people who will reoffend as unlikely to, because the former only adds monitoring costs while the latter incurs social and economic costs. Therefore, in this paper we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, recognized as a high-performance ensemble method in data mining, was applied, and its results were compared with various prediction models such as LOGIT (logistic regression), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, the weighted average of the FNE (false negative error) and FPE (false positive error) costs. To verify its usefulness, the model was applied to a real recidivism prediction dataset. As a result, the XGB model not only showed better prediction accuracy than the other models but also reduced the misclassification cost most effectively.
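
A minimal sketch of the two-step idea described above: train an XGBoost classifier, then grid-search the decision threshold that minimizes the weighted misclassification cost. The 5:1 false-negative-to-false-positive cost ratio and the stand-in dataset are assumptions for illustration; the abstract does not specify the paper's costs or data.

```python
# Cost-sensitive threshold tuning on top of an XGBoost classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

C_FN, C_FP = 5.0, 1.0  # hypothetical costs: a missed recidivist costs 5x a false alarm

# Stand-in data; a real study would load its recidivism dataset here.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.7], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)
p_val = model.predict_proba(X_val)[:, 1]  # predicted probability of recidivism

def total_cost(threshold):
    pred = (p_val >= threshold).astype(int)
    fn = np.sum((y_val == 1) & (pred == 0))  # missed recidivists
    fp = np.sum((y_val == 0) & (pred == 1))  # false alarms
    return C_FN * fn + C_FP * fp

# Grid-search the threshold minimizing the weighted misclassification cost.
best_t = min(np.linspace(0.01, 0.99, 99), key=total_cost)
print(best_t, total_cost(best_t), total_cost(0.5))
```

With an asymmetric cost structure, the optimal threshold typically sits well below 0.5, flagging borderline cases as likely recidivists because misses are the expensive error.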

Detection of flash drought using evaporative stress index in South Korea (증발스트레스지수를 활용한 국내 돌발가뭄 감지)

  • Lee, Hee-Jin;Nam, Won-Ho;Yoon, Dong-Hyun;Svoboda, Mark D.;Wardlow, Brian D.
    • Journal of Korea Water Resources Association / v.54 no.8 / pp.577-587 / 2021
  • Drought is generally considered a natural disaster caused by water shortages accumulated over a long period, occurring slowly over months or years. However, climate change has led to rapid changes in the weather and environmental factors that directly affect agriculture, and extreme weather conditions have increased the frequency of droughts that develop rapidly within weeks to months. This phenomenon is defined as 'flash drought', caused by an increase in surface temperature over a relatively short period together with abnormally low and rapidly decreasing soil moisture. Detecting and analyzing flash drought is essential because it significantly affects agriculture and natural ecosystems, and its impacts are associated with agricultural drought impacts. Because South Korea has no clear definition of flash drought, the purpose of this study is to identify flash droughts and analyze their characteristics. In this study, a flash drought detection condition was presented based on the satellite-derived Evaporative Stress Index (ESI) for 2014 to 2018. The ESI is used as an early warning indicator of flash droughts that develop over a short period of time because of its close relationship with reduced soil moisture content, lack of precipitation, and increased evaporative demand due to low humidity, high temperature, and strong winds. The detected flash droughts were analyzed in terms of hydrometeorological characteristics by comparing the Standardized Precipitation Index (SPI), soil moisture, maximum temperature, relative humidity, wind speed, and precipitation. Correlations were analyzed over the 8 weeks prior to the onset of each flash drought; in most cases, correlations of 0.8 or higher (or -0.8 or lower) were found between the ESI and SPI, soil moisture, and maximum temperature.
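
A minimal sketch of this kind of lead-up correlation analysis, assuming weekly series in a CSV; the file name, column names, and onset date are hypothetical placeholders, not the study's data.

```python
# Correlate weekly ESI with SPI, soil moisture, and maximum temperature
# over the 8 weeks preceding a flash drought onset.
import pandas as pd

df = pd.read_csv("weekly_indices.csv", parse_dates=["week"])  # hypothetical file
onset = pd.Timestamp("2017-06-19")                            # hypothetical onset date
window = df[(df["week"] < onset) & (df["week"] >= onset - pd.Timedelta(weeks=8))]

for var in ["SPI", "soil_moisture", "max_temperature"]:
    r = window["ESI"].corr(window[var])  # Pearson correlation over the 8-week window
    print(f"ESI vs {var}: r = {r:.2f}")
```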

Analysis of the relationship between interest rate spreads and stock returns by industry (금리 스프레드와 산업별 주식 수익률 관계 분석)

  • Kim, Kyuhyeong;Park, Jinsoo;Suh, Jihae
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.105-117 / 2022
  • This study analyzes the relationship between stock returns and the interest rate spread, the difference between long-term and short-term interest rates, through polynomial regression analysis. Existing research has concentrated on business forecasting using the interest rate spread, focusing on the US market. Previous studies validated the spread as a leading indicator of business conditions by varying the maturities of the long-term and short-term rates and analyzing the length of the lead. After the 7th revision of Korea's composite business indicators in 2006, the interest rate spread was included among the components of the composite leading indicator, and it is still used today. Nevertheless, there is little research on industry-level stock returns and the interest rate spread in the domestic stock market. Therefore, this study analyzed stock returns by industry and the interest rate spread in the Korean stock market. The study selected the long-term and short-term rates with the highest causality through regression analysis and then examined the correlations for each leading period and industry. To overcome the limitations of simple linear regression, polynomial regression was used, which raised explanatory power. As a result, the strongest causality was found when the spread between the yield on three-year unguaranteed corporate bonds (AA-) and the call rate, with a six-month lead, was used as the interest rate spread. In addition, when stock returns were analyzed by industry, the relationship between this spread and the returns of the automobile industry was the closest. This study is significant in verifying the causality among the interest rate spread, business forecasts, and stock returns in Korea. Although the interest rate spread alone is of limited use for forecasting stock prices, it can serve as a strong factor when properly combined with various other factors.
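
A minimal sketch of the regression described above, assuming a monthly CSV with hypothetical column names (`corp_bond_aa3y`, `call_rate`, `auto_industry_return`); the spread definition, the six-month lead, and the polynomial form follow the abstract.

```python
# Polynomial regression of industry stock returns on a lagged interest rate spread.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv("monthly_series.csv", parse_dates=["month"])  # hypothetical file
df["spread"] = df["corp_bond_aa3y"] - df["call_rate"]          # long-short rate spread
df["spread_lag6"] = df["spread"].shift(6)                      # spread leads returns by 6 months
df = df.dropna()

# Degree-2 polynomial terms of the lagged spread raise explanatory power
# over a simple linear fit, as the abstract notes.
X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(df[["spread_lag6"]])
model = LinearRegression().fit(X, df["auto_industry_return"])
print(model.score(X, df["auto_industry_return"]))              # R^2 of the fit
```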

The Effect of Domain Specificity on the Performance of Domain-Specific Pre-Trained Language Models (도메인 특수성이 도메인 특화 사전학습 언어모델의 성능에 미치는 영향)

  • Han, Minah;Kim, Younha;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.251-273 / 2022
  • Recently, research on applying deep learning to text analysis has steadily continued. In particular, research has been actively conducted to understand the meaning of words and to perform tasks such as summarization and sentiment classification through pre-trained language models trained on large datasets. However, existing pre-trained language models show limitations in that they do not understand specific domains well. Therefore, in recent years research has shifted toward creating language models specialized for particular domains. A domain-specific pre-trained language model understands the knowledge of a particular domain better and shows performance improvements on various tasks in that field. However, domain-specific further pre-training is expensive, since corpus data for the target domain must be acquired, and many cases have been reported in which the performance improvement after further pre-training is insignificant in some domains. It is therefore difficult to decide to develop a domain-specific pre-trained language model when it is not clear whether performance will improve substantially. In this paper, we present a way to proactively estimate the performance improvement expected from further pre-training in a domain before actually performing it. Specifically, after selecting three domains, we measured the increase in classification accuracy achieved by further pre-training in each domain. We also developed and presented new indicators that estimate the specificity of a domain based on the normalized frequency of the keywords used in that domain. Finally, we performed classification using a general pre-trained language model and domain-specific pre-trained language models for the three domains. As a result, we confirmed that the higher the domain specificity index, the greater the performance improvement obtained through further pre-training.
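
The abstract does not give the indicator's formula, so the sketch below is one plausible reading under stated assumptions: score a domain by how much more frequent its top keywords are in the domain corpus than in a general reference corpus, using normalized frequencies. The scoring rule is illustrative, not the paper's definition.

```python
# Toy domain specificity score from normalized keyword frequencies.
from collections import Counter

def norm_freq(tokens):
    """Normalized frequency of each token (counts divided by corpus size)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def domain_specificity(domain_tokens, general_tokens, top_k=100, eps=1e-6):
    """Average domain-to-general frequency ratio over the domain's top keywords."""
    d = norm_freq(domain_tokens)
    g = norm_freq(general_tokens)
    keywords = [w for w, _ in Counter(domain_tokens).most_common(top_k)]
    ratio_sum = sum(d[w] / (g.get(w, 0.0) + eps) for w in keywords)
    return ratio_sum / max(len(keywords), 1)
```

Under this reading, a high score means the domain's vocabulary diverges sharply from general text, which is exactly the situation where further pre-training is most likely to pay off.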

End to End Model and Delay Performance for V2X in 5G (5G에서 V2X를 위한 End to End 모델 및 지연 성능 평가)

  • Bae, Kyoung Yul;Lee, Hong Woo
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.107-118 / 2016
  • The advent of 5G mobile communications, expected in 2020, will enable many services such as the Internet of Things (IoT) and vehicle-to-infrastructure/vehicle/nomadic-device (V2X) communication. Realizing these services imposes many requirements: reduced latency, high data rate and reliability, and real-time service. In particular, high reliability and delay sensitivity together with an increased data rate are very important for M2M, IoT, and Factory 4.0. Around the world, 5G standardization organizations have considered these services and grouped them to derive the technical requirements and service scenarios. The first scenario is broadcast services that use a high data rate, for cases such as sporting events or emergencies. The second scenario covers support for e-Health, car reliability, and similar services; the third scenario is related to VR games, with delay sensitivity and real-time techniques. Recently, these groups have been reaching agreement on the requirements for such scenarios and the target levels. Various techniques are being studied to satisfy these requirements and are being discussed in the context of software-defined networking (SDN) as the next-generation network architecture. SDN, which is being standardized by the ONF, basically refers to a structure that separates the signals of the control plane from the packets of the data plane. One of the best examples requiring low latency and high reliability is an intelligent traffic system (ITS) using V2X. Because a car passes through a small cell of the 5G network very rapidly, messages to be delivered in an emergency must be transported in a very short time; this is a typical example of high delay sensitivity. 5G therefore has to support high reliability and strict delay requirements for V2X in the field of traffic control, which makes V2X a major delay-critical application. V2X (vehicle-to-infra/vehicle/nomadic) encompasses all types of communication methods applicable to roads and vehicles and refers to a connected or networked vehicle. It can be divided into three kinds of communication: between a vehicle and infrastructure (vehicle-to-infrastructure; V2I), between vehicles (vehicle-to-vehicle; V2V), and between a vehicle and mobile equipment (vehicle-to-nomadic devices; V2N), with further fields expected to be added in the future. Because the SDN structure is under consideration as the next-generation network architecture, its design is significant here. However, the centralized architecture of SDN can be unfavorable for delay-sensitive services, because the central controller must communicate with many nodes and provide the required processing power. Therefore, for emergency V2X communications, delay-related control functions require a tree-like supporting structure, and the architecture of the network that processes the vehicle information becomes a major variable affecting delay. Because it is difficult to meet the desired level of delay sensitivity with a typical fully centralized SDN structure, research on the optimal size of an SDN domain for processing the information is needed. This study examined the SDN architecture considering the V2X emergency delay requirements of a 5G network in the worst-case scenario and performed a system-level simulation over vehicle speed, cell radius, and cell tier to derive the range of cells for information transfer in the SDN network. In the simulation, because 5G provides a sufficiently high data rate, the information supporting neighboring vehicles was assumed to be delivered to the car without errors. Furthermore, the 5G small cell was assumed to have a radius of 50-100 m, and the maximum vehicle speed was set to 30-200 km/h in order to examine the network architecture that minimizes delay.
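
For intuition about why cell radius and vehicle speed bound the delay budget, the dwell time of a vehicle crossing a small cell is at most roughly the cell diameter divided by its speed; a quick check over the ranges stated above (the diameter-crossing path is a simplifying assumption):

```python
# Rough dwell-time arithmetic for a vehicle crossing a 5G small cell
# along a diameter: dwell ~= 2 * radius / speed.
for radius_m in (50, 100):
    for speed_kmh in (30, 200):
        speed_ms = speed_kmh / 3.6          # km/h -> m/s
        dwell_s = 2 * radius_m / speed_ms   # time spent inside the cell
        print(f"radius {radius_m} m, speed {speed_kmh} km/h -> {dwell_s:.1f} s in cell")
```

At 200 km/h a 50 m-radius cell is crossed in under two seconds, which is why emergency V2X messages leave little room for controller round-trips in a fully centralized SDN.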