Search | Korea Science

An Analysis into the Characteristics of the High-pass Transportation Data and Information Processing Measures on Urban Roads (도시부도로에서의 하이패스 교통자료 특성분석 및 정보가공방안)

Jung, Min-Chul;Kim, Young-Chan;Kim, Dong-Hyo
- The Journal of The Korea Institute of Intelligent Transport Systems, v.10 no.6, pp.74-83, 2011
The high-pass transportation information system collects section information directly from probe vehicles and can therefore offer drivers more reliable information. However, the reliability of that information depends on the driving conditions and characteristics of the probe vehicles and on the statistical processing method. On urban roads in particular, section travel time depends heavily on whether a vehicle was delayed by traffic signals, so the individual probe records can deviate widely from one another. Research from several directions is therefore needed to improve the credibility of section information. Previous studies of high-pass information provision, however, were conducted on highway sections with continuous flow, which limits their applicability to urban roads, where traffic flow is interrupted. This research therefore analyzes the characteristics of high-pass traffic data on urban roads and seeks an appropriate processing method. When the high-pass data collected from roadside equipment (RSE) on urban roads were analyzed with a time-space diagram, the data showed a regular pattern, repeating with the signal cycle of the end node, according to whether arriving vehicles had to wait for a signal. The number and duration of signal waits caused the deviation in the collected data, and the deviation was larger under congestion. The analysis attributed this to the increased number of signal waits under congestion, which caused the deviation to be partially offset. The results show that the mean of the collected high-pass data on urban roads is an appropriate representative value because it reflects the traffic characteristics caused by signal waits, and that the criteria for judging delay and congestion should vary with signal and road characteristics.
The results of this research are expected to provide a foundation for improving the reliability of high-pass information on urban roads.
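A minimal Python sketch of the processing the abstract recommends: group the matched probe records of one section into fixed time intervals, take the mean travel time as each interval's representative value, and compare it with a section-specific threshold. The record format (entry and exit timestamps in seconds from the upstream and downstream RSE), the 300 s interval, and the `delay_ratio` rule are illustrative assumptions, not the paper's procedure.

```python
from collections import defaultdict
from statistics import mean


def interval_travel_times(records, interval_s=300):
    """Return {interval index: mean section travel time in seconds}.

    records: iterable of (entry_time_s, exit_time_s) pairs for one section,
    e.g. the same vehicle matched at the upstream and downstream RSE
    (hypothetical record format).
    """
    bins = defaultdict(list)
    for entry_t, exit_t in records:
        travel = exit_t - entry_t
        if travel <= 0:
            continue  # drop mismatched or corrupted pairs
        # Assign each record to the interval in which it left the section.
        bins[int(exit_t // interval_s)].append(travel)
    # Mean as the representative value, as the abstract recommends, so that
    # vehicles that waited one or more signal cycles are fully reflected.
    return {k: mean(v) for k, v in sorted(bins.items())}


def judge_state(mean_tt, free_flow_tt, delay_ratio):
    """Hypothetical rule: 'delay' when the mean travel time exceeds
    free_flow_tt * delay_ratio. delay_ratio is a per-section parameter,
    since the criteria should depend on signal and road characteristics."""
    return "delay" if mean_tt > free_flow_tt * delay_ratio else "smooth"


records = [(0, 60), (10, 130), (20, 80), (310, 370), (320, 500)]
print(interval_travel_times(records))  # {0: 80, 1: 120}
print(judge_state(120, 60, 1.5))       # delay
```

The threshold is kept as an explicit argument rather than a global constant so that each section (signal cycle, link length) can carry its own value.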

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

Kim, Eunmi;Hong, Taeho
- Journal of Intelligence and Information Systems, v.21 no.1, pp.29-45, 2015
Response modeling is a well-known research issue for researchers seeking better performance in predicting customers' responses to marketing promotions. A customer response model can reduce marketing costs by identifying prospective customers in a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary costs. In addition, the big data environment has accelerated the development of response models based on data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the most widely used tools in business because it is known to be simple and robust when applied to response modeling. CBR remains an attractive data mining technique for business applications even though it has not shown high performance compared with other machine learning techniques. Many studies have therefore tried to improve CBR and apply it to business data mining, either with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by the marketing department, and optimized the number k for k-nearest neighbor with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an integrated approach combining logit, neural networks, and Support Vector Machines (SVM) through CBR predicted customers' responses to marketing promotions better than each individual data mining model (logit, neural networks, and SVM). This paper presents an approach that predicts customers' responses to marketing promotions with Case Based Reasoning. The proposed model was developed by applying a different weight to each feature.
We fitted a logit model to a database containing promotion and purchasing data for bath soap, and then used its coefficients to assign different feature weights in CBR. We empirically compared the performance of the proposed weighted CBR model with neural networks and a pure CBR model, and found that the weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining models that classify real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared with the number of instances in the other classes. Classification models such as response models have difficulty learning patterns from such data because they tend to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to the problems caused by imbalanced data distributions, and sampling methods can be categorized into undersampling and oversampling. CBR, however, is not sensitive to the data distribution because, unlike machine learning algorithms, it does not learn from the data. Because respondents to a promotion are always a small fraction of nonrespondents in the real world, we investigated the robustness of the proposed model while changing the ratio of responding to nonresponding customers. To validate its robustness, we ran the proposed model 100 times with different ratios of responding to nonresponding customers under imbalanced data distributions. Finally, we found that the proposed CBR-based model outperformed the compared models on the imbalanced data sets.
Our study is expected to improve the performance of CBR-based response models for promotion programs under the imbalanced data distributions found in the real world.
https://doi.org/10.13088/jiis.2015.21.1.29
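The feature-weighting idea can be sketched as k-nearest-neighbor case retrieval with a weighted distance. The following are assumptions, not details taken from the paper: weights are the absolute logit coefficients normalized to sum to 1, similarity is a weighted Euclidean distance, and the prediction is a majority vote of the k = 3 nearest cases. The function names are hypothetical.

```python
import math
from collections import Counter


def logit_weights(coefs):
    """Map logistic-regression coefficients to non-negative feature weights.
    Assumption: |coefficient| normalised to sum to 1."""
    total = sum(abs(c) for c in coefs)
    return [abs(c) / total for c in coefs]


def weighted_distance(a, b, w):
    # Weighted Euclidean distance: high-weight features dominate similarity.
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def cbr_predict(case_base, query, weights, k=3):
    """Retrieve the k most similar past cases and return the majority label.
    case_base: list of (feature_vector, label) pairs; label 1 = responded."""
    nearest = sorted(case_base,
                     key=lambda case: weighted_distance(case[0], query, weights))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy case base with two scaled features.
cases = [([0.9, 0.1], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0),
         ([0.2, 0.8], 0), ([0.15, 0.5], 0)]
w = logit_weights([2.0, -0.5])           # [0.8, 0.2]
print(cbr_predict(cases, [0.85, 0.3], w))  # 1
```

Retrieval compares the query directly with stored cases and fits no model to the class distribution, which is the intuition behind the abstract's remark that CBR does not learn from the data.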

A Study on the Structure and Function of the Underground Storage Facility in Baekje (백제 지하저장시설(地下貯藏施設)의 구조와 기능에 대한 검토)

Shin, Jong-Kuk
- Korean Journal of Heritage: History & Science, v.38, pp.129-156, 2005
Recent excavation surveys of Baekje sites have reported an increasing number of underground storage facilities built of earth, wood, or stone. The purpose of this study is to examine the structure and function of Baekje underground storage facilities, classified by type and construction method into the following groups: flask-shaped, wooden box-shaped, and stone box-shaped. The flask-shaped storage pit is the most representative underground storage facility of Baekje; more than 600 examples have been found at numerous sites around the Hangang (Han River) and Geumgang (Geum River), dating from the Hansung period to the Sabi period of the Baekje Dynasty. It is a historical artifact of the storage culture unique to Baekje in the Hangang and Geumgang regions from the 3rd to the 7th century. Considering its structure and Chinese examples, it may have been used for long-term storage of grain and various other items, including earthenware. The wooden box-shaped and stone box-shaped storage facilities are found mostly at sites of the Sabi period, so they may have taken over some functions of the traditional pouch-shaped storage pits, whose numbers decreased after the 6th century. In particular, the wooden and stone box-shaped storage facilities required an enormous labor force to build because of their structure and construction method, so they are considered to have been used for official purposes at provincial fortresses and citadel sites. The wooden box-shaped storage facility is classified into flat rectangular and square types by structure, and into the Gagu type(架構式) and Juheol type(柱穴式) by construction method. The type may have been chosen according to the geography and geological features of the site where the storage facility was to be built.
Considering the examples from the Gwanbuk-ri and Weolpyong-dong sites, the wooden box-shaped storage facility may have been used for various items as needed, including foods such as fruit and essential provisions at military bases. Considering long-term food storage, Japanese examples, and the functional characteristics of underground storage facilities, the wooden and stone box-shaped storage facilities may also have been built to store important items safely in case of fire. This study is only a preliminary examination of Baekje storage facilities; further specific and comprehensive studies are needed on comparisons with other regions, distribution patterns, excavated relics and artifacts, and functions.
https://doi.org/10.22755/kjchs.2005.38.129

Changes of Plant Growth and Nutrient Concentrations of the Drainage According to Drainage Reuse and Substrate Type in Sweet Pepper Hydroponics (파프리카 수경재배 시 배액 재사용과 배지 종류에 따른 생육 및 배액 내 이온 농도 변화)

Lim, Mi Young;Jeong, Eun Seol;Roh, Mi Young;Choi, Gyeong Lee;Kim, So Hui;Lee, Choung Keun
- Journal of Bio-Environment Control, v.31 no.4, pp.476-484, 2022
This study investigated the effects of closed and open cultivation methods and of substrate type on the change pattern of nutrient ions in the drainage and on the growth of sweet pepper (Capsicum annuum L.) 'Scirocco' when drainage is reused in hydroponics. Sowing, transplanting, and the start of the closed and open cultivation treatments took place on August 19, September 16, and October 21, 2021, respectively. The drainage analysis showed that Na⁺ and Cl⁻, representative ions that crops do not absorb readily, accumulated in the closed system as growth progressed. In addition, because the NH₄-N content of the drainage was much lower than the NO₃-N content, NH₄-N appears to be absorbed preferentially over NO₃-N owing to the ion selectivity of sweet pepper. Growth and fruit characteristics did not differ significantly among treatments with respect to drainage reuse or substrate type. In conclusion, provided that poor fruit set caused by declining plant vigor after the middle of the growing period is managed, the differences between closed and open cultivation and between coir and rock wool substrates are small, so growers can select the cultivation method and substrate type that suit their conditions. Moreover, given the recent increase in concern about environmental pollution, a closed cultivation method can be adopted without concern about reduced yield or quality, provided that pathogen infection due to drainage reuse is well managed. Replacing rock wool, which poses a disposal problem, with coir is expected to further reduce environmental pollution.
https://doi.org/10.12791/KSBEC.2022.31.4.476

Sensitivity analysis of the FAO Penman-Monteith reference evapotranspiration model (FAO Penman-Monteith 기준증발산식 민감도 분석)

Rim, Chang-Soo
- Journal of Korea Water Resources Association, v.56 no.4, pp.285-299, 2023
Estimating evapotranspiration is very important for effective water resources management, and many researchers have applied the FAO Penman-Monteith (FAO P-M) model to estimate reference evapotranspiration. Because the FAO P-M model requires a variety of input data, however, the effect of each input on the model must be understood. This study therefore analyzed, for 56 stations in South Korea, the effects on FAO P-M reference evapotranspiration (RET) of 8 meteorological factors (maximum and minimum temperature, wind speed, relative humidity, solar radiation, vapor pressure deficit, net radiation, and ground heat flux), of the energy and aerodynamic terms of the FAO P-M model, and of elevation. A relative sensitivity analysis was performed to determine how a 10% increase in each independent variable affects reference evapotranspiration when all other independent variables are held constant. In addition, the 56 stations were grouped into 5 clusters by cluster analysis in order to select 5 representative stations, for which a monthly relative sensitivity analysis was performed. Across the 56 stations, net radiation was the most sensitive of the 8 meteorological factors, followed in order by relative humidity, solar radiation, maximum temperature, vapor pressure deficit, wind speed, and minimum temperature. Ground heat flux was the least sensitive factor. As for ground surface conditions, elevation showed a very low positive relative sensitivity. The relative sensitivities of the energy and aerodynamic terms of the FAO P-M model were 0.707 and 0.293, respectively, indicating that the energy term contributed more to reference evapotranspiration than the aerodynamic term.
The monthly relative sensitivities of the meteorological factors showed seasonal effects, and the pattern of the relative sensitivity of elevation differed among stations. Therefore, seasonal and regional differences in the sensitivity of each input variable should be considered when applying the FAO P-M model.
https://doi.org/10.3741/JKWRA.2023.56.4.285
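To make the relative sensitivity concrete, here is a sketch of the daily FAO-56 Penman-Monteith equation with a one-at-a-time 10% perturbation, S = (ΔET0/ET0)/(Δx/x). It is a simplification, not the paper's setup: temperature enters as a single mean value and humidity as a fixed vapor pressure deficit (es − ea), so the separate effects of maximum and minimum temperature, relative humidity, and solar radiation examined in the paper are not reproduced. The function names are hypothetical.

```python
import math


def fao56_et0(T, u2, Rn, G, vpd, z):
    """Daily FAO-56 Penman-Monteith reference ET (mm/day).

    T: mean air temperature (deg C); u2: wind speed at 2 m (m/s);
    Rn, G: net radiation and soil heat flux (MJ m-2 day-1);
    vpd: es - ea (kPa); z: elevation (m).
    Returns (et0, energy_term, aerodynamic_term).
    """
    # Slope of the saturation vapour pressure curve (kPa/deg C)
    delta = (4098 * 0.6108 * math.exp(17.27 * T / (T + 237.3))
             / (T + 237.3) ** 2)
    # Atmospheric pressure from elevation (kPa) and psychrometric constant
    P = 101.3 * ((293 - 0.0065 * z) / 293) ** 5.26
    gamma = 0.665e-3 * P
    denom = delta + gamma * (1 + 0.34 * u2)
    energy = 0.408 * delta * (Rn - G) / denom
    aero = gamma * (900 / (T + 273)) * u2 * vpd / denom
    return energy + aero, energy, aero


def relative_sensitivity(inputs, name, step=0.10):
    """Relative change in ET0 divided by the relative change in one input,
    for a `step` (default 10 %) increase with all other inputs fixed."""
    base = fao56_et0(**inputs)[0]
    bumped = dict(inputs, **{name: inputs[name] * (1 + step)})
    return ((fao56_et0(**bumped)[0] - base) / base) / step


inputs = dict(T=20.0, u2=2.0, Rn=15.0, G=0.5, vpd=1.0, z=50.0)
for var in ("Rn", "G", "u2", "vpd"):
    print(var, round(relative_sensitivity(inputs, var), 3))
```

Because ET0 is the sum of the energy and aerodynamic terms, raising one term by 10% raises ET0 by 10% of that term's share, so each term's relative sensitivity equals its share of ET0 and the two always sum to 1; this is consistent with the reported 0.707 and 0.293. For the same reason, in this sketch the sensitivity to `vpd` equals the aerodynamic share exactly.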

Dynamic response of segment lining due to train-induced vibration (세그먼트 라이닝의 열차 진동하중에 대한 동적 응답특성)

Gyeong-Ju Yi;Ki-Il Song
- Journal of Korean Tunnelling and Underground Space Association, v.25 no.4, pp.305-330, 2023
Unlike NATM tunnels, Shield TBM tunnels have segmented linings, so the stress distribution in the lining differs even under the same load. Representative methods for analyzing lining stress in Shield TBM tunnels include the Non-joint Mode, which does not consider joints, and the 2-ring beam-spring model, which considers ring-to-ring joints and segment connections. This study uses an analysis method based on the Break-joint Mode; however, it does not consider the structural role of the segment lining connections. The effectiveness of the modeling is verified by analyzing the behavior under vibration loads of a model whose segment connection interfaces are assigned normal (vertical) stiffness and shear stiffness, which represent the frictional components. Unlike the Non-joint Mode, in which the greatest stress occurs at the crown under static loads such as earth pressure, the stress distribution governed by contact and frictional stiffness between segments produced the smallest stress in the crown key segment, where segment connections are concentrated. The stress distribution was clearly divided at the segment connections. In the static analysis under earth pressure and similar loads, the stress in the Non-joint Mode was up to seven times that in the Break-joint Mode, which is consistent with the stress distribution pattern of the 2-ring beam-spring model. Under the train vibration load, however, the stress in the Break-joint Mode was greater than that in the Non-joint Mode. This differs from the static mechanics concept that a segment ring made of short members joined in the circumferential direction develops smaller stress than the Non-joint Mode, whose member length is relatively longer.
https://doi.org/10.9711/KTAJ.2023.25.4.305

Distributional Characteristics of Fault Segments in Cretaceous and Tertiary Rocks from Southeastern Gyeongsang Basin (경상분지 남동부 일대의 백악기 및 제3기 암류에서 발달하는 단층분절의 분포특성)

Park, Deok-Won
- The Journal of the Petrological Society of Korea, v.27 no.3, pp.109-120, 2018
This study derived the distributional characteristics of fault segments in Cretaceous and Tertiary rocks of the southeastern Gyeongsang Basin. A total of 267 linear fault segments were extracted from the curved fault lines delineated on the regional geological map. First, a directional angle (${\theta}$)-length (L) chart was made for all fault segments, and the general distribution pattern of the segments was derived from it. The distribution curve in the chart was divided into four sections according to its overall shape. The NNE, NNW, and WNW directions, corresponding to the peaks of these sections, indicate the directions of the Yangsan, Ulsan, and Gaeum fault systems. The fault segment population shows a nearly symmetrical distribution about the $N19^{\circ}E$ direction, which corresponds to the maximum peak. Second, a chart of directional angle versus frequency (N), mean length (Lm), total length (Lt), and density (${\rho}$) was made, and its whole domain was divided into 19 domains according to the phases of the distribution curve. The directions corresponding to the peaks of these domains suggest the directions of the representative stresses that acted on the rock body. Third, length-cumulative frequency graphs were made for the 18 sub-populations. In these graphs, the exponent (${\lambda}$) increases both clockwise ($N10{\sim}20^{\circ}E{\rightarrow}N50{\sim}60^{\circ}E$) and counterclockwise ($N10{\sim}20^{\circ}W{\rightarrow}N50{\sim}60^{\circ}W$), whereas the range of lengths and the mean length decrease. The chart for these sub-populations, which have mutually different evolutionary characteristics, reveals a cross section of the evolutionary process. Fourth, a general distribution chart was made for the 18 graphs, and the graphs were classified into five groups (A~E) according to their distribution areas.
The lengths of fault segments increase in the order of group E ($N80{\sim}90^{\circ}E{\cdot}N70{\sim}80^{\circ}E{\cdot}N80{\sim}90^{\circ}W{\cdot}N50{\sim}60^{\circ}W{\cdot}N30{\sim}40^{\circ}W{\cdot}N40{\sim}50^{\circ}W$) < D ($N70{\sim}80^{\circ}W{\cdot}N60{\sim}70^{\circ}W{\cdot}N60{\sim}70^{\circ}E{\cdot}N50{\sim}60^{\circ}E{\cdot}N40{\sim}50^{\circ}E{\cdot}N0{\sim}10^{\circ}W$) < C ($N20{\sim}30^{\circ}W{\cdot}N10{\sim}20^{\circ}W$) < B ($N0{\sim}10^{\circ}E{\cdot}N30{\sim}40^{\circ}E$) < A ($N20{\sim}30^{\circ}E{\cdot}N10{\sim}20^{\circ}E$). In particular, the shape of the graphs gradually changes from a uniform distribution to an exponential one. Lastly, the values of six parameters of fault-segment length were divided into five groups. Among the six parameters, the mean length and the length of the longest fault segment decrease in the order of group III ($N10^{\circ}W{\sim}N20^{\circ}E$) > IV ($N20{\sim}60^{\circ}E$) > II ($N10{\sim}60^{\circ}W$) > I ($N60{\sim}90^{\circ}W$) > V ($N60{\sim}90^{\circ}E$). The frequency, longest length, total length, mean length, and density of the fault segments in group V are the lowest. This order among the five groups suggests a relationship with the relative formation ages of the fault segments.
https://doi.org/10.7854/JPSK.2018.27.3.109
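The length-cumulative frequency analysis can be sketched as follows, assuming the exponential form N(≥L) = N0·exp(−λL) suggested by the abstract's exponent λ and its reference to exponential distributions. Segments are binned into the 10° direction classes used in the group lists (9 east and 9 west classes, giving 18 sub-populations), and λ is estimated by a least-squares fit of ln N(≥L) against L. The fitting method and all function names are assumptions; the paper's estimation procedure is not described.

```python
import math
from collections import defaultdict


def direction_class(azimuth_deg):
    """Map a strike azimuth in (-90, 90) degrees (negative = west of north)
    to a 10-degree class label such as 'N10~20E' or 'N30~40W'."""
    side = "E" if azimuth_deg >= 0 else "W"
    lo = int(abs(azimuth_deg) // 10) * 10
    return f"N{lo}~{lo + 10}{side}"


def fit_lambda(lengths):
    """Least-squares estimate of lambda in N(>=L) ~ N0 * exp(-lambda * L).
    Needs at least two distinct lengths; ties are handled approximately."""
    xs = sorted(lengths, reverse=True)
    # When sorted in descending order, the i-th segment (0-based) has
    # N(>=L) = i + 1 segments at least as long as itself.
    ys = [math.log(i + 1) for i in range(len(xs))]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


def lambda_by_class(segments):
    """segments: iterable of (azimuth_deg, length). Returns {class: lambda}
    for classes with at least two distinct lengths."""
    groups = defaultdict(list)
    for azimuth, length in segments:
        groups[direction_class(azimuth)].append(length)
    return {c: fit_lambda(ls) for c, ls in groups.items() if len(set(ls)) > 1}
```

A least-squares fit on the log scale is the simplest choice; a maximum-likelihood estimate (for a pure exponential, 1 divided by the mean length) would weight the long, rare segments differently.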

Future Changes in Global Terrestrial Carbon Cycle under RCP Scenarios (RCP 시나리오에 따른 미래 전지구 육상탄소순환 변화 전망)

Lee, Cheol;Boo, Kyung-On;Hong, Jinkyu;Seong, Hyunmin;Heo, Tae-kyung;Seol, Kyung-Hee;Lee, Johan;Cho, ChunHo
- Atmosphere, v.24 no.3, pp.303-315, 2014
Terrestrial ecosystems play an important role as a carbon sink in the global carbon cycle, and understanding the interactions between the terrestrial carbon cycle and climate is important for better prediction of future climate change. In this paper, the terrestrial carbon cycle is investigated with the Hadley Centre Global Environmental Model, version 2, Carbon Cycle (HadGEM2-CC), which includes vegetation dynamics and a carbon cycle that interacts with climate. Future projections are simulated for three representative concentration pathways (RCP 8.5, 4.5, and 2.6) from 2006 to 2100 and compared with historical land carbon uptake from 1979 to 2005. Projected changes in ecological features (production, respiration, and net ecosystem exchange) and in climate conditions show similar patterns in the three RCPs, although the amplitude of the response differs among them. In all RCP scenarios, temperature and precipitation increase as atmospheric $CO_2$ rises. These climate conditions favor vegetation growth and expansion, increasing future terrestrial carbon uptake in all RCPs. At the end of the 21st century, the global averages of gross and net primary production and of respiration increase in all RCPs, and the terrestrial ecosystem remains a carbon sink. This enhanced land $CO_2$ uptake is attributed to the expansion of vegetated area, increasing LAI, and an earlier onset of the growing season. After the mid-21st century, however, rising temperature increases soil respiration more than net primary production, so terrestrial carbon uptake begins to decline from that time. Regionally, the mean NEE over East Asia ($90^{\circ}E-140^{\circ}E$, $20^{\circ}N{\sim}60^{\circ}N$) is larger in magnitude than the zonal mean of the same latitude band.
At the end of the 21st century, the mean NEE over East Asia is $-2.09$, $-1.12$, and $-0.47\ \mathrm{PgC\,yr^{-1}}$, and the zonal mean NEE of the same latitude band is $-1.12$, $-0.55$, and $-0.17\ \mathrm{PgC\,yr^{-1}}$, for RCP 8.5, 4.5, and 2.6, respectively.
https://doi.org/10.14191/Atmos.2014.24.3.303
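The sign convention behind these negative NEE values can be written out. In the usual carbon-budget form (ignoring disturbance fluxes such as fire and land-use change), with autotrophic respiration $R_a$ and heterotrophic, mostly soil, respiration $R_h$:

$$
\mathrm{NPP} = \mathrm{GPP} - R_a, \qquad
\mathrm{NEE} = (R_a + R_h) - \mathrm{GPP} = R_h - \mathrm{NPP}
$$

A negative NEE therefore denotes net uptake by the land (a sink). The abstract's finding that soil respiration grows faster than NPP after the mid-21st century corresponds directly to NEE becoming less negative, that is, to weakening land uptake.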

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

Kim, Myoung-Jong
- Journal of Intelligence and Information Systems, v.18 no.2, pp.29-45, 2012
Bond rating is regarded as an important event for measuring the financial risk of companies and for determining investors' returns. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. One major drawback, however, is that they rest on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising method for classification and regression analysis. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound on the generalization error. In addition, the SVM solution may be a global optimum, so overfitting is unlikely to occur. SVM also does not require a large training sample, because it builds the prediction model using only some representative samples near the boundary, called support vectors. A number of experimental studies have shown that SVM has been applied successfully in a variety of pattern recognition fields. However, three major drawbacks can degrade SVM's performance.
First, SVM was originally proposed for binary classification problems. Methods that combine SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary problems. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class settings, but they can degrade classification performance. Third, multi-class prediction problems suffer from data imbalance, which occurs when the number of instances in one class greatly outnumbers the number in another. Such data sets often yield a default classifier because of the skewed boundary, which reduces classification accuracy. SVM ensemble learning is one machine learning method for coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms, and AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by training classifiers sequentially while increasing the weight of misclassified observations at each iteration, so observations that previous classifiers predicted incorrectly are chosen more often than those predicted correctly. Boosting thus tries to produce new classifiers that better predict the examples on which the current ensemble performs poorly, and in this way it can reinforce training on the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to solve multiclass prediction problems.
Because MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process takes into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation was repeated three times with different random seeds to ensure that differences among the three classifiers did not arise by chance. In each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is then used in turn as the test set while the classifier is trained on the other nine. That is, the cross-validation folds were tested independently for each algorithm. These steps yielded results for each classifier on 30 experiments. In arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperformed both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also outperformed AdaBoost (24.65%) and SVM (15.42%) in geometric mean-based prediction accuracy. A t-test was used to examine whether the performance of the classifiers over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost differs significantly from that of the AdaBoost and SVM classifiers at the 1% level, suggesting that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
https://doi.org/10.13088/jiis.2012.18.2.029
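The gap between the two accuracy measures can be illustrated with a short sketch. Because the abstract does not define them, it assumes that arithmetic mean-based accuracy averages the per-class hit rates (recalls) and that geometric mean-based accuracy is their geometric mean; the MGM-Boost weight update itself is not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def arithmetic_mean_accuracy(y_true, y_pred):
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def geometric_mean_accuracy(y_true, y_pred):
    # Any class with zero recall drives the geometric mean to zero.
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Imbalanced two-rating example: a classifier that always predicts "A".
y_true = ["A"] * 8 + ["B"] * 2
y_all_majority = ["A"] * 10
print(arithmetic_mean_accuracy(y_true, y_all_majority))  # 0.5
print(geometric_mean_accuracy(y_true, y_all_majority))   # 0.0
```

A classifier that ignores the minority rating still earns partial credit on the arithmetic measure but scores zero on the geometric one, which is why a geometric mean criterion pushes learning toward minority classes.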

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
- Journal of Intelligence and Information Systems, v.19 no.1, pp.95-110, 2013
Recently, the amount of unstructured data generated through a variety of social media has been increasing rapidly, creating a growing need to collect, store, search, analyze, and visualize such data. Because of its vast volume and unstructured nature, this kind of data cannot be handled appropriately with the traditional methodologies used to analyze structured data. Many attempts are therefore being made to analyze unstructured data such as text files and log files with various commercial and noncommercial analytical tools. Among the contemporary issues in the literature on unstructured text analysis, the concepts and techniques of opinion mining have attracted much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on opinion mining techniques are being made to resolve complicated issues that could not otherwise be solved by traditional approaches. One of the most representative examples is recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published in various media is a classic example of unstructured text data: every day, a large volume of new content is created, digitized, and distributed to us through online and offline channels. Many studies have shown that we make better decisions on political, economic, and social issues by analyzing news and other related information.
In this sense, we expect to be able to predict stock market fluctuations in part by analyzing the relationship between economic news reports and stock price patterns. So far, most studies in the opinion mining literature, including ours, have used a sentiment dictionary to elicit sentiment polarity or sentiment values from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values, and sentiment classifiers refer to it to determine the sentiment polarity of words, of the sentences in a document, and of the whole document. Most traditional approaches, however, share a common limitation: they do not consider the flexibility of sentiment polarity; that is, the polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, the sentiment polarity of a word can vary with the time, situation, and purpose of the analysis, and can even be contradictory. This flexibility of sentiment polarity motivated the present study. We argue that sentiment polarity should be assigned not merely on the basis of a word's inherent meaning but on the basis of its ad hoc meaning within a particular context. To implement this idea, we present an intelligent investment decision-support model based on opinion mining that scrapes and parses massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we use a domain-specific sentiment dictionary instead of a general-purpose one to classify each news item as positive or negative. To evaluate performance, we conducted intensive experiments and investigated the prediction accuracy of our model.
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
https://doi.org/10.13088/jiis.2013.19.1.095
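A sketch of the domain-specific dictionary idea, under explicitly stated assumptions: the abstract does not say how the dictionary was built, so here a word's polarity is estimated from how often it appears in articles followed by an up day versus a down day, (up − down)/(up + down), and the next-day direction is predicted from the balance of positive and negative articles. All function names, the scoring rule, and the toy data are hypothetical.

```python
from collections import Counter


def build_lexicon(labelled_articles, min_count=2):
    """labelled_articles: list of (tokens, 'up' | 'down') where the label is
    the next day's index direction. Returns {word: polarity in [-1, 1]}."""
    up, down = Counter(), Counter()
    for tokens, label in labelled_articles:
        # Count each word at most once per article.
        (up if label == "up" else down).update(set(tokens))
    lexicon = {}
    for word in set(up) | set(down):
        n_up, n_down = up[word], down[word]
        if n_up + n_down >= min_count:  # skip words seen too rarely
            lexicon[word] = (n_up - n_down) / (n_up + n_down)
    return lexicon


def classify_news(tokens, lexicon):
    """+1 positive, -1 negative, 0 neutral, from summed word polarities."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    return 1 if score > 0 else (-1 if score < 0 else 0)


def predict_direction(articles, lexicon):
    """Majority of article polarities; ties are (arbitrarily) called 'up'."""
    votes = sum(classify_news(a, lexicon) for a in articles)
    return "up" if votes >= 0 else "down"


training = [
    (["rate", "cut", "rally"], "up"),
    (["cut", "rally"], "up"),
    (["default", "slump"], "down"),
    (["slump", "cut"], "down"),
]
lexicon = build_lexicon(training)
print(predict_direction([["rally"], ["cut"], ["slump"]], lexicon))  # up
```

Because polarity is learned from market outcomes rather than fixed in advance, a word such as "cut" can receive a positive score in this domain even though a general-purpose dictionary would treat it as negative, which is the flexibility the abstract argues for.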
