Search | Korea Science

Comparing Machine Learning Classifiers for Movie WOM Opinion Mining

Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
- KSII Transactions on Internet and Information Systems (TIIS), v.9 no.8, pp.3169-3181, 2015
Online word-of-mouth has become a powerful influence on marketing and sales. Opinion mining and sentiment analysis are frequently adopted in market research and business analytics to analyze word-of-mouth content. However, several challenges remain: 1) sentiment analysis of Korean word-of-mouth content in the film market, 2) the viability of machine learning models that use only linguistic features, and 3) the effect of the feature set size. This study took a sample of 10,000 movie reviews that had been posted with extremely negative or positive ratings on a movie portal site, and conducted sentiment analysis with four machine learning algorithms: naïve Bayesian, decision tree, neural network, and support vector machine. We found that the neural network and support vector machine produced better accuracy than naïve Bayesian and decision tree at every feature set size. In addition, their performance improved as the feature set size increased.
https://doi.org/10.3837/tiis.2015.08.025

Uncertainty Analysis for Parameters of Probability Distribution in Rainfall Frequency Analysis by Bayesian MCMC and Metropolis Hastings Algorithm (Bayesian MCMC 및 Metropolis Hastings 알고리즘을 이용한 강우빈도분석에서 확률분포의 매개변수에 대한 불확실성 해석)

Seo, Young-Min;Park, Ki-Bum
- Journal of Environmental Science International, v.20 no.3, pp.329-340, 2011
Rainfall and flood frequency analysis in water resources planning mainly relies on the frequentist viewpoint, which defines probability as the limit of relative frequency and treats the unknown parameters of the probability model as fixed constants. Because the probability is objective and the parameters have fixed values, it is very difficult to specify the uncertainty of these parameters probabilistically. This study constructs an uncertainty evaluation model using Bayesian MCMC and the Metropolis-Hastings algorithm to quantify the uncertainty of probability distribution parameters in rainfall frequency analysis. Applying Bayesian MCMC and the Metropolis-Hastings algorithm makes it possible to quantify the statistical properties and uncertainty intervals of the distribution parameters when estimating probability rainfall, which provides a basis for a framework that can specify uncertainty and risk in flood risk assessment and decision-making.
https://doi.org/10.5322/JES.2011.20.3.329
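The Metropolis-Hastings step described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the Gumbel distribution as the rainfall model, the flat prior on the location and on the log of the scale, the step sizes, and the synthetic data produced by the hypothetical helper `synthetic_annual_maxima`.

```python
import math
import random
import statistics


def gumbel_log_likelihood(data, mu, beta):
    """Log-likelihood of a Gumbel(mu, beta) model for the values in data."""
    total = 0.0
    for x in data:
        z = (x - mu) / beta
        if z < -700.0:  # exp(-z) would overflow; treat as impossible
            return -math.inf
        total += -math.log(beta) - z - math.exp(-z)
    return total


def metropolis_hastings(data, n_iter=6000, burn_in=1000,
                        step_mu=2.0, step_log_beta=0.08, seed=1):
    """Random-walk Metropolis-Hastings over (mu, log beta).

    Assumes a flat prior on mu and on log beta, so the acceptance
    ratio reduces to the likelihood ratio.
    """
    rng = random.Random(seed)
    # Start from method-of-moments estimates for the Gumbel distribution.
    beta = statistics.stdev(data) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(data) - 0.5772 * beta
    log_like = gumbel_log_likelihood(data, mu, beta)
    samples = []
    for i in range(n_iter):
        # Symmetric Gaussian proposal in (mu, log beta).
        mu_new = mu + rng.gauss(0.0, step_mu)
        beta_new = math.exp(math.log(beta) + rng.gauss(0.0, step_log_beta))
        log_like_new = gumbel_log_likelihood(data, mu_new, beta_new)
        # Accept with probability min(1, likelihood ratio).
        if math.log(1.0 - rng.random()) < log_like_new - log_like:
            mu, beta, log_like = mu_new, beta_new, log_like_new
        if i >= burn_in:
            samples.append((mu, beta))
    return samples


def credible_interval(values, level=0.95):
    """Equal-tailed interval taken from the sorted posterior samples."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return (ordered[int((1.0 - level) / 2.0 * last)],
            ordered[int((1.0 + level) / 2.0 * last)])


def synthetic_annual_maxima(n=200, mu=100.0, beta=20.0, seed=0):
    """Synthetic Gumbel data (not real rainfall), via the inverse CDF."""
    rng = random.Random(seed)
    return [mu - beta * math.log(-math.log(rng.uniform(1e-9, 1.0 - 1e-9)))
            for _ in range(n)]
```

Calling `metropolis_hastings(synthetic_annual_maxima())` returns post-burn-in `(mu, beta)` pairs, and `credible_interval` applied to either coordinate gives an uncertainty interval of the kind the abstract refers to.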

Application of Bayesian Statistical Analysis to Multisource Data Integration

Hong, Sa-Hyun;Moon, Wooil-M.
- Proceedings of the KSRS Conference, 2002.10a, pp.394-399, 2002
In this paper, multisource data classification methods based on the Bayesian formula are considered. In this decision fusion scheme, the individual data sources are handled separately by statistical classification algorithms, and then a Bayesian fusion method is applied to integrate the available data sources. This method combines the decisions of individual experts, where the weight of each expert represents the reliability of its source. In previous work, the reliability measure used in the statistical approach was common to all pixels. In this experiment, the weight factors are assigned different values for each pixel in order to improve the integrated classification accuracy. Although most implementations of Bayesian classification assume fixed a priori probabilities, we use adaptive a priori probabilities, iteratively calculating the local a priori probabilities so as to maximize the posterior probabilities. The effectiveness of the proposed method is first demonstrated on simulations with artificial data and then evaluated on real-world data sets. As a result, we show that the Bayesian statistical fusion scheme performs well on multispectral data classification.

Predicting Stock Liquidity by Using Ensemble Data Mining Methods

Bae, Eun Chan;Lee, Kun Chang
- Journal of the Korea Society of Computer and Information, v.21 no.6, pp.9-19, 2016
In the finance literature, stock liquidity, which indicates how readily stocks can be cashed out in the market, has received considerable attention from both academics and practitioners. There are several reasons. First, stock liquidity is known to significantly affect asset pricing. Second, macroeconomic announcements influence liquidity in the stock market. Stock liquidity therefore affects the decisions of both investors and managers. Although there is a great deal of finance literature on stock liquidity, it is quite clear that no studies have attempted to investigate stock liquidity as a decision-making problem. Most stock liquidity studies have taken limited views, such as how much liquidity influences stock price or which variables describe stock liquidity significantly. This paper, however, posits that stock liquidity can become a serious decision-making problem and can be handled with data mining techniques that estimate its future extent with statistical validity. To this end, we collected financial data from a number of manufacturing companies listed on the KRX (Korea Exchange) for the period 2010 to 2013. We started the data set in 2010 to avoid the aftershocks of the 2008 financial crisis. We used the Fn-GuidPro system to gather a total of 5,700 financial data records. The stock liquidity measure was computed following the procedure proposed by Amihud (2002), which is known to be the best metric for showing the relationship with daily return. We applied five data mining techniques (classifiers): Bayesian network, support vector machine (SVM), decision tree, neural network, and an ensemble method. The Bayesian networks include GBN (General Bayesian Network), NBN (Naive BN), and TAN (Tree Augmented NBN). The decision trees use CART and C4.5. Regression results were used as the performance benchmark. The ensemble method comes in two types, integrating either two or three classifiers, and is based on voting to integrate the classifiers. Among the single classifiers, CART showed the best performance with 48.2%, compared with 37.18% for regression. Among the ensemble methods, integrating TAN, CART, and SVM gave the best result with 49.25%. In additional analysis of individual industries, relatively stable industries such as electronic appliances, wholesale and retailing, wood, and leather, bags, and shoes showed better performance, above 50%.
https://doi.org/10.9708/jksci.2016.21.6.009
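The voting-based integration of classifiers mentioned in this abstract can be sketched as plain majority voting. The abstract does not say how ties are broken or what the class labels are, so the tie rule (the label encountered first, in classifier order, wins) and the "high"/"low" labels below are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by majority vote.

    predictions: one list of labels per classifier, all of equal length.
    Ties go to the label that appears first in classifier order
    (an assumed rule, not taken from the paper).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")
    combined = []
    for votes in zip(*predictions):
        # most_common keeps first-encountered order among equal counts.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on three samples.
tan_pred = ["high", "low", "low"]
cart_pred = ["high", "high", "low"]
svm_pred = ["low", "high", "low"]
ensemble_pred = majority_vote([tan_pred, cart_pred, svm_pred])
```

With two classifiers every disagreement is a tie, so under this assumed rule the first classifier's label always wins. With three classifiers and two classes, ties cannot occur.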

The Effectiveness Analysis of Multistatic Sonar Network Via Detection Performance (표적탐지성능을 이용한 다중상태 소나의 효과도 분석)

Jang, Jae-Hoon;Ku, Bon-Hwa;Hong, Woo-Young;Kim, In-Ik;Ko, Han-Seok
- Journal of the Korea Institute of Military Science and Technology, v.9 no.1 s.24, pp.24-32, 2006
This paper analyzes the effectiveness of a multistatic sonar network based on detection performance. A multistatic sonar network is a distributed detection system in which a source and multiple receivers are placed apart. It therefore needs a detection technique involving a decision rule and optimization of the sonar system to improve detection performance. To this end, we propose a data fusion procedure using Bayesian decision and an optimal sensor arrangement obtained by optimizing a bistatic sonar. In addition, to analyze detection performance effectively, we propose an environmental model that simulates propagation loss and target strength suitable for multistatic sonar networks in real surroundings. The effectiveness analysis of the multistatic sonar network proves to be a promising tool for the effective allocation of detection resources in multistatic sonar systems.

A Study of Threat Evaluation using Learning Bayesian Network on Air Defense (베이지안 네트워크 학습을 이용한 방공 무기 체계에서의 위협평가 기법연구)

Choi, Bomin;Han, Myung-Mook
- Journal of the Korean Institute of Intelligent Systems, v.22 no.6, pp.715-721, 2012
Threat evaluation is a technique that prioritizes tracks engaging with the enemy by recognizing the battlefield situation, thereby making decision-making more efficient. That is, in a battle situation with multiple targets, it enables expeditious decision-making and aims to minimize damage to assets and maximize attacks on targets. The threat value used in threat evaluation is computed from sensor data generated in the battle space. Because the battle situation is unpredictable and many potential events are possible, damaged or lost data can confuse decision-making. Therefore, in this paper we propose a substantial threat value calculation using Bayesian network learning that adapts to the varying battle situation in order to obtain reliable results from incomplete data, and we then verify the system's performance.
https://doi.org/10.5391/JKIIS.2012.22.6.715

Proposal of Maintenance Scenario and Feasibility Analysis of Bridge Inspection using Bayesian Approach (베이지안 기법을 이용한 교량 점검 타당성 분석 및 유지관리 시나리오 제안)

Lee, Jin Hyuk;Lee, Kyung Yong;Ahn, Sang Mi;Kong, Jung Sik
- KSCE Journal of Civil and Environmental Engineering Research, v.38 no.4, pp.505-516, 2018
To establish an efficient bridge maintenance strategy, the future performance of a bridge must be estimated from its current performance, which allows more rational decision-making with a more accurate prediction model. However, existing personnel-based maintenance may incur enormous costs, because it is difficult for a bridge administrator to estimate bridge performance exactly at a target management level, which hinders rational decision-making for bridge maintenance. Therefore, in this work, we developed a representative performance prediction model for each bridge element that accounts for uncertainty, using domestic bridge inspection data, and proposed a Bayesian updating method that can apply the developed model to bridges under actual maintenance with higher accuracy. In addition, a feasibility analysis based on the calculated maintenance cost of a monitoring maintenance scenario is performed to show the advantages of Bayesian-updating-driven preventive maintenance in terms of cost efficiency, in contrast to conventional periodic maintenance.
https://doi.org/10.12652/Ksce.2018.38.4.0505

Prediction of the Gold-silver Deposits from Geochemical Maps - Applications to the Bayesian Geostatistics and Decision Tree Techniques (지화학자료를 이용한 금·은 광산의 배태 예상지역 추정-베이시안 지구통계학과 의사나무 결정기법의 활용)

Hwang, Sang-Gi;Lee, Pyeong-Koo
- Economic and Environmental Geology, v.38 no.6 s.175, pp.663-673, 2005
This study investigates the relationship between geochemical maps and the locations of gold-silver deposits. Geochemical maps of 21 elements published by KIGAM, the locations of gold-silver deposits, and the 1:1,000,000 scale geological map of Korea are used for this investigation. The pixel size of the basic geochemical maps is 250 m, and these data are resampled at 1 km spacing for the statistical analyses. The relationship between mine locations and the geochemical data is investigated using Bayesian statistics and decision tree algorithms. For the Bayesian statistics, each geochemical map is reclassified by percentile divisions that split the data into 5, 25, 50, 75, 95, and 100% groups. The number of mine locations in each division is counted and the probabilities are calculated. The posterior probability of each pixel is calculated using the probabilities from the 21 geochemical maps and the geological map. A prediction map of mining locations is made by plotting the posterior probability. The input parameters for decision tree construction are the 21 geochemical elements and lithology, and the output parameters are 5 types of mines (Ag/Au, Cu, Fe, Pb/Zn, W) and the absence of a mine. Locations for the absence of a mine are selected by resampling the overall area at 1 km spacing and eliminating any resampled points within 750 m of mine locations. A prediction map of each mine area is produced by applying the decision tree to every pixel. The prediction by the Bayesian method is slightly better than that of the decision tree. However, both prediction maps show a reasonable match with the input mine locations. We interpret this match as indicating that the rules produced by both methods are reasonable and therefore that the geochemical data are strongly related to the mine locations. This implies that the geochemical rules could be used as background values of mine locations and could therefore be used for evaluating mine contamination. The Bayesian statistics indicated that the probability of an Au/Ag deposit increases as CaO, Cu, MgO, MnO, Pb, and Li increase and Zr decreases.
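The per-pixel posterior calculation described in this abstract can be sketched as follows. The abstract does not state how the probabilities from the individual maps are combined, so this sketch assumes a naive Bayes combination (the maps are treated as conditionally independent given mine presence or absence). The two small tables of counts are invented purely for illustration.

```python
def posterior_mine_probability(pixel_bins, map_tables):
    """Posterior probability that a pixel hosts a mine.

    pixel_bins: the percentile-bin index of the pixel on each map.
    map_tables: for each map, a list of (mines_in_bin, pixels_in_bin).
    Assumes every map covers the same pixels and that the maps are
    conditionally independent (a naive Bayes assumption, not stated
    in the abstract).
    """
    total_mines = sum(mines for mines, _ in map_tables[0])
    total_pixels = sum(pixels for _, pixels in map_tables[0])
    prior_mine = total_mines / total_pixels
    score_mine = prior_mine
    score_none = 1.0 - prior_mine
    for bin_index, table in zip(pixel_bins, map_tables):
        mines, pixels = table[bin_index]
        # P(bin | mine) and P(bin | no mine) from the counts.
        score_mine *= mines / total_mines
        score_none *= (pixels - mines) / (total_pixels - total_mines)
    return score_mine / (score_mine + score_none)


# Invented counts: two maps, two bins each, 10 mines in 100 pixels.
example_tables = [
    [(1, 50), (9, 50)],   # hypothetical map A
    [(2, 60), (8, 40)],   # hypothetical map B
]
high_pixel = posterior_mine_probability([1, 1], example_tables)
low_pixel = posterior_mine_probability([0, 0], example_tables)
```

In this toy example, a pixel in the mine-rich bin of both maps gets a posterior of about 0.33, against a prior of 0.10. A pixel in the mine-poor bin of both maps drops below 0.01.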

Extraction of Hierarchical Decision Rules from Clinical Databases using Rough Sets

Tsumoto, Shusaku
- Proceedings of the Korea Intelligent Information System Society Conference, 2001.01a, pp.336-342, 2001
One of the most important problems with rule induction methods is that they cannot extract rules that plausibly represent experts' decision processes. On the one hand, rule induction methods induce probabilistic rules whose description length is too short compared with experts' rules. On the other hand, the construction of Bayesian networks generates rules that are too lengthy. In this paper, the characteristics of experts' rules are closely examined and a new approach to extracting plausible rules is introduced, which consists of the following three procedures. First, the characterization of decision attributes (given classes) is extracted from databases, and the classes are classified into several groups with respect to the characterization. Then, two kinds of sub-rules are induced: characterization rules for each group and discrimination rules for each class in the group. Finally, those two parts are integrated into one rule for each decision attribute. The proposed method was evaluated on a medical database, and the experimental results show that the induced rules correctly represent experts' decision processes.

Protein Secondary Structure Prediction using Multiple Neural Network Likelihood Models

Kim, Seong-Gon;Kim, Yong-Gi
- International Journal of Fuzzy Logic and Intelligent Systems, v.10 no.4, pp.314-318, 2010
Predicting the alpha-helices, beta-sheets, and turns of a protein's secondary structure is a complex non-linear task that has been approached with several techniques such as neural networks, genetic algorithms, decision trees, and other statistical or heuristic methods. This project introduces a new machine learning method that combines Bayesian inference with offline-trained multilayer perceptron (MLP) models as the likelihood for protein secondary structure prediction. With varying window sizes of neighboring amino acid information, the information is extracted and passed back and forth between the neural network and the Bayesian inference process until the posterior probability of the secondary structure converges.
https://doi.org/10.5391/IJFIS.2010.10.4.314

Search Result 206, Processing Time 0.023 seconds
