• Title/Summary/Keyword: Test mining

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.149-169 / 2020
  • The "Urban Regeneration New Deal Project", one of the government's major national projects, aims to develop underdeveloped areas by investing 50 trillion won in 100 locations in the first year and 500 over the next four years. This project is drawing keen attention from the media and local governments. However, the project model fails to reflect the original characteristics of each area because it divides project areas into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type." The keywords for successful urban regeneration in Korea, "resident participation," "regional specialization," "ministerial cooperation" and "public-private cooperation," show that when local governments propose urban regeneration projects to the government, it is most important to understand the characteristics of the city accurately and to push ahead with the projects in a way that suits those characteristics, with the help of local residents and private companies. In addition, considering gentrification, one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suited to the characteristics of the area. To supplement the limitations of the "Urban Regeneration New Deal Project" methodology, this study proposes a system that uses various machine learning algorithms to recommend urban regeneration types suitable for urban regeneration sites, referring to the urban regeneration types of the "2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan," which was promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017).
In order to identify regional characteristics, approximately 100,000 pieces of text data were collected for 22 regions where projects had been carried out across the four types of urban regeneration. Using the collected data, we extracted key keywords for each region according to the type of urban regeneration and conducted topic modeling to explore whether there were differences between types. As a result, it was confirmed that a number of topics related to real estate and the economy appeared in old residential areas, while in declining and underdeveloped areas, topics reflecting the characteristics of areas where industrial activities were active in the past appeared. In the historical and cultural resource areas, which contain traces of the past, many keywords related to the government appeared, so political topics and cultural topics resulting from various events could be confirmed. Finally, in low-use and underdeveloped areas, many topics on real estate and accessibility emerged, indicating areas with good accessibility where development is planned or likely. Furthermore, a model was implemented that proposes urban regeneration types tailored to regional characteristics for regions other than Seoul. Machine learning was used to implement the model, with training data and test data randomly extracted at an 8:2 ratio. To compare performance across models, two input representations (Count Vector and TF-IDF Vector) were combined with five classifiers (SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting), giving a total of 10 models. The model with the highest performance was Gradient Boosting using TF-IDF Vector input data, with an accuracy of 97%.
Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new project sites in the process of carrying out urban regeneration projects.
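The two input representations and the 8:2 split mentioned in this abstract can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the toy documents, the function names `count_vectors`, `tfidf_vectors` and `split_80_20`, and the specific TF-IDF weighting (raw count times log(N/df)) are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def count_vectors(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

def tfidf_vectors(docs):
    """Weight each count by log(N / df); one common TF-IDF variant (assumed here)."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]

def split_80_20(items, seed=0):
    """Randomly split items into 80% training data and 20% test data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

# Toy documents standing in for the collected regional text data (hypothetical)
docs = [
    "apartment price redevelopment",
    "factory industry decline",
    "palace heritage culture",
    "subway station redevelopment",
    "apartment price subway",
]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
train, test = split_80_20(docs)
```

Either vector set could then be fed to any of the five classifiers; terms that occur in few documents receive a larger TF-IDF weight than terms spread across many documents.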

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.121-139 / 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is very important for financial institutions. Many researchers have addressed bankruptcy prediction over the past three decades. The current research uses ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers in order to obtain more accurate predictions than individual models. Ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection chooses critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining. However, few studies have dealt with the integration of instance selection and bagging. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover and mutation. The solutions, coded as strings, are evaluated by the fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset that is used as input data for the bagging model. In this study, the chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. We set the crossover rate and mutation rate to 0.7 and 0.1, respectively. We used the prediction accuracy of the model as the fitness function of GA. The SVM model is trained on the training data set using the selected instance subset. The prediction accuracy of the SVM model on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data for the bagging model. We used the SVM model as the base classifier for the bagging ensemble. Majority voting was used as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contains 1,832 externally non-audited firms, of which 916 filed for bankruptcy and 916 did not. Financial ratios categorized as stability, profitability, growth, activity and cash flow were investigated through a literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We separated the whole data into three subsets: training, test and validation data sets. We compared the proposed model with several comparative models, including the simple individual SVM model, the simple bagging model and the instance selection based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
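The bagging phase described in this abstract (bootstrap samples drawn with replacement, one base classifier per sample, majority voting) can be sketched as follows. This is a minimal sketch under stated assumptions: the base learner is a toy one-feature threshold rule standing in for the SVM, the data are invented, and the GA-based instance selection that precedes bagging in the paper is not shown.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement from data."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy stand-in for the SVM base classifier (assumption):
    a threshold halfway between the class means of a single feature."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:                  # degenerate sample with one class only
        label = 1 if pos else 0
        return lambda x: label
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    above = 1 if pos_mean > neg_mean else 0  # class predicted above the threshold
    return lambda x: above if x > threshold else 1 - above

def majority_vote(labels):
    """Return the most frequent label among the base classifiers' votes."""
    return Counter(labels).most_common(1)[0][0]

def bagging_predict(models, x):
    """Combine the base classifiers' predictions by majority voting."""
    return majority_vote([m(x) for m in models])

rng = random.Random(42)
# (feature, label) pairs; label 1 = bankrupt, 0 = non-bankrupt (hypothetical data)
data = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.1, 0)]
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(11)]
```

An odd number of base classifiers avoids ties in a two-class vote. In the paper's setting, `data` would be the instance subset chosen by the GA rather than the full training set.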

Selection Model of System Trading Strategies using SVM (SVM을 이용한 시스템트레이딩전략의 선택모형)

  • Park, Sungcheol;Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.59-71 / 2014
  • System trading has recently become more popular among Korean traders. System traders use automatic order systems based on system-generated buy and sell signals. These signals are generated from predetermined entry and exit rules coded by system traders. Most research on system trading has focused on designing profitable entry and exit rules using technical indicators. However, market conditions, strategy characteristics, and money management also influence the profitability of system trading. Unexpected price deviations from the predetermined trading rules can cause large losses for system traders. Therefore, most professional traders use strategy portfolios rather than a single strategy. Building a good strategy portfolio is important because trading performance depends on it. Despite the importance of designing strategy portfolios, rule-of-thumb methods have been used to select trading strategies. In this study, we propose an SVM-based strategy portfolio management system. SVM was introduced by Vapnik and is known to be effective in data mining. It can build good portfolios within a very short period of time. Since SVM minimizes structural risk, it is well suited to the futures trading market, in which prices do not move exactly as they did in the past. Our system trading strategies include the moving-average cross system, MACD cross system, trend-following system, buy dips and sell rallies system, DMI system, Keltner channel system, Bollinger Bands system, and Fibonacci system. These strategies are well known and frequently used by many professional traders. We program these strategies to generate automated system signals for entry and exit. We propose an SVM-based strategy selection system and a portfolio construction and order routing system. The strategy selection system is a portfolio training system. It generates training data and builds an SVM model using the optimal portfolio.
We build an $m \times n$ data matrix by dividing KOSPI 200 index futures data into periods of equal length. The optimal strategy portfolio is derived by analyzing the performance of each strategy. The SVM model is generated from this data and the optimal strategy portfolio. We use 80% of the data for training and the remaining 20% for testing the strategy. For training, we select the two strategies that show the highest profit on the next day. Selection method 1 selects two strategies, and method 2 selects at most two strategies that show a profit of more than 0.1 point. We use the one-against-all method, which has a fast processing time. We analyse daily data of KOSPI 200 index futures contracts from January 1990 to November 2011. Price change rates for 50 days are used as SVM input data. The training period is from January 1990 to March 2007 and the test period is from March 2007 to November 2011. We suggest three benchmark strategy portfolios. BM1 holds two contracts of KOSPI 200 index futures for the testing period. BM2 consists of the two strategies that show the largest cumulative profit during the 30 days before testing starts. BM3 has the two strategies that show the best profits during the testing period. Trading costs include brokerage commission and slippage. The proposed strategy portfolio management system shows a profit more than double that of the benchmark portfolios. BM1 shows a profit of 103.44 points, BM2 488.61 points, and BM3 502.41 points after deducting trading costs. The best benchmark is the portfolio of the two most profitable strategies during the test period. Proposed system 1 shows a profit of 706.22 points and proposed system 2 a profit of 768.95 points after deducting trading costs. The equity curves for the entire period show a stable pattern. Together with the higher profit, this suggests a good trading direction for system traders. More stable and more profitable portfolios could be built by adding a money management module to the system.
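The two labelling rules and the 50-day input window described in this abstract can be sketched as below. The function names, the strategy names and the profit figures are hypothetical; the sketch only illustrates how method 1 (always two strategies) differs from method 2 (at most two strategies, each with a profit above 0.1 point), and it does not reproduce the authors' SVM training.

```python
def price_change_rates(prices, window=50):
    """Daily rates of change over the last `window` days, used as model inputs."""
    recent = prices[-(window + 1):]
    return [(b - a) / a for a, b in zip(recent, recent[1:])]

def select_method_1(next_day_profit):
    """Method 1: always pick the two strategies with the highest next-day profit."""
    ranked = sorted(next_day_profit, key=next_day_profit.get, reverse=True)
    return ranked[:2]

def select_method_2(next_day_profit, min_profit=0.1):
    """Method 2: pick at most two strategies whose profit exceeds min_profit points."""
    ranked = sorted(next_day_profit, key=next_day_profit.get, reverse=True)
    return [name for name in ranked if next_day_profit[name] > min_profit][:2]

# Hypothetical next-day profits (index points) for one trading day
profits = {"ma_cross": 0.35, "macd_cross": 0.05, "bollinger": -0.20, "keltner": 0.02}

labels_1 = select_method_1(profits)   # two strategies, regardless of profit size
labels_2 = select_method_2(profits)   # only strategies clearing the 0.1-point bar
```

On a day when no strategy clears the threshold, method 2 returns an empty list, which corresponds to holding no position under that rule.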

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.71-89 / 2016
  • Sentiment analysis is used to identify emotions or sentiments embedded in user-generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of it to analyze customer-generated opinions. In previous studies, the star rating of a review was regarded as equivalent to the sentiment embedded in the text. However, the star rating does not always correspond to the sentiment polarity, and this assumption limited the accuracy of previous studies. To solve this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, Support Vector Machines (SVM) and the Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier, and statistical correlations between movie reviews and box-office success are analyzed. Movie reviews are collected along with their star ratings. The dataset used in this study consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms the Naive Bayes (NB) classifier, with an accuracy about 6% higher than NB. Furthermore, the results indicate positive correlations between the star rating and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows a mild, positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent sample t-test was conducted.
For this, the movies were divided into two groups using the average of the sentiment scores. The two groups differ significantly in terms of star ratings.
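The verification step described here (split the movies at the average sentiment score, then compare the star ratings of the two groups with an independent-samples t-test) can be sketched with the standard library. The data are invented, and the pooled-variance (Student) form of the statistic is an assumption; the abstract does not say which variant the authors used.

```python
import math
import statistics

def split_by_mean(movies):
    """Split (sentiment_score, star_rating) pairs at the mean sentiment score."""
    mean_score = statistics.mean(s for s, _ in movies)
    high = [r for s, r in movies if s >= mean_score]
    low = [r for s, r in movies if s < mean_score]
    return high, low

def independent_t(a, b):
    """Student's t statistic for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical (mean sentiment score, mean star rating) per movie
movies = [(0.9, 9.0), (0.8, 8.0), (0.7, 9.0), (0.3, 6.0), (0.2, 5.0), (0.1, 6.0)]
high, low = split_by_mean(movies)
t_stat = independent_t(high, low)   # compare against a t distribution with na + nb - 2 d.o.f.
```

A large absolute value of `t_stat` relative to the critical value for the chosen significance level indicates that the two groups' star ratings differ.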

The Efficiency Analysis of CRM System in the Hotel Industry Using DEA (DEA를 이용한 호텔 관광 서비스 업계의 CRM 도입 효율성 분석)

  • Kim, Tai-Young;Seol, Kyung-Jin;Kwak, Young-Dai
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.91-110 / 2011
  • This paper analyzes cases in which hotels have expanded their services and improved their work processes through IT solutions in order to cope with computerization and globalization. It also studies cases in which domestic hotels use a CRM solution internally to respond effectively to customer requests, strengthen customer analysis, and build marketing strategies. In particular, this study discusses the introduction of CRM solutions and the process of using CRM in sales, business and marketing services, and applies DEA (Data Envelopment Analysis) to assess how effective that use has been. First, the relative efficiency of L Company was compared using the CCR model; then the efficiency of L Company's restaurants and facilities was compared using the BCC model. L Company concluded that it is important to create and manage sales data precisely, since they are the preliminary data for CRM, and therefore made it possible to save the sales data generated by the POS system in each sales performance database. To do so, it newly established the Oracle POS system and the LORIS POS system for food and beverage restaurants as well as rooms, making it possible to generate and manage sales data stably. Moreover, it set up a composite database to control comprehensively the results of work processes during a specific period by collecting customer registration information, making it possible to control information on sales performance systematically. By establishing a system that unifies the databases and managing it comprehensively, data integrity was greatly enhanced and the problem of inconsistent data was thoroughly solved. Using data accumulated in the comprehensive database, sales data can be analyzed, categorized and classified through the data mining engine embedded in Polaris CRM, and the results can be organized in a data mart and provided in the form of CRM application data.
By transforming the original sales data into forms that are easy to handle and saving them separately in the data mart, well-organized data could be obtained easily for various marketing operations, morning meetings and decision-making. By using summarized data in the data mart, it was possible to carry out marketing operations such as telemarketing, direct mailing, internet marketing services and service product development for recognized customers; moreover, information on customer recognition, which is one of CRM's end products, could be fed back into the comprehensive database. This research was undertaken to find out how effectively CRM has been employed, by using the DEA technique to compare and analyze the management performance of each site and store after the introduction of CRM in hotel enterprises. According to the results, the efficiency of each site was calculated from input and output factors to determine the comparative efficiency of CRM system usage at L Company's four sites; with regard to stores, workforce size and budget use differ greatly, and so does the efficiency of each store. Furthermore, by applying the CCR model to each site, the DEA technique could assess which sites have comparatively high efficiency and which do not in terms of the outcomes of hotel IT projects such as CRM introduction. By using the BCC model, the outcome of CRM usage at each store of A site, which is representative of L Company, could be evaluated comparatively, identifying which stores maintain high efficiency in using CRM and which do not. The study analyzed the cases of CRM introduction at L Company, a hotel enterprise, and evaluated them precisely through DEA.
By introducing CRM, L Company built up its customer analysis system and was able to provide customers identified through customer analysis data with one-to-one tailored services. Moreover, it could plan differentiated services for returning customers by assessing the customer recognition rate. As future work, research is needed on process analysis that can lead to specific outcomes, such as increased sales volume, through test marketing and target marketing using CRM. It is also necessary to study efficiency evaluation that takes into account the linkages between CRM and other IT solutions such as ERP.
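For reference, the two DEA models named in this abstract are usually written as the following linear programs. These are the standard input-oriented multiplier forms; the abstract does not state which orientation or which input and output factors the authors used, so the symbols are generic assumptions. For the evaluated unit $0$ with inputs $x_{i0}$ and outputs $y_{r0}$ among units $j = 1, \dots, n$, the CCR model is

$$
\max_{u, v} \; \sum_{r} u_r y_{r0}
\quad \text{s.t.} \quad
\sum_{i} v_i x_{i0} = 1, \qquad
\sum_{r} u_r y_{rj} - \sum_{i} v_i x_{ij} \le 0 \;\; (j = 1, \dots, n), \qquad
u_r, v_i \ge 0 .
$$

The BCC model adds a variable $u_0$, unrestricted in sign, which allows variable returns to scale:

$$
\max_{u, v, u_0} \; \sum_{r} u_r y_{r0} - u_0
\quad \text{s.t.} \quad
\sum_{i} v_i x_{i0} = 1, \qquad
\sum_{r} u_r y_{rj} - \sum_{i} v_i x_{ij} - u_0 \le 0 \;\; (j = 1, \dots, n), \qquad
u_r, v_i \ge 0 .
$$

An optimal value of 1 places the unit on the efficient frontier, and values below 1 indicate relative inefficiency. Because the BCC frontier envelops the data more tightly, a unit's BCC score is never lower than its CCR score.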

Neutralization of Pyrophyllite Mine Wastes by the Lime Cake By-Product (부산석회를 이용한 납석광산 폐석의 중화처리)

  • Yoo, Kyung-Yoal;Cheong, Young-Wook;Ok, Yong-Sik;Yang, Jae-E.
    • Korean Journal of Environmental Agriculture / v.24 no.3 / pp.215-221 / 2005
  • Numerous abandoned or closed mines are found in the steep mountain valleys of Korea as a result of the depression of the mining industry since the late 1980s. Enormous amounts of waste from these mines were dumped on the slopes, causing sedimentation and the discharge of acid mine drainage directly into streams, with detrimental effects on the surrounding environment. The objective of this research was to evaluate the feasibility of using the lime cake by-product from soda ash production (Solvay process) to neutralize pyrophyllite mine wastes, which have discharged acid drainage to soil and streams in the watershed. The mine wastes were strongly acidic at pH 3.67 and contained over 16% of $Al_2O_3$ and 11% of $Fe_2O_3$, whereas the lime cake by-product was strongly basic at pH 9.97 due to high contents of CaO, MgO and $CaCl_2$ as major components. Column experiments were conducted to test the neutralizing capacity of the lime cake by-product for the acidic pyrophyllite mine wastes. Columns packed with the wastes (control) were treated with the lime cake by-product, calcium carbonate, the dressing soil, or a combination of these. Distilled water was eluted statically through the column and the leachate was collected for chemical analyses. Treating the mine wastes with the lime cake by-product (or calcium carbonate) as a mixture increased the pH of the leachate from $3.5{\sim}4.0$ to $7{\sim}8$. Concentrations of Fe and Al in the leachate also decreased to below 1.0 mg $L^{-1}$. A similar result was observed for the combined treatments of the mine waste, the lime cake by-product (or calcium carbonate) and the dressing soil. The results indicated that the lime cake by-product could sufficiently neutralize the acid drainage from the pyrophyllite mine wastes without dressing soils.

The Brand Personality Effect: Communicating Brand Personality on Twitter and its Influence on Online Community Engagement (브랜드 개성 효과: 트위터 상의 브랜드 개성 전달이 온라인 커뮤니티 참여에 미치는 영향)

  • Cruz, Ruth Angelie B.;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.67-101 / 2014
  • The use of new technology greatly shapes the marketing strategies used by companies to engage their consumers. Among these new technologies, social media is used to reach out to the organization's audience online. One of the most popular social media channels to date is the microblogging platform Twitter. With 500 million tweets sent on average daily, the microblogging platform is definitely a rich source of data for researchers, and a lucrative marketing medium for companies. Nonetheless, one of the challenges for companies in developing an effective Twitter campaign is the limited theoretical and empirical evidence on the proper organizational usage of Twitter despite its potential advantages for a firm's external communications. The current study aims to provide empirical evidence on how firms can utilize Twitter effectively in their marketing communications using the association between brand personality and brand engagement that several branding researchers propose. The study extends Aaker's previous empirical work on brand personality by applying the Brand Personality Scale to explore whether Twitter brand communities convey distinctive brand personalities online and its influence on the communities' level or intensity of consumer engagement and sentiment quality. Moreover, the moderating effect of the product involvement construct in consumer engagement is also measured. By collecting data for a period of eight weeks using the publicly available Twitter application programming interface (API) from 23 accounts of Twitter-verified business-to-consumer (B2C) brands, we analyze the validity of the paper's hypothesis by using computerized content analysis and opinion mining. The study is the first to compare Twitter marketing across organizations using the brand personality concept. 
It demonstrates a potential basis for Twitter strategies and discusses the benefits of these strategies, thus providing a framework of analysis for Twitter practice and strategic direction for companies developing their use of Twitter to communicate with their followers on this social media platform. This study has four specific research objectives. The first objective is to examine the applicability of brand personality dimensions used in marketing research to online brand communities on Twitter. The second is to establish a connection between the congruence of offline and online brand personalities in building a successful social media brand community. Third, we test the moderating effect of product involvement in the effect of brand personality on brand community engagement. Lastly, we investigate the sentiment quality of consumer messages to the firms that succeed in communicating their brands' personalities on Twitter.

Bankruptcy Type Prediction Using A Hybrid Artificial Neural Networks Model (하이브리드 인공신경망 모형을 이용한 부도 유형 예측)

  • Jo, Nam-ok;Kim, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.79-99 / 2015
  • The prediction of bankruptcy has been extensively studied in the accounting and finance field. It can have an important impact on lending decisions and the profitability of financial institutions in terms of risk management. Many researchers have focused on constructing a more robust bankruptcy prediction model. Early studies primarily used statistical techniques such as multiple discriminant analysis (MDA) and logit analysis for bankruptcy prediction. However, many studies have demonstrated that artificial intelligence (AI) approaches, such as artificial neural networks (ANN), decision trees, case-based reasoning (CBR), and support vector machines (SVM), have outperformed statistical techniques since the 1990s for business classification problems because statistical methods have some rigid assumptions in their application. In previous studies on corporate bankruptcy, many researchers have focused on developing a bankruptcy prediction model using financial ratios. However, few studies address the specific types of bankruptcy. Previous bankruptcy prediction models have generally been concerned with predicting whether or not firms will become bankrupt, and most studies on bankruptcy types have focused on reviewing the previous literature or performing a case study. Thus, this study develops a model using data mining techniques for predicting the specific types of bankruptcy as well as the occurrence of bankruptcy in Korean small- and medium-sized construction firms in terms of profitability, stability, and activity index, so that firms can take preventive action in advance. We propose a hybrid approach using two artificial neural networks (ANNs) for the prediction of bankruptcy types. The first is a back-propagation neural network (BPN) model using supervised learning for bankruptcy prediction and the second is a self-organizing map (SOM) model using unsupervised learning to classify bankruptcy data into several types.
Based on the constructed model, we predict the bankruptcy of companies by applying the BPN model to a validation set that was not utilized in the development of the model. This allows for identifying the specific types of bankruptcy by using bankruptcy data predicted by the BPN model. We calculated the average of selected input variables through statistical test for each cluster to interpret characteristics of the derived clusters in the SOM model. Each cluster represents bankruptcy type classified through data of bankruptcy firms, and input variables indicate financial ratios in interpreting the meaning of each cluster. The experimental result shows that each of five bankruptcy types has different characteristics according to financial ratios. Type 1 (severe bankruptcy) has inferior financial statements except for EBITDA (earnings before interest, taxes, depreciation, and amortization) to sales based on the clustering results. Type 2 (lack of stability) has a low quick ratio, low stockholder's equity to total assets, and high total borrowings to total assets. Type 3 (lack of activity) has a slightly low total asset turnover and fixed asset turnover. Type 4 (lack of profitability) has low retained earnings to total assets and EBITDA to sales which represent the indices of profitability. Type 5 (recoverable bankruptcy) includes firms that have a relatively good financial condition as compared to other bankruptcy types even though they are bankrupt. Based on the findings, researchers and practitioners engaged in the credit evaluation field can obtain more useful information about the types of corporate bankruptcy. In this paper, we utilized the financial ratios of firms to classify bankruptcy types. It is important to select the input variables that correctly predict bankruptcy and meaningfully classify the type of bankruptcy. In a further study, we will include non-financial factors such as size, industry, and age of the firms. 
Thus, we can obtain realistic clustering results for bankruptcy types by combining qualitative factors and reflecting the domain knowledge of experts.
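The clustering half of this hybrid approach (a self-organizing map that groups bankrupt firms, followed by per-cluster averages of the input variables) can be sketched as below. This is a minimal one-dimensional SOM written for illustration only: the function names, the two toy ratios, the neighbourhood rule and the learning-rate schedule are assumptions, the five map units merely mirror the five bankruptcy types reported, and the BPN prediction step is not shown.

```python
import random

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, n_units=5, epochs=50, lr=0.5, seed=0):
    """Train a 1-D self-organizing map; the BMU and its direct neighbours are updated."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)          # linearly decaying learning rate
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i in range(max(0, bmu - 1), min(n_units, bmu + 2)):
                h = 1.0 if i == bmu else 0.5      # weaker pull on neighbouring units
                weights[i] = [w + rate * h * (v - w) for w, v in zip(weights[i], x)]
    return weights

def cluster_means(data, weights):
    """Average each input variable over the firms assigned to each map unit."""
    groups = {}
    for x in data:
        groups.setdefault(best_matching_unit(weights, x), []).append(x)
    return {k: [sum(col) / len(col) for col in zip(*rows)] for k, rows in groups.items()}

# Hypothetical, already-scaled ratios of bankrupt firms: (quick ratio, total asset turnover)
firms = [(0.1, 0.9), (0.15, 0.85), (0.9, 0.1), (0.85, 0.2), (0.5, 0.5)]
som = train_som(firms)
means = cluster_means(firms, som)
```

Reading `means` unit by unit corresponds to the interpretation step in the abstract, where each cluster is characterized by the average financial ratios of its member firms.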

Spectral Induced Polarization Characteristics of Rocks in Gwanin Vanadiferous Titanomagnetite (VTM) Deposit (관인 함바나듐 티탄철광상 암석의 광대역 유도분극 특성)

  • Shin, Seungwook
    • Geophysics and Geophysical Exploration / v.24 no.4 / pp.194-201 / 2021
  • The induced polarization (IP) effect is known to be caused by electrochemical phenomena at the interface between minerals and pore water. The spectral induced polarization (SIP) method is an electrical survey that localizes subsurface IP anomalies by injecting alternating currents of multiple frequencies into the ground. This method has been effectively applied to mineral exploration of various ore deposits. Titanomagnetite ores were being produced by a mining company located in the Gonamsan area, Gwanin-myeon, Pocheon-si, Gyeonggi-do, South Korea. Because the ores contain more than 0.4 wt% vanadium, the ore deposit is called the Gwanin vanadiferous titanomagnetite (VTM) deposit. Vanadium is the most important material in the production of vanadium redox flow batteries, which are well suited to large-scale energy storage systems. Systematic mineral exploration was conducted to identify the presence of hidden VTM orebodies and estimate their potential resources. In geophysical exploration, laboratory geophysical measurement of rock samples helps to generate reliable property models from field survey data. Therefore, we performed laboratory SIP measurements on rocks from the Gwanin VTM deposit to understand the SIP characteristics of the ores and host rocks and to demonstrate the applicability of this method for mineral exploration. Both the phase and resistivity spectra of the ores sampled from underground outcrops and drill cores were different from those of the host rocks, which consist of monzodiorite and quartz monzodiorite. Because the phase and resistivity at frequencies below 100 Hz are mainly dependent on the SIP characteristics of the rocks, we calculated mean values for the ores and the host rocks. The average phase values at 0.1 Hz were -369 mrad for the ores and -39 mrad for the host rocks. The average resistivity values at 0.1 Hz were 16 Ωm for the ores and 2,623 Ωm for the host rocks.
Because the SIP characteristics of the ores were different from those of the host rocks, we consider the SIP survey effective for mineral exploration in vanadiferous titanomagnetite deposits and the SIP characteristics useful for interpreting field survey data.
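The reported 0.1 Hz averages can be combined into complex resistivities, which is how SIP responses are commonly represented (magnitude together with phase). The short sketch below uses only the figures quoted in the abstract; the function name and the contrast ratios are illustrative assumptions, not quantities computed by the author.

```python
import cmath

def complex_resistivity(magnitude_ohm_m, phase_mrad):
    """Combine resistivity magnitude (ohm-m) and phase (mrad) into one complex value."""
    return cmath.rect(magnitude_ohm_m, phase_mrad / 1000.0)

# Mean values at 0.1 Hz quoted in the abstract
ore = complex_resistivity(16.0, -369.0)
host = complex_resistivity(2623.0, -39.0)

phase_contrast = cmath.phase(ore) / cmath.phase(host)   # ratio of phase angles
resistivity_contrast = abs(host) / abs(ore)             # ratio of magnitudes
```

With these numbers the ores show a phase angle roughly nine times larger and a resistivity more than a hundred times lower than the host rocks, which is the contrast the abstract relies on.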