• Title/Summary/Keyword: Classification Performance

Search Result 3,735, Processing Time 0.035 seconds

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • Recent explosive increase of electronic commerce provides many advantageous purchase opportunities to customers. In this situation, customers who do not have enough knowledge about their purchases, may accept product recommendations. Product recommender systems automatically reflect user's preference and provide recommendation list to the users. Thus, product recommender system in online shopping store has been known as one of the most popular tools for one-to-one marketing. However, recommender systems which do not properly reflect user's preference cause user's disappointment and waste of time. In this study, we propose a novel recommender system which uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user's preference. The research data is collected from the real-world online shopping store, which deals products from famous art galleries and museums in Korea. The data initially contain 5759 transaction data, but finally remain 3167 transaction data after deletion of null data. In this study, we transform the categorical variables into dummy variables and exclude outlier data. The proposed model consists of two steps. The first step predicts customers who have high likelihood to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. We perform above data mining techniques using SAS E-Miner software. In this study, we partition datasets into two sets as modeling and validation sets for the logistic regression and decision trees. We also partition datasets into three sets as training, test, and validation sets for the artificial neural network model. The validation dataset is equal for the all experiments. Then we composite the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. Bagging is the abbreviation of "Bootstrap Aggregation" and it composite outputs from several machine learning techniques for raising the performance and stability of prediction or classification. This technique is special form of the averaging method. Bumping is the abbreviation of "Bootstrap Umbrella of Model Parameter," and it only considers the model which has the lowest error value. The results show that bumping outperforms bagging and the other predictors except for "Poster" product group. For the "Poster" product group, artificial neural network model performs better than the other models. In the second step, we use the market basket analysis to extract association rules for co-purchased products. We can extract thirty one association rules according to values of Lift, Support, and Confidence measure. We set the minimum transaction frequency to support associations as 5%, maximum number of items in an association as 4, and minimum confidence for rule generation as 10%. This study also excludes the extracted association rules below 1 of lift value. We finally get fifteen association rules by excluding duplicate rules. Among the fifteen association rules, eleven rules contain association between products in "Office Supplies" product group, one rules include the association between "Office Supplies" and "Fashion" product groups, and other three rules contain association between "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender systems provides list of recommendations to the proper customers. We test the usability of the proposed system by using prototype and real-world transaction and profile data. For this end, we construct the prototype system by using the ASP, Java Script and Microsoft Access. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The participants for the survey are 173 persons who use MSN Messenger, Daum Caf$\acute{e}$, and P2P services. We evaluate the user satisfaction using five-scale Likert measure. This study also performs "Paired Sample T-test" for the results of the survey. The results show that the proposed model outperforms the random selection model with 1% statistical significance level. It means that the users satisfied the recommended product list significantly. The results also show that the proposed system may be useful in real-world online shopping store.

Estimation and Mapping of Soil Organic Matter using Visible-Near Infrared Spectroscopy (분광학을 이용한 토양 유기물 추정 및 분포도 작성)

  • Choe, Eun-Young;Hong, Suk-Young;Kim, Yi-Hyun;Zhang, Yong-Seon
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.43 no.6
    • /
    • pp.968-974
    • /
    • 2010
  • We assessed the feasibility of discrete wavelet transform (DWT) applied for the spectral processing to enhance the estimation performance quality of soil organic matters using visible-near infrared spectra and mapped their distribution via block Kriging model. Continuum-removal and $1^{st}$ derivative transform as well as Haar and Daubechies DWT were used to enhance spectral variation in terms of soil organic matter contents and those spectra were put into the PLSR (Partial Least Squares Regression) model. Estimation results using raw reflectance and transformed spectra showed similar quality with $R^2$ > 0.6 and RPD> 1.5. These values mean the approximation prediction on soil organic matter contents. The poor performance of estimation using DWT spectra might be caused by coarser approximation of DWT which not enough to express spectral variation based on soil organic matter contents. The distribution maps of soil organic matter were drawn via a spatial information model, Kriging. Organic contents of soil samples made Gaussian distribution centered at around 20 g $kg^{-1}$ and the values in the map were distributed with similar patterns. The estimated organic matter contents had similar distribution to the measured values even though some parts of estimated value map showed slightly higher. If the estimation quality is improved more, estimation model and mapping using spectroscopy may be applied in global soil mapping, soil classification, and remote sensing data analysis as a rapid and cost-effective method.

Development of Deep Learning Structure to Improve Quality of Polygonal Containers (다각형 용기의 품질 향상을 위한 딥러닝 구조 개발)

  • Yoon, Suk-Moon;Lee, Seung-Ho
    • Journal of IKEEE
    • /
    • v.25 no.3
    • /
    • pp.493-500
    • /
    • 2021
  • In this paper, we propose the development of deep learning structure to improve quality of polygonal containers. The deep learning structure consists of a convolution layer, a bottleneck layer, a fully connect layer, and a softmax layer. The convolution layer is a layer that obtains a feature image by performing a convolution 3x3 operation on the input image or the feature image of the previous layer with several feature filters. The bottleneck layer selects only the optimal features among the features on the feature image extracted through the convolution layer, reduces the channel to a convolution 1x1 ReLU, and performs a convolution 3x3 ReLU. The global average pooling operation performed after going through the bottleneck layer reduces the size of the feature image by selecting only the optimal features among the features of the feature image extracted through the convolution layer. The fully connect layer outputs the output data through 6 fully connect layers. The softmax layer multiplies and multiplies the value between the value of the input layer node and the target node to be calculated, and converts it into a value between 0 and 1 through an activation function. After the learning is completed, the recognition process classifies non-circular glass bottles by performing image acquisition using a camera, measuring position detection, and non-circular glass bottle classification using deep learning as in the learning process. In order to evaluate the performance of the deep learning structure to improve quality of polygonal containers, as a result of an experiment at an authorized testing institute, it was calculated to be at the same level as the world's highest level with 99% good/defective discrimination accuracy. Inspection time averaged 1.7 seconds, which was calculated within the operating time standards of production processes using non-circular machine vision systems. Therefore, the effectiveness of the performance of the deep learning structure to improve quality of polygonal containers proposed in this paper was proven.

Manbojeonseo(萬寶全書) Geumdoron(琴道論) in the old scores of Joseon(朝鮮) (조선시대 고악보에 나타난 『만보전서(萬寶全書)』의 금도론(琴道論))

  • Choi, Sun-a
    • (The) Research of the performance art and culture
    • /
    • no.20
    • /
    • pp.251-307
    • /
    • 2010
  • Manbojeonseo, a kind of an encyclopedia published several times in Ming Ch'ing dynasty, includes useful information for scholars and common people on daily lives. In 1720, Manbojeonseo was first introduced to Joseon(朝鮮) dynasty by the diplomatic corps visiting Ch'ing dynasty, and widely circulated in the society as an useful information magazine or an individual collection of reference book. Since Manbojeonseo includes the systematically-organized contents of Geumdoron(琴道論, a theory of a heptachord), it could provide a useful reference when the Geumdoron was inserted as the contents of old scores. For an instance, Obultan(五不彈), Tangeumsuji(彈琴須知), and Taeeumgibeop(太音紀法) recorded in Hangeumsinbo(韓琴新譜, 1724) clearly acknowledge Manbojeonseo as their common source. In this paper, the order and the contents of Geumdorons from four different Manbojeonseo are compared. At first, the comparative analysis of Manbojeonseo (1610) edited by Seo Giryong(徐企龍) and Manbojeonseo(1612) edited by Yu Jamyeong(劉子明) are carried out focusing on the contents of the Geumdoron, where both Manbojeonseos contain considerable amount of Geumdoron sections. The tables of the contents in both Manbojeonseos are composed of upper and lower levels classified into 4 large divisions for each. While the contents of the upper level is presumably older and focused more on the theory of the cardinal virtues, the contents of the lower one is relatively new and centered more on the skills for the real play of a heptachord(琴), the lyrics and the musical scores composed of Gamjabo(減字譜). Therefore, it could be said that the upper level is metaphysical while the lower level is physical. One of the differences between those two Manbojeonseos lies in the order and the terminology found in the large divisions. In the case of Manbojeonseo(1612), some terms in the large division represent and theoretically group the detailed descriptions in the small divisions such as 5 demands or 7 taboos in the play of the heptachord. In addition, a few lower divisions were newly added or revised in order to enhance the completeness of Geumhangmun(琴學門, study of a heptachord), and the detailed classification was revised and polished to improve the reasonableness. In Manbojeonseo(1614) composed by the same editor as Manbojeonseo(1610), the contents of the Geumdoron become much briefer than those of Manbojeonseo(1610) and Manbojeonseo(1612). In the case of Manbojeonseo(1739), a new type of the Geumdoron is included called Oeumjeongjobo(五音正操譜) while carrying a similarly brief section of the Geumdoron. Finally, the Geumdorons in Manbojeonseo and several old scores are comparatively analyzed. While the Geumbo(琴譜) owned by Gugagwon(國樂院) and Hangeumsinbo contains relatively old Geumdoron, Yuyeji(遊藝志) and Bangsanhanssigeumbo(芳山韓氏琴譜) adopt practical and relatively new Geumdorons different from the former old scores and similar to Manbojeonseo(1739) considering the order and the contents. In particular, the contents of the Geumdoron in Geumheonakbo(琴軒樂譜) is notably unique containing much of the upper and the lower levels of Manbojeonseo(1612), therefore thought to have actively adopted the contents of new Geumdorons.

Comparison of Models for Stock Price Prediction Based on Keyword Search Volume According to the Social Acceptance of Artificial Intelligence (인공지능의 사회적 수용도에 따른 키워드 검색량 기반 주가예측모형 비교연구)

  • Cho, Yujung;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.103-128
    • /
    • 2021
  • Recently, investors' interest and the influence of stock-related information dissemination are being considered as significant factors that explain stock returns and volume. Besides, companies that develop, distribute, or utilize innovative new technologies such as artificial intelligence have a problem that it is difficult to accurately predict a company's future stock returns and volatility due to macro-environment and market uncertainty. Market uncertainty is recognized as an obstacle to the activation and spread of artificial intelligence technology, so research is needed to mitigate this. Hence, the purpose of this study is to propose a machine learning model that predicts the volatility of a company's stock price by using the internet search volume of artificial intelligence-related technology keywords as a measure of the interest of investors. To this end, for predicting the stock market, we using the VAR(Vector Auto Regression) and deep neural network LSTM (Long Short-Term Memory). And the stock price prediction performance using keyword search volume is compared according to the technology's social acceptance stage. In addition, we also conduct the analysis of sub-technology of artificial intelligence technology to examine the change in the search volume of detailed technology keywords according to the technology acceptance stage and the effect of interest in specific technology on the stock market forecast. To this end, in this study, the words artificial intelligence, deep learning, machine learning were selected as keywords. Next, we investigated how many keywords each week appeared in online documents for five years from January 1, 2015, to December 31, 2019. The stock price and transaction volume data of KOSDAQ listed companies were also collected and used for analysis. As a result, we found that the keyword search volume for artificial intelligence technology increased as the social acceptance of artificial intelligence technology increased. In particular, starting from AlphaGo Shock, the keyword search volume for artificial intelligence itself and detailed technologies such as machine learning and deep learning appeared to increase. Also, the keyword search volume for artificial intelligence technology increases as the social acceptance stage progresses. It showed high accuracy, and it was confirmed that the acceptance stages showing the best prediction performance were different for each keyword. As a result of stock price prediction based on keyword search volume for each social acceptance stage of artificial intelligence technologies classified in this study, the awareness stage's prediction accuracy was found to be the highest. The prediction accuracy was different according to the keywords used in the stock price prediction model for each social acceptance stage. Therefore, when constructing a stock price prediction model using technology keywords, it is necessary to consider social acceptance of the technology and sub-technology classification. The results of this study provide the following implications. First, to predict the return on investment for companies based on innovative technology, it is most important to capture the recognition stage in which public interest rapidly increases in social acceptance of the technology. Second, the change in keyword search volume and the accuracy of the prediction model varies according to the social acceptance of technology should be considered in developing a Decision Support System for investment such as the big data-based Robo-advisor recently introduced by the financial sector.

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.237-262
    • /
    • 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and the sentence classification task on the training set was performed after reading each sentence in person. Then, based on the training set, a Support Vector Machine model was utilized to predict the tone of sentences in the test set. Finally, the machine learning model calculated the percentages of positive, neutral, and negative sentences in each prospectus. To predict the price movement of an IPO stock, four different machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables using technical analysis and prospectus tone variables together show higher accuracy than models that use only quantitative variables. More specifically, the prediction accuracy was improved by 1.45% points in the Random Forest model, 4.34% points in the Artificial Neural Network model, and 5.07% points in the Support Vector Machine model. After testing the performance of these machine learning techniques, the Artificial Neural Network model using both quantitative variables and prospectus tone variables was the model with the highest prediction accuracy rate, which was 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify the statistically significant difference between the models. The model using only quantitative variables and the model using both the quantitative variables and the prospectus tone variables were compared, and it was confirmed that the predictive performance improved significantly at a 1% significance level.

Development of a water quality prediction model for mineral springs in the metropolitan area using machine learning (머신러닝을 활용한 수도권 약수터 수질 예측 모델 개발)

  • Yeong-Woo Lim;Ji-Yeon Eom;Kee-Young Kwahk
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.307-325
    • /
    • 2023
  • Due to the prolonged COVID-19 pandemic, the frequency of people who are tired of living indoors visiting nearby mountains and national parks to relieve depression and lethargy has exploded. There is a place where thousands of people who came out of nature stop walking and breathe and rest, that is the mineral spring. Even in mountains or national parks, there are about 600 mineral springs that can be found occasionally in neighboring parks or trails in the metropolitan area. However, due to irregular and manual water quality tests, people drink mineral water without knowing the test results in real time. Therefore, in this study, we intend to develop a model that can predict the quality of the spring water in real time by exploring the factors affecting the quality of the spring water and collecting data scattered in various places. After limiting the regions to Seoul and Gyeonggi-do due to the limitations of data collection, we obtained data on water quality tests from 2015 to 2020 for about 300 mineral springs in 18 cities where data management is well performed. A total of 10 factors were finally selected after two rounds of review among various factors that are considered to affect the suitability of the mineral spring water quality. Using AutoML, an automated machine learning technology that has recently been attracting attention, we derived the top 5 models based on prediction performance among about 20 machine learning methods. Among them, the catboost model has the highest performance with a prediction classification accuracy of 75.26%. In addition, as a result of examining the absolute influence of the variables used in the analysis through the SHAP method on the prediction, the most important factor was whether or not a water quality test was judged nonconforming in the previous water quality test. It was confirmed that the temperature on the day of the inspection and the altitude of the mineral spring had an influence on whether the water quality was unsuitable.

A Data-based Sales Forecasting Support System for New Businesses (데이터기반의 신규 사업 매출추정방법 연구: 지능형 사업평가 시스템을 중심으로)

  • Jun, Seung-Pyo;Sung, Tae-Eung;Choi, San
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.1-22
    • /
    • 2017
  • Analysis of future business or investment opportunities, such as business feasibility analysis and company or technology valuation, necessitate objective estimation on the relevant market and expected sales. While there are various ways to classify the estimation methods of these new sales or market size, they can be broadly divided into top-down and bottom-up approaches by benchmark references. Both methods, however, require a lot of resources and time. Therefore, we propose a data-based intelligent demand forecasting system to support evaluation of new business. This study focuses on analogical forecasting, one of the traditional quantitative forecasting methods, to develop sales forecasting intelligence systems for new businesses. Instead of simply estimating sales for a few years, we hereby propose a method of estimating the sales of new businesses by using the initial sales and the sales growth rate of similar companies. To demonstrate the appropriateness of this method, it is examined whether the sales performance of recently established companies in the same industry category in Korea can be utilized as a reference variable for the analogical forecasting. In this study, we examined whether the phenomenon of "mean reversion" was observed in the sales of start-up companies in order to identify errors in estimating sales of new businesses based on industry sales growth rate and whether the differences in business environment resulting from the different timing of business launch affects growth rate. We also conducted analyses of variance (ANOVA) and latent growth model (LGM) to identify differences in sales growth rates by industry category. Based on the results, we proposed industry-specific range and linear forecasting models. This study analyzed the sales of only 150,000 start-up companies in Korea in the last 10 years, and identified that the average growth rate of start-ups in Korea is higher than the industry average in the first few years, but it shortly shows the phenomenon of mean-reversion. In addition, although the start-up founding juncture affects the sales growth rate, it is not high significantly and the sales growth rate can be different according to the industry classification. Utilizing both this phenomenon and the performance of start-up companies in relevant industries, we have proposed two models of new business sales based on the sales growth rate. The method proposed in this study makes it possible to objectively and quickly estimate the sales of new business by industry, and it is expected to provide reference information to judge whether sales estimated by other methods (top-down/bottom-up approach) pass the bounds from ordinary cases in relevant industry. In particular, the results of this study can be practically used as useful reference information for business feasibility analysis or technical valuation for entering new business. When using the existing top-down method, it can be used to set the range of market size or market share. As well, when using the bottom-up method, the estimation period may be set in accordance of the mean reverting period information for the growth rate. The two models proposed in this study will enable rapid and objective sales estimation of new businesses, and are expected to improve the efficiency of business feasibility analysis and technology valuation process by developing intelligent information system. In academic perspectives, it is a very important discovery that the phenomenon of 'mean reversion' is found among start-up companies out of general small-and-medium enterprises (SMEs) as well as stable companies such as listed companies. In particular, there exists the significance of this study in that over the large-scale data the mean reverting phenomenon of the start-up firms' sales growth rate is different from that of the listed companies, and that there is a difference in each industry. If a linear model, which is useful for estimating the sales of a specific company, is highly likely to be utilized in practical aspects, it can be explained that the range model, which can be used for the estimation method of the sales of the unspecified firms, is highly likely to be used in political aspects. It implies that when analyzing the business activities and performance of a specific industry group or enterprise group there is political usability in that the range model enables to provide references and compare them by data based start-up sales forecasting system.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.149-169
    • /
    • 2020
  • "The Urban Renewal New Deal project", one of the government's major national projects, is about developing underdeveloped areas by investing 50 trillion won in 100 locations on the first year and 500 over the next four years. This project is drawing keen attention from the media and local governments. However, the project model which fails to reflect the original characteristics of the area as it divides project area into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type," According to keywords for successful urban regeneration in Korea, "resident participation," "regional specialization," "ministerial cooperation" and "public-private cooperation", when local governments propose urban regeneration projects to the government, they can see that it is most important to accurately understand the characteristics of the city and push ahead with the projects in a way that suits the characteristics of the city with the help of local residents and private companies. In addition, considering the gentrification problem, which is one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suitable for the characteristics of the area. In order to supplement the limitations of the 'Urban Regeneration New Deal Project' methodology, this study aims to propose a system that recommends urban regeneration types suitable for urban regeneration sites by utilizing various machine learning algorithms, referring to the urban regeneration types of the '2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan' promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017). In order to identify regional characteristics, approximately 100,000 text data were collected for 22 regions where the project was carried out for a total of four types of urban regeneration. Using the collected data, we drew key keywords for each region according to the type of urban regeneration and conducted topic modeling to explore whether there were differences between types. As a result, it was confirmed that a number of topics related to real estate and economy appeared in old residential areas, and in the case of declining and underdeveloped areas, topics reflecting the characteristics of areas where industrial activities were active in the past appeared. In the case of the historical and cultural resource area, since it is an area that contains traces of the past, many keywords related to the government appeared. Therefore, it was possible to confirm political topics and cultural topics resulting from various events. Finally, in the case of low-use and under-developed areas, many topics on real estate and accessibility are emerging, so accessibility is good. It mainly had the characteristics of a region where development is planned or is likely to be developed. Furthermore, a model was implemented that proposes urban regeneration types tailored to regional characteristics for regions other than Seoul. Machine learning technology was used to implement the model, and training data and test data were randomly extracted at an 8:2 ratio and used. In order to compare the performance between various models, the input variables are set in two ways: Count Vector and TF-IDF Vector, and as Classifier, there are 5 types of SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting. By applying it, performance comparison for a total of 10 models was conducted. The model with the highest performance was the Gradient Boosting method using TF-IDF Vector input data, and the accuracy was 97%. Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new business sites in the process of carrying out urban regeneration projects."