• Title/Summary/Keyword: smart mining

Search Result 261, Processing Time 0.028 seconds

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.3
    • /
    • pp.101-116
    • /
    • 2015
  • Recently, due to the development of smart devices and social media, vast amounts of information with the various forms were accumulated. Particularly, considerable research efforts are being directed towards analyzing unstructured big data to resolve various social problems. Accordingly, focus of data-driven decision-making is being moved from structured data analysis to unstructured one. Also, in the field of recommendation system, which is the typical area of data-driven decision-making, the need of using unstructured data has been steadily increased to improve system performance. Approaches to improve the performance of recommendation systems can be found in two aspects- improving algorithms and acquiring useful data with high quality. Traditionally, most efforts to improve the performance of recommendation system were made by the former approach, while the latter approach has not attracted much attention relatively. In this sense, efforts to utilize unstructured data from variable sources are very timely and necessary. Particularly, as the interests of users are directly connected with their needs, identifying the interests of the user through unstructured big data analysis can be a crew for improving performance of recommendation systems. In this sense, this study proposes the methodology of improving recommendation system by measuring interests of the user. Specially, this study proposes the method to quantify interests of the user by analyzing user's internet usage patterns, and to predict user's repurchase based upon the discovered preferences. There are two important modules in this study. The first module predicts repurchase probability of each category through analyzing users' purchase history. We include the first module to our research scope for comparing the accuracy of traditional purchase-based prediction model to our new model presented in the second module. This procedure extracts purchase history of users. The core part of our methodology is in the second module. This module extracts users' interests by analyzing news articles the users have read. The second module constructs a correspondence matrix between topics and news articles by performing topic modeling on real world news articles. And then, the module analyzes users' news access patterns and then constructs a correspondence matrix between articles and users. After that, by merging the results of the previous processes in the second module, we can obtain a correspondence matrix between users and topics. This matrix describes users' interests in a structured manner. Finally, by using the matrix, the second module builds a model for predicting repurchase probability of each category. In this paper, we also provide experimental results of our performance evaluation. The outline of data used our experiments is as follows. We acquired web transaction data of 5,000 panels from a company that is specialized to analyzing ranks of internet sites. At first we extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and we crawled main contents of the news articles. After that we selected 2,615 users who have read at least one of the extracted news articles. Among the 2,615 users, we discovered that the number of target users who purchase at least one items from our target shopping mall 'G' is 359. In the experiments, we analyzed purchase history and news access records of the 359 internet users. From the performance evaluation, we found that our prediction model using both users' interests and purchase history outperforms a prediction model using only users' purchase history from a view point of misclassification ratio. In detail, our model outperformed the traditional one in appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used. Similarly, our model outperformed the traditional one in beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used although the improvement is very small.

A Novel Methodology for Extracting Core Technology and Patents by IP Mining (핵심 기술 및 특허 추출을 위한 IP 마이닝에 관한 연구)

  • Kim, Hyun Woo;Kim, Jongchan;Lee, Joonhyuck;Park, Sangsung;Jang, Dongsik
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.4
    • /
    • pp.392-397
    • /
    • 2015
  • Society has been developed through analogue, digital, and smart era. Every technology is going through consistent changes and rapid developments. In this competitive society, R&D strategy establishment is significantly useful and helpful for improving technology competitiveness. A patent document includes technical and legal rights information such as title, abstract, description, claim, and patent classification code. From the patent document, a lot of people can understand and collect legal and technical information. This unique feature of patent can be quantitatively applied for technology analysis. This research paper proposes a methodology for extracting core technology and patents based on quantitative methods. Statistical analysis and social network analysis are applied to IPC codes in order to extract core technologies with active R&D and high centralities. Then, core patents are also extracted by analyzing citation and family information.

Development of Machine Learning-Based Platform for Distillation Column (증류탑을 위한 머신러닝 기반 플랫폼 개발)

  • Oh, Kwang Cheol;Kwon, Hyukwon;Roh, Jiwon;Choi, Yeongryeol;Park, Hyundo;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research
    • /
    • v.58 no.4
    • /
    • pp.565-572
    • /
    • 2020
  • This study developed a software platform using machine learning of artificial intelligence to optimize the distillation column system. The distillation column is representative and core process in the petrochemical industry. Process stabilization is difficult due to various operating conditions and continuous process characteristics, and differences in process efficiency occur depending on operator skill. The process control based on the theoretical simulation was used to overcome this problem, but it has a limitation which it can't apply to complex processes and real-time systems. This study aims to develop an empirical simulation model based on machine learning and to suggest an optimal process operation method. The development of empirical simulations involves collecting big data from the actual process, feature extraction through data mining, and representative algorithm for the chemical process. Finally, the platform for the distillation column was developed with verification through a developed model and field tests. Through the developed platform, it is possible to predict the operating parameters and provided optimal operating conditions to achieve efficient process control. This study is the basic study applying the artificial intelligence machine learning technique for the chemical process. After application on a wide variety of processes and it can be utilized to the cornerstone of the smart factory of the industry 4.0.

Designing mobile personal assistant agent based on users' experience and their position information (위치정보 및 사용자 경험을 반영하는 모바일 PA에이전트의 설계)

  • Kang, Shin-Bong;Noh, Sang-Uk
    • Journal of Internet Computing and Services
    • /
    • v.12 no.1
    • /
    • pp.99-110
    • /
    • 2011
  • Mobile environments rapidly changing and digital convergence widely employed, mobile devices including smart phones have been playing a critical role that changes users' lifestyle in the areas of entertainments, businesses and information services. The various services using mobile devices are developing to meet the personal needs of users in the mobile environments. Especially, an LBS (Location-Based Service) is combined with other services and contents such as augmented reality, mobile SNS (Social Network Service), games, and searching, which can provide convenient and useful services to mobile users. In this paper, we design and implement the prototype of mobile personal assistant (PA) agents. Our personal assistant agent helps users do some tasks by hiding the complexity of difficult tasks, performing tasks on behalf of the users, and reflecting the preferences of users. To identify user's preferences and provide personalized services, clustering and classification algorithms of data mining are applied. The clusters of the log data using clustering algorithms are made by measuring the dissimilarity between two objects based on usage patterns. The classification algorithms produce user profiles within each cluster, which make it possible for PA agents to provide users with personalized services and contents. In the experiment, we measured the classification accuracy of user model clustered using clustering algorithms. It turned out that the classification accuracy using our method was increased by 17.42%, compared with that using other clustering algorithms.

A Study on Social Media Sentiment Analysis for Exploring Public Opinions Related to Education Policies (교육정책관련 여론탐색을 위한 소셜미디어 감정분석 연구)

  • Chung, Jin-Myeong;Yoo, Ki-Young;Koo, Chan-Dong
    • Informatization Policy
    • /
    • v.24 no.4
    • /
    • pp.3-16
    • /
    • 2017
  • With the development of social media services in the era of Web 2.0, the public opinion formation site has been partially shifted from the traditional mass media to social media. This phenomenon is continuing to expand, and public opinions on government polices created and shared on social media are attracting more attention. It is particularly important to grasp public opinions in policy formulation because setting up educational policies involves a variety of stakeholders and conflicts. The purpose of this study is to explore public opinions about education-related policies through an empirical analysis of social media documents on education policies using opinion mining techniques. For this purpose, we collected the education policy-related documents by keyword, which were produced by users through the social media service, tokenized and extracted sentimental qualities of the documents, and scored the qualities using sentiment dictionaries to find out public preferences for specific education policies. As a result, a lot of negative public opinions were found regarding the smart education policies that use the keywords of digital textbooks and e-learning; while the software education policies using coding education and computer thinking as the keywords had more positive opinions. In addition, the general policies having the keywords of free school terms and creative personality education showed more negative public opinions. As much as 20% of the documents were unable to extract sentiments from, signifying that there are still a certain share of blog posts or tweets that do not reflect the writers' opinions.

Outside Temperature Prediction Based on Artificial Neural Network for Estimating the Heating Load in Greenhouse (인공신경망 기반 온실 외부 온도 예측을 통한 난방부하 추정)

  • Kim, Sang Yeob;Park, Kyoung Sub;Ryu, Keun Ho
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.4
    • /
    • pp.129-134
    • /
    • 2018
  • Recently, the artificial neural network (ANN) model is a promising technique in the prediction, numerical control, robot control and pattern recognition. We predicted the outside temperature of greenhouse using ANN and utilized the model in greenhouse control. The performance of ANN model was evaluated and compared with multiple regression model(MRM) and support vector machine (SVM) model. The 10-fold cross validation was used as the evaluation method. In order to improve the prediction performance, the data reduction was performed by correlation analysis and new factor were extracted from measured data to improve the reliability of training data. The backpropagation algorithm was used for constructing ANN, multiple regression model was constructed by M5 method. And SVM model was constructed by epsilon-SVM method. As the result showed that the RMSE (Root Mean Squared Error) value of ANN, MRM and SVM were 0.9256, 1.8503 and 7.5521 respectively. In addition, by applying the prediction model to greenhouse heating load calculation, it can increase the income by reducing the energy cost in the greenhouse. The heating load of the experimented greenhouse was 3326.4kcal/h and the fuel consumption was estimated to be 453.8L as the total heating time is $10000^{\circ}C/h$. Therefore, data mining technology of ANN can be applied to various agricultural fields such as precise greenhouse control, cultivation techniques, and harvest prediction, thereby contributing to the development of smart agriculture.

Construction of Precise Mine Geospatial Information and Ore Modeling for Smart Mining (스마트마이닝을 위한 정밀 광산공간정보 구축 및 광체 모델링)

  • Park, Joon Kyu;Jung, Kap Yong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.38 no.6
    • /
    • pp.725-731
    • /
    • 2020
  • In mineral resource development, resource exploration is a task to find economical minerals on the surface and underground, and the success rate is low compared to the development and production stages, and it is necessary to collect a lot of data through exploration and accurately analyze the collected information. In this study, mine spatial information was constructed using a 3D (Three-dimensional) laser scanner, and accuracy evaluation was performed to obtain a maximum deviation of 0.140 m and an average of 0.095 m in the X, Y and Z directions, and the possibility of utilizing the construction of mine geospatial information through a 3D laser scanner could be presented. In addition, the ore body modeling was performed by applying the interpolation method of the ore body section using the resource exploration results. The ore body modeling result was superimposed with the modeling result of the mine geospatial information built through the 3D laser scanner to construct the ore body modeling result based on the precise mine geospatial information. The results of ore body modeling based on mine geospatial information built through research can increase the ease of data interpretation and the accuracy of the calculated data, which will greatly increase the efficiency of work related to mineral resource development and mine damage prevention in the future.

A Generation and Matching Method of Normal-Transient Dictionary for Realtime Topic Detection (실시간 이슈 탐지를 위한 일반-급상승 단어사전 생성 및 매칭 기법)

  • Choi, Bongjun;Lee, Hanjoo;Yong, Wooseok;Lee, Wonsuk
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.5
    • /
    • pp.7-18
    • /
    • 2017
  • Recently, the number of SNS user has rapidly increased due to smart device industry development and also the amount of generated data is exponentially increasing. In the twitter, Text data generated by user is a key issue to research because it involves events, accidents, reputations of products, and brand images. Twitter has become a channel for users to receive and exchange information. An important characteristic of Twitter is its realtime. Earthquakes, floods and suicides event among the various events should be analyzed rapidly for immediately applying to events. It is necessary to collect tweets related to the event in order to analyze the events. But it is difficult to find all tweets related to the event using normal keywords. In order to solve such a mentioned above, this paper proposes A Generation and Matching Method of Normal-Transient Dictionary for realtime topic detection. Normal dictionaries consist of general keywords(event: suicide-death-loop, death, die, hang oneself, etc) related to events. Whereas transient dictionaries consist of transient keywords(event: suicide-names and information of celebrities, information of social issues) related to events. Experimental results show that matching method using two dictionary finds more tweets related to the event than a simple keyword search.

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.25-41
    • /
    • 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to analyze the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media in order to identify social issues. The technique of "topic analysis" is employed in order to identify topics and themes from a large amount of text documents. As one of the most prevalent applications of topic analysis, the technique of issue tracking investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens. Therefore, this traditional approach is difficult to apply in most applications that need to perform an analysis on the additional period. Second, the issue is not only generated and terminated constantly, but also one issue can sometimes be distributed into several issues or multiple issues can be integrated into one single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address the connection and effect relationship between these issues. The purpose of this study is to overcome the two limitations of the existing issue tracking method, one being the limitation regarding the analysis method and the other being the limitation involving the lack of consideration of the changeability of the issues. Let us assume that we perform multiple topic analysis for each multiple period. Then it is essential to map issues of different periods in order to trace trend of issues. However, it is not easy to discover connection between issues of different periods because the issues derived for each period mutually contain heterogeneity. In this study, to overcome these limitations without having to analyze the entire period's documents simultaneously, the analysis can be performed independently for each period. In addition, we performed issue mapping to link the identified issues of each period. An integrated approach on each details period was presented, and the issue flow of the entire integrated period was depicted in this study. Thus, as the entire process of the issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, the changeability of the issues was analyzed in this study. The proposed methodology is highly efficient in terms of time and cost, as it sufficiently considered the changeability of the issues. Further, the results of this study can be used to adapt the methodology to a practical situation. By applying the proposed methodology to actual Internet news, the potential practical applications of the proposed methodology are analyzed. Consequently, the proposed methodology was able to extend the period of the analysis and it could follow the course of progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. Those text data were produced and distributed through various media platforms such as World Wide Web, Internet news feeds, microblog, and social media. However, this enormous amount of easily obtained information is lack of organization. Therefore, this problem has raised the interest of many researchers in order to manage this huge amount of information. Further, this problem also required professionals that are capable of classifying relevant information and hence text classification is introduced. Text classification is a challenging task in modern data analysis, which it needs to assign a text document into one or more predefined categories or classes. In text classification field, there are different kinds of techniques available such as K-Nearest Neighbor, Naïve Bayes Algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, while dealing with huge amount of text data, model performance and accuracy becomes a challenge. According to the type of words used in the corpus and type of features created for classification, the performance of a text classification model can be varied. Most of the attempts are been made based on proposing a new algorithm or modifying an existing algorithm. This kind of research can be said already reached their certain limitations for further improvements. In this study, aside from proposing a new algorithm or modifying the algorithm, we focus on searching a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of training data upon which this classifier is built. The real world datasets in most of the time contain noise, or in other words noisy data, these can actually affect the decision made by the classifiers built from these data. In this study, we consider that the data from different domains, which is heterogeneous data might have the characteristics of noise which can be utilized in the classification process. In order to build the classifier, machine learning algorithm is performed based on the assumption that the characteristics of training data and target data are the same or very similar to each other. However, in the case of unstructured data such as text, the features are determined according to the vocabularies included in the document. If the viewpoints of the learning data and target data are different, the features may be appearing different between these two data. In this study, we attempt to improve the classification accuracy by strengthening the robustness of the document classifier through artificially injecting the noise into the process of constructing the document classifier. With data coming from various kind of sources, these data are likely formatted differently. These cause difficulties for traditional machine learning algorithms because they are not developed to recognize different type of data representation at one time and to put them together in same generalization. Therefore, in order to utilize heterogeneous data in the learning process of document classifier, we apply semi-supervised learning in our study. However, unlabeled data might have the possibility to degrade the performance of the document classifier. Therefore, we further proposed a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contributing to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules will be selected and applied for the final decision making. In this paper, three different types of real-world data sources were used, which are news, twitter and blogs.