• Title/Summary/Keyword: 의미 기반 정보 추출

Search Result 678, Processing Time 0.037 seconds

Forecasting of Customer's Purchasing Intention Using Support Vector Machine (Support Vector Machine 기법을 이용한 고객의 구매의도 예측)

  • Kim, Jin-Hwa;Nam, Ki-Chan;Lee, Sang-Jong
    • Information Systems Review
    • /
    • v.10 no.2
    • /
    • pp.137-158
    • /
    • 2008
  • Rapid development of various information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have various demands on new products and services. Therefore, their power and influence on the markets grow stronger each year. Companies have paid great attention to customer relationship management. Especially, personalized product recommendation systems, which recommend products and services based on customer's private information or purchasing behaviors in stores, is an important asset to most companies. CRM is one of the important business processes where reliable information is mined from customer database. Data mining techniques such as artificial intelligence are popular tools used to extract useful information and knowledge from these customer databases. In this research, we propose a recommendation system that predicts customer's purchase intention. Then, customer's purchasing intention of specific product is predicted by using data mining techniques using receipt data set. The performance of this suggested method is compared with that of other data mining technologies.

Location Generalization Method of Moving Object using $R^*$-Tree and Grid ($R^*$-Tree와 Grid를 이용한 이동 객체의 위치 일반화 기법)

  • Ko, Hyun;Kim, Kwang-Jong;Lee, Yon-Sik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.12 no.2 s.46
    • /
    • pp.231-242
    • /
    • 2007
  • The existing pattern mining methods[1,2,3,4,5,6,11,12,13] do not use location generalization method on the set of location history data of moving object, but even so they simply do extract only frequent patterns which have no spatio-temporal constraint in moving patterns on specific space. Therefore, it is difficult for those methods to apply to frequent pattern mining which has spatio-temporal constraint such as optimal moving or scheduling paths among the specific points. And also, those methods are required more large memory space due to using pattern tree on memory for reducing repeated scan database. Therefore, more effective pattern mining technique is required for solving these problems. In this paper, in order to develop more effective pattern mining technique, we propose new location generalization method that converts data of detailed level into meaningful spatial information for reducing the processing time for pattern mining of a massive history data set of moving object and space saving. The proposed method can lead the efficient spatial moving pattern mining of moving object using by creating moving sequences through generalizing the location attributes of moving object into 2D spatial area based on $R^*$-Tree and Area Grid Hash Table(AGHT) in preprocessing stage of pattern mining.

  • PDF

A Method for Protein Functional Flow Configuration and Validation (단백질 기능 흐름 모델 구성 및 평가 기법)

  • Jang, Woo-Hyuk;Jung, Suk-Hoon;Han, Dong-Soo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.4
    • /
    • pp.284-288
    • /
    • 2009
  • With explosively growing PPI databases, the computational approach for a prediction and configuration of PPI network has been a big stream in the bioinformatics area. Recent researches gradually consider physicochemical properties of proteins and support high resolution results with integration of experimental results. With regard to current research trend, it is very close future to complete a PPI network configuration of each organism. However, direct applying the PPI network to real field is complicated problem because PPI network is only a set of co-expressive proteins or gene products, and its network link means simple physical binding rather than in-depth knowledge of biological process. In this paper, we suggest a protein functional flow model which is a directed network based on a protein functions' relation of signaling transduction pathway. The vertex of the suggested model is a molecular function annotated by gene ontology, and the relations among the vertex are considered as edges. Thus, it is easy to trace a specific function's transition, and it can be a constraint to extract a meaningful sub-path from whole PPI network. To evaluate the model, 11 functional flow models of Homo sapiens were built from KEGG, and Cronbach's alpha values were measured (alpha=0.67). Among 1023 functional flows, 765 functional flows showed 0.6 or higher alpha values.

Emoticon by Emotions: The Development of an Emoticon Recommendation System Based on Consumer Emotions (Emoticon by Emotions: 소비자 감성 기반 이모티콘 추천 시스템 개발)

  • Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.227-252
    • /
    • 2018
  • The evolution of instant communication has mirrored the development of the Internet and messenger applications are among the most representative manifestations of instant communication technologies. In messenger applications, senders use emoticons to supplement the emotions conveyed in the text of their messages. The fact that communication via messenger applications is not face-to-face makes it difficult for senders to communicate their emotions to message recipients. Emoticons have long been used as symbols that indicate the moods of speakers. However, at present, emoticon-use is evolving into a means of conveying the psychological states of consumers who want to express individual characteristics and personality quirks while communicating their emotions to others. The fact that companies like KakaoTalk, Line, Apple, etc. have begun conducting emoticon business and sales of related content are expected to gradually increase testifies to the significance of this phenomenon. Nevertheless, despite the development of emoticons themselves and the growth of the emoticon market, no suitable emoticon recommendation system has yet been developed. Even KakaoTalk, a messenger application that commands more than 90% of domestic market share in South Korea, just grouped in to popularity, most recent, or brief category. This means consumers face the inconvenience of constantly scrolling around to locate the emoticons they want. The creation of an emoticon recommendation system would improve consumer convenience and satisfaction and increase the sales revenue of companies the sell emoticons. To recommend appropriate emoticons, it is necessary to quantify the emotions that the consumer sees and emotions. Such quantification will enable us to analyze the characteristics and emotions felt by consumers who used similar emoticons, which, in turn, will facilitate our emoticon recommendations for consumers. One way to quantify emoticons use is metadata-ization. Metadata-ization is a means of structuring or organizing unstructured and semi-structured data to extract meaning. By structuring unstructured emoticon data through metadata-ization, we can easily classify emoticons based on the emotions consumers want to express. To determine emoticons' precise emotions, we had to consider sub-detail expressions-not only the seven common emotional adjectives but also the metaphorical expressions that appear only in South Korean proved by previous studies related to emotion focusing on the emoticon's characteristics. We therefore collected the sub-detail expressions of emotion based on the "Shape", "Color" and "Adumbration". Moreover, to design a highly accurate recommendation system, we considered both emotion-technical indexes and emoticon-emotional indexes. We then identified 14 features of emoticon-technical indexes and selected 36 emotional adjectives. The 36 emotional adjectives consisted of contrasting adjectives, which we reduced to 18, and we measured the 18 emotional adjectives using 40 emoticon sets randomly selected from the top-ranked emoticons in the KakaoTalk shop. We surveyed 277 consumers in their mid-twenties who had experience purchasing emoticons; we recruited them online and asked them to evaluate five different emoticon sets. After data acquisition, we conducted a factor analysis of emoticon-emotional factors. We extracted four factors that we named "Comic", Softness", "Modernity" and "Transparency". We analyzed both the relationship between indexes and consumer attitude and the relationship between emoticon-technical indexes and emoticon-emotional factors. Through this process, we confirmed that the emoticon-technical indexes did not directly affect consumer attitudes but had a mediating effect on consumer attitudes through emoticon-emotional factors. The results of the analysis revealed the mechanism consumers use to evaluate emoticons; the results also showed that consumers' emoticon-technical indexes affected emoticon-emotional factors and that the emoticon-emotional factors affected consumer satisfaction. We therefore designed the emoticon recommendation system using only four emoticon-emotional factors; we created a recommendation method to calculate the Euclidean distance from each factors' emotion. In an attempt to increase the accuracy of the emoticon recommendation system, we compared the emotional patterns of selected emoticons with the recommended emoticons. The emotional patterns corresponded in principle. We verified the emoticon recommendation system by testing prediction accuracy; the predictions were 81.02% accurate in the first result, 76.64% accurate in the second, and 81.63% accurate in the third. This study developed a methodology that can be used in various fields academically and practically. We expect that the novel emoticon recommendation system we designed will increase emoticon sales for companies who conduct business in this domain and make consumer experiences more convenient. In addition, this study served as an important first step in the development of an intelligent emoticon recommendation system. The emotional factors proposed in this study could be collected in an emotional library that could serve as an emotion index for evaluation when new emoticons are released. Moreover, by combining the accumulated emotional library with company sales data, sales information, and consumer data, companies could develop hybrid recommendation systems that would bolster convenience for consumers and serve as intellectual assets that companies could strategically deploy.

Personalized Media Control Method using Probabilistic Fuzzy Rule-based Learning (확률적 퍼지 룰 기반 학습에 의한 개인화된 미디어 제어 방법)

  • Lee, Hyong-Euk;Kim, Yong-Hwi;Lee, Tae-Youb;Park, Kwang-Hyun;Kim, Yong-Soo;Cho, Joon-Myun;Bien, Z. Zenn
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.2
    • /
    • pp.244-251
    • /
    • 2007
  • Intention reading technique is essential to provide personalized services toward more convenient and human-friendly services in complex ubiquitous environment such as a smart home. If a system has knowledge about an user's intention of his/her behavioral pattern, the system can provide mote qualified and satisfactory services automatically in advance to the user's explicit command. In this sense, learning capability is considered as a key function for the intention reading technique in view of knowledge discovery. In this paper, ore introduce a personalized media control method for a possible application iii a smart home. Note that data pattern such as human behavior contains lots of inconsistent data due to limitation of feature extraction and insufficiently available features, where separable data groups are intermingled with inseparable data groups. To deal with such a data pattern, we introduce an effective engineering approach with the combination of fuzzy logic and probabilistic reasoning. The proposed learning system, which is based on IFCS (Iterative Fuzzy Clustering with Supervision) algorithm, extract probabilistic fuzzy rules effectively from the given numerical training data pattern. Furthermore, an extended architectural design methodology of the learning system incorporating with the IFCS algorithm are introduced. Finally, experimental results of the media contents recommendation system are given to show the effectiveness of the proposed system.

The Model of Network Packet Analysis based on Big Data (빅 데이터 기반의 네트워크 패킷 분석 모델)

  • Choi, Bomin;Kong, Jong-Hwan;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.23 no.5
    • /
    • pp.392-399
    • /
    • 2013
  • Due to the development of IT technology and the information age, a dependency of the network over the most of our lives have grown to a greater extent. Although it provides us to get various useful information and service, it also has negative effectiveness that can provide network intruder with vulnerable roots. In other words, we need to urgently cope with theses serious security problem causing service disableness or system connected to network obstacle with exploiting various packet information. Many experts in a field of security are making an effort to develop the various security solutions to respond against these threats, but existing solutions have a lot of problems such as lack of storage capacity and performance degradation along with the massive increase of packet data volume. Therefore we propose the packet analysis model to apply issuing Big Data technology in the field of security. That is, we used NoSQL which is technology of massive data storage to collect the packet data growing massive and implemented the packet analysis model based on K-means clustering using MapReudce which is distributed programming framework, and then we have shown its high performance by experimenting.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. Those text data were produced and distributed through various media platforms such as World Wide Web, Internet news feeds, microblog, and social media. However, this enormous amount of easily obtained information is lack of organization. Therefore, this problem has raised the interest of many researchers in order to manage this huge amount of information. Further, this problem also required professionals that are capable of classifying relevant information and hence text classification is introduced. Text classification is a challenging task in modern data analysis, which it needs to assign a text document into one or more predefined categories or classes. In text classification field, there are different kinds of techniques available such as K-Nearest Neighbor, Naïve Bayes Algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, while dealing with huge amount of text data, model performance and accuracy becomes a challenge. According to the type of words used in the corpus and type of features created for classification, the performance of a text classification model can be varied. Most of the attempts are been made based on proposing a new algorithm or modifying an existing algorithm. This kind of research can be said already reached their certain limitations for further improvements. In this study, aside from proposing a new algorithm or modifying the algorithm, we focus on searching a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of training data upon which this classifier is built. The real world datasets in most of the time contain noise, or in other words noisy data, these can actually affect the decision made by the classifiers built from these data. In this study, we consider that the data from different domains, which is heterogeneous data might have the characteristics of noise which can be utilized in the classification process. In order to build the classifier, machine learning algorithm is performed based on the assumption that the characteristics of training data and target data are the same or very similar to each other. However, in the case of unstructured data such as text, the features are determined according to the vocabularies included in the document. If the viewpoints of the learning data and target data are different, the features may be appearing different between these two data. In this study, we attempt to improve the classification accuracy by strengthening the robustness of the document classifier through artificially injecting the noise into the process of constructing the document classifier. With data coming from various kind of sources, these data are likely formatted differently. These cause difficulties for traditional machine learning algorithms because they are not developed to recognize different type of data representation at one time and to put them together in same generalization. Therefore, in order to utilize heterogeneous data in the learning process of document classifier, we apply semi-supervised learning in our study. However, unlabeled data might have the possibility to degrade the performance of the document classifier. Therefore, we further proposed a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contributing to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules will be selected and applied for the final decision making. In this paper, three different types of real-world data sources were used, which are news, twitter and blogs.

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.125-148
    • /
    • 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in the amount of data is very large and relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification which classifies documents into predetermined categories, topic modeling which extracts major topics from a large number of documents, sentimental analysis or opinion mining that identifies emotions or opinions contained in texts, and Text Summarization which summarize the main contents from one document or several documents have been actively studied. Especially, the text summarization technique is actively applied in the business through the news summary service, the privacy policy summary service, ect. In addition, much research has been done in academia in accordance with the extraction approach which provides the main elements of the document selectively and the abstraction approach which extracts the elements of the document and composes new sentences by combining them. However, the technique of evaluating the quality of automatically summarized documents has not made much progress compared to the technique of automatic text summarization. Most of existing studies dealing with the quality evaluation of summarization were carried out manual summarization of document, using them as reference documents, and measuring the similarity between the automatic summary and reference document. Specifically, automatic summarization is performed through various techniques from full text, and comparison with reference document, which is an ideal summary document, is performed for measuring the quality of automatic summarization. Reference documents are provided in two major ways, the most common way is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in the process of preparing the summary, it takes a lot of time and cost to write the summary, and there is a limitation that the evaluation result may be different depending on the subject of the summarizer. Therefore, in order to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. On the other hand, as a representative attempt to overcome these limitations, a method has been recently devised to reduce the size of the full text and to measure the similarity of the reduced full text and the automatic summary. In this method, the more frequent term in the full text appears in the summary, the better the quality of the summary. However, since summarization essentially means minimizing a lot of content while minimizing content omissions, it is unreasonable to say that a "good summary" based on only frequency always means a "good summary" in its essential meaning. In order to overcome the limitations of this previous study of summarization evaluation, this study proposes an automatic quality evaluation for text summarization method based on the essential meaning of summarization. Specifically, the concept of succinctness is defined as an element indicating how few duplicated contents among the sentences of the summary, and completeness is defined as an element that indicating how few of the contents are not included in the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness. In order to evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor 's hotel reviews, summarized the reviews by each hotel and presented the results of the experiments conducted on evaluation of the quality of summaries in accordance to the proposed methodology. It also provides a way to integrate the completeness and succinctness in the trade-off relationship into the F-Score, and propose a method to perform the optimal summarization by changing the threshold of the sentence similarity.

Rehabilitation Priority Decision Model for Sewer Systems (하수관거시스템 개량 우선순위 결정 모형)

  • Lee, Jung-Ho;Park, Moo-Jong;Kim, Joong-Hoon
    • Journal of the Korean Society of Hazard Mitigation
    • /
    • v.8 no.6
    • /
    • pp.7-14
    • /
    • 2008
  • The main objective of sewer rehabilitation is to improve its function while eliminating inflow/infiltration (I/I). If we can identify the amount of I/I for an individual pipe, it is possible to estimate the I/Is of sub-areas clearly. However, in real, the amount of I/I for an individual pipe is almost impossible to be obtained due to the limitation of cost and time. In this study, I/I occurrence of each sewer pipe is estimated using AHP (Analytic Hierarch Process) and RPDM (Rehabilitation Priority Decision Model for sewer system) was developed using the estimated I/I of each pipe to perform the efficient sewer rehabilitation. Based on the determined amount of I/I for an individual pipe, the RPDM determines the optimal rehabilitation priority (ORP) using a genetic algorithm for sub-areas in term of minimizing the amount of I/I occurring while the rehabilitation process is performed. The benefit obtained by implementing the ORP for rehabilitation of sub-areas is estimated by the only waste water treatment cost (WWTC) of I/I which occurs during the sewer rehabilitation period. The results of the ORP were compared with those of a numerical weighting method (NWM) which is the decision method for the rehabilitation priority in the general sewer rehabilitation practices and the worst order which are other methods to determine the rehabilitation order of sub-areas in field. The ORP reduced the WWTC by 22% compared to the NWM and by 40% compared to the worst order.

Symmetric-Invariant Boundary Image Matching Based on Time-Series Data (시계열 데이터 기반의 대칭-불변 윤곽선 이미지 매칭)

  • Lee, Sanghun;Bang, Junsang;Moon, Seongwoo;Moon, Yang-Sae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.10
    • /
    • pp.431-438
    • /
    • 2015
  • In this paper we address the symmetric-invariant problem in boundary image matching. Supporting symmetric transformation is an important factor in boundary image matching to get more intuitive and more accurate matching results. However, the previous boundary image matching handled rotation transformation only without considering symmetric transformation. In this paper, we propose symmetric-invariant boundary image matching which supports the symmetric transformation as well as the rotation transformation. For this, we define the concept of image symmetry and formally prove that rotation-invariant matching of using a symmetric image always returns the same result for every symmetric angle. For efficient symmetric transformation, we also present how to efficiently extract the symmetric time-series from an image boundary. Finally, we formally prove that our symmetric-invariant matching produces the same result for two approaches: one is using the time-series extracted from the symmetric image; another is using the time-series directly obtained from the original image time-series by symmetric transformation. Experimental results show that the proposed symmetric-invariant boundary image matching obtains more accurate and intuitive results than the previous rotation-invariant boundary image matching. These results mean that our symmetric-invariant solution is an excellent approach that solves the image symmetry problem in time-series domain.