• Title/Summary/Keyword: query performance

Search Result 950, Processing Time 0.026 seconds

Development of AI-based Real Time Agent Advisor System on Call Center - Focused on N Bank Call Center (AI기반 콜센터 실시간 상담 도우미 시스템 개발 - N은행 콜센터 사례를 중심으로)

  • Ryu, Ki-Dong;Park, Jong-Pil;Kim, Young-min;Lee, Dong-Hoon;Kim, Woo-Je
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.20 no.2
    • /
    • pp.750-762
    • /
    • 2019
  • The importance of the call center as a contact point for the enterprise is growing. However, call centers have difficulty with their operating agents due to the agents' lack of knowledge and owing to frequent agent turnover due to downturns in the business, which causes deterioration in the quality of customer service. Therefore, through an N-bank call center case study, we developed a system to reduce the burden of keeping up business knowledge and to improve customer service quality. It is a "real-time agent advisor" system that provides agents with answers to customer questions in real time by combining AI technology for speech recognition, natural language processing, and questions & answers for existing call center information systems, such as a private branch exchange (PBX) and computer telephony integration (CTI). As a result of the case study, we confirmed that the speech recognition system for real-time call analysis and the corpus construction method improves the natural speech processing performance of the query response system. Especially with name entity recognition (NER), the accuracy of the corpus learning improved by 31%. Also, after applying the agent advisor system, the positive feedback rate of agents about the answers from the agent advisor was 93.1%, which proved the system is helpful to the agents.

A Spatio-Temporal Clustering Technique for the Moving Object Path Search (이동 객체 경로 탐색을 위한 시공간 클러스터링 기법)

  • Lee, Ki-Young;Kang, Hong-Koo;Yun, Jae-Kwan;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society
    • /
    • v.7 no.3 s.15
    • /
    • pp.67-81
    • /
    • 2005
  • Recently, the interest and research on the development of new application services such as the Location Based Service and Telemetics providing the emergency service, neighbor information search, and route search according to the development of the Geographic Information System have been increasing. User's search in the spatio-temporal database which is used in the field of Location Based Service or Telemetics usually fixes the current time on the time axis and queries the spatial and aspatial attributes. Thus, if the range of query on the time axis is extensive, it is difficult to efficiently deal with the search operation. For solving this problem, the snapshot, a method to summarize the location data of moving objects, was introduced. However, if the range to store data is wide, more space for storing data is required. And, the snapshot is created even for unnecessary space that is not frequently used for search. Thus, non storage space and memory are generally used in the snapshot method. Therefore, in this paper, we suggests the Hash-based Spatio-Temporal Clustering Algorithm(H-STCA) that extends the two-dimensional spatial hash algorithm used for the spatial clustering in the past to the three-dimensional spatial hash algorithm for overcoming the disadvantages of the snapshot method. And, this paper also suggests the knowledge extraction algorithm to extract the knowledge for the path search of moving objects from the past location data based on the suggested H-STCA algorithm. Moreover, as the results of the performance evaluation, the snapshot clustering method using H-STCA, in the search time, storage structure construction time, optimal path search time, related to the huge amount of moving object data demonstrated the higher performance than the spatio-temporal index methods and the original snapshot method. Especially, for the snapshot clustering method using H-STCA, the more the number of moving objects was increased, the more the performance was improved, as compared to the existing spatio-temporal index methods and the original snapshot method.

  • PDF

A Study on the Intelligent Service Selection Reasoning for Enhanced User Satisfaction : Appliance to Cloud Computing Service (사용자 만족도 향상을 위한 지능형 서비스 선정 방안에 관한 연구 : 클라우드 컴퓨팅 서비스에의 적용)

  • Shin, Dong Cheon
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.35-51
    • /
    • 2012
  • Cloud computing is internet-based computing where computing resources are offered over the Internet as scalable and on-demand services. In particular, in case a number of various cloud services emerge in accordance with development of internet and mobile technology, to select and provide services with which service users satisfy is one of the important issues. Most of previous works show the limitation in the degree of user satisfaction because they are based on so called concept similarity in relation to user requirements or are lack of versatility of user preferences. This paper presents cloud service selection reasoning which can be applied to the general cloud service environments including a variety of computing resource services, not limited to web services. In relation to the service environments, there are two kinds of services: atomic service and composite service. An atomic service consists of service attributes which represent the characteristics of service such as functionality, performance, or specification. A composite service can be created by composition of atomic services and other composite services. Therefore, a composite service inherits attributes of component services. On the other hand, the main participants in providing with cloud services are service users, service suppliers, and service operators. Service suppliers can register services autonomously or in accordance with the strategic collaboration with service operators. Service users submit request queries including service name and requirements to the service management system. The service management system consists of a query processor for processing user queries, a registration manager for service registration, and a selection engine for service selection reasoning. In order to enhance the degree of user satisfaction, our reasoning stands on basis of the degree of conformance to user requirements of service attributes in terms of functionality, performance, and specification of service attributes, instead of concept similarity as in ontology-based reasoning. For this we introduce so called a service attribute graph (SAG) which is generated by considering the inclusion relationship among instances of a service attribute from several perspectives like functionality, performance, and specification. Hence, SAG is a directed graph which shows the inclusion relationships among attribute instances. Since the degree of conformance is very close to the inclusion relationship, we can say the acceptability of services depends on the closeness of inclusion relationship among corresponding attribute instances. That is, the high closeness implies the high acceptability because the degree of closeness reflects the degree of conformance among attributes instances. The degree of closeness is proportional to the path length between two vertex in SAG. The shorter path length means more close inclusion relationship than longer path length, which implies the higher degree of conformance. In addition to acceptability, in this paper, other user preferences such as priority for attributes and mandatary options are reflected for the variety of user requirements. Furthermore, to consider various types of attribute like character, number, and boolean also helps to support the variety of user requirements. Finally, according to service value to price cloud services are rated and recommended to users. One of the significances of this paper is the first try to present a graph-based selection reasoning unlike other works, while considering various user preferences in relation with service attributes.

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.63-77
    • /
    • 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources as personal home pasges, online digital libraries and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte.syze precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not nessarily appear at the top of the query output order. Also, current search tools can not retrieve the documents related with retrieved document from gigantic amount of documents. The most important problem for lots of current searching systems is to increase the quality of search. It means to provide related documents or decrease the number of unrelated documents as low as possible in the results of search. For this problem, CiteSeer proposed the ACI (Autonomous Citation Indexing) of the articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. For details of this work, references contained in academic articles are used to give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the articleswith the cited works. Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways. Papers can be located independent of language, and words in thetitle, keywords or document. A citation index allows navigation backward in time (the list of cited articles) and forwardin time (which subsequent articles cite the current article?) But CiteSeer can not indexes the links between articles that researchers doesn't make. Because it indexes the links between articles that only researchers make when they cite other articles. Also, CiteSeer is not easy to scalability. Because CiteSeer can not indexes the links between articles that researchers doesn't make. All these problems make us orient for designing more effective search system. This paper shows a method that extracts subject and predicate per each sentence in documents. A document will be changed into the tabular form that extracted predicate checked value of possible subject and object. We make a hierarchical graph of a document using the table and then integrate graphs of documents. The graph of entire documents calculates the area of document as compared with integrated documents. We mark relation among the documents as compared with the area of documents. Also it proposes a method for structural integration of documents that retrieves documents from the graph. It makes that the user can find information easier. We compared the performance of the proposed approaches with lucene search engine using the formulas for ranking. As a result, the F.measure is about 60% and it is better as about 15%.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.

Job Preference Analysis and Job Matching System Development for the Middle Aged Class (중장년층 일자리 요구사항 분석 및 인력 고용 매칭 시스템 개발)

  • Kim, Seongchan;Jang, Jincheul;Kim, Seong Jung;Chin, Hyojin;Yi, Mun Yong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.247-264
    • /
    • 2016
  • With the rapid acceleration of low-birth rate and population aging, the employment of the neglected groups of people including the middle aged class is a crucial issue in South Korea. In particular, in the 2010s, the number of the middle aged who want to find a new job after retirement age is significantly increasing with the arrival of the retirement time of the baby boom generation (born 1955-1963). Despite the importance of matching jobs to this emerging middle aged class, private job portals as well as the Korean government do not provide any online job service tailored for them. A gigantic amount of job information is available online; however, the current recruiting systems do not meet the demand of the middle aged class as their primary targets are young workers. We are in dire need of a specially designed recruiting system for the middle aged. Meanwhile, when users are searching the desired occupations on the Worknet website, provided by the Korean Ministry of Employment and Labor, users are experiencing discomfort to search for similar jobs because Worknet is providing filtered search results on the basis of exact matches of a preferred job code. Besides, according to our Worknet data analysis, only about 24% of job seekers had landed on a job position consistent with their initial preferred job code while the rest had landed on a position different from their initial preference. To improve the situation, particularly for the middle aged class, we investigate a soft job matching technique by performing the following: 1) we review a user behavior logs of Worknet, which is a public job recruiting system set up by the Korean government and point out key system design implications for the middle aged. Specifically, we analyze the job postings that include preferential tags for the middle aged in order to disclose what types of jobs are in favor of the middle aged; 2) we develope a new occupation classification scheme for the middle aged, Korea Occupation Classification for the Middle-aged (KOCM), based on the similarity between jobs by reorganizing and modifying a general occupation classification scheme. When viewed from the perspective of job placement, an occupation classification scheme is a way to connect the enterprises and job seekers and a basic mechanism for job placement. The key features of KOCM include establishing the Simple Labor category, which is the most requested category by enterprises; and 3) we design MOMA (Middle-aged Occupation Matching Algorithm), which is a hybrid job matching algorithm comprising constraint-based reasoning and case-based reasoning. MOMA incorporates KOCM to expand query to search similar jobs in the database. MOMA utilizes cosine similarity between user requirement and job posting to rank a set of postings in terms of preferred job code, salary, distance, and job type. The developed system using MOMA demonstrates about 20 times of improvement over the hard matching performance. In implementing the algorithm for a web-based application of recruiting system for the middle aged, we also considered the usability issue of making the system easier to use, which is especially important for this particular class of users. That is, we wanted to improve the usability of the system during the job search process for the middle aged users by asking to enter only a few simple and core pieces of information such as preferred job (job code), salary, and (allowable) distance to the working place, enabling the middle aged to find a job suitable to their needs efficiently. The Web site implemented with MOMA should be able to contribute to improving job search of the middle aged class. We also expect the overall approach to be applicable to other groups of people for the improvement of job matching results.

Ontology-Based Process-Oriented Knowledge Map Enabling Referential Navigation between Knowledge (지식 간 상호참조적 네비게이션이 가능한 온톨로지 기반 프로세스 중심 지식지도)

  • Yoo, Kee-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.61-83
    • /
    • 2012
  • A knowledge map describes the network of related knowledge into the form of a diagram, and therefore underpins the structure of knowledge categorizing and archiving by defining the relationship of the referential navigation between knowledge. The referential navigation between knowledge means the relationship of cross-referencing exhibited when a piece of knowledge is utilized by a user. To understand the contents of the knowledge, a user usually requires additionally information or knowledge related with each other in the relation of cause and effect. This relation can be expanded as the effective connection between knowledge increases, and finally forms the network of knowledge. A network display of knowledge using nodes and links to arrange and to represent the relationship between concepts can provide a more complex knowledge structure than a hierarchical display. Moreover, it can facilitate a user to infer through the links shown on the network. For this reason, building a knowledge map based on the ontology technology has been emphasized to formally as well as objectively describe the knowledge and its relationships. As the necessity to build a knowledge map based on the structure of the ontology has been emphasized, not a few researches have been proposed to fulfill the needs. However, most of those researches to apply the ontology to build the knowledge map just focused on formally expressing knowledge and its relationships with other knowledge to promote the possibility of knowledge reuse. Although many types of knowledge maps based on the structure of the ontology were proposed, no researches have tried to design and implement the referential navigation-enabled knowledge map. This paper addresses a methodology to build the ontology-based knowledge map enabling the referential navigation between knowledge. The ontology-based knowledge map resulted from the proposed methodology can not only express the referential navigation between knowledge but also infer additional relationships among knowledge based on the referential relationships. The most highlighted benefits that can be delivered by applying the ontology technology to the knowledge map include; formal expression about knowledge and its relationships with others, automatic identification of the knowledge network based on the function of self-inference on the referential relationships, and automatic expansion of the knowledge-base designed to categorize and store knowledge according to the network between knowledge. To enable the referential navigation between knowledge included in the knowledge map, and therefore to form the knowledge map in the format of a network, the ontology must describe knowledge according to the relation with the process and task. A process is composed of component tasks, while a task is activated after any required knowledge is inputted. Since the relation of cause and effect between knowledge can be inherently determined by the sequence of tasks, the referential relationship between knowledge can be circuitously implemented if the knowledge is modeled to be one of input or output of each task. To describe the knowledge with respect to related process and task, the Protege-OWL, an editor that enables users to build ontologies for the Semantic Web, is used. An OWL ontology-based knowledge map includes descriptions of classes (process, task, and knowledge), properties (relationships between process and task, task and knowledge), and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics. Therefore a knowledge network can be automatically formulated based on the defined relationships, and the referential navigation between knowledge is enabled. To verify the validity of the proposed concepts, two real business process-oriented knowledge maps are exemplified: the knowledge map of the process of 'Business Trip Application' and 'Purchase Management'. By applying the 'DL-Query' provided by the Protege-OWL as a plug-in module, the performance of the implemented ontology-based knowledge map has been examined. Two kinds of queries to check whether the knowledge is networked with respect to the referential relations as well as the ontology-based knowledge network can infer further facts that are not literally described were tested. The test results show that not only the referential navigation between knowledge has been correctly realized, but also the additional inference has been accurately performed.

Application of MicroPACS Using the Open Source (Open Source를 이용한 MicroPACS의 구성과 활용)

  • You, Yeon-Wook;Kim, Yong-Keun;Kim, Yeong-Seok;Won, Woo-Jae;Kim, Tae-Sung;Kim, Seok-Ki
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.13 no.1
    • /
    • pp.51-56
    • /
    • 2009
  • Purpose: Recently, most hospitals are introducing the PACS system and use of the system continues to expand. But small-scaled PACS called MicroPACS has already been in use through open source programs. The aim of this study is to prove utility of operating a MicroPACS, as a substitute back-up device for conventional storage media like CDs and DVDs, in addition to the full-PACS already in use. This study contains the way of setting up a MicroPACS with open source programs and assessment of its storage capability, stability, compatibility and performance of operations such as "retrieve", "query". Materials and Methods: 1. To start with, we searched open source software to correspond with the following standards to establish MicroPACS, (1) It must be available in Windows Operating System. (2) It must be free ware. (3) It must be compatible with PET/CT scanner. (4) It must be easy to use. (5) It must not be limited of storage capacity. (6) It must have DICOM supporting. 2. (1) To evaluate availability of data storage, we compared the time spent to back up data in the open source software with the optical discs (CDs and DVD-RAMs), and we also compared the time needed to retrieve data with the system and with optical discs respectively. (2) To estimate work efficiency, we measured the time spent to find data in CDs, DVD-RAMs and MicroPACS. 7 technologists participated in this study. 3. In order to evaluate stability of the software, we examined whether there is a data loss during the system is maintained for a year. Comparison object; How many errors occurred in randomly selected data of 500 CDs. Result: 1. We chose the Conquest DICOM Server among 11 open source software used MySQL as a database management system. 2. (1) Comparison of back up and retrieval time (min) showed the result of the following: DVD-RAM (5.13,2.26)/Conquest DICOM Server (1.49,1.19) by GE DSTE (p<0.001), CD (6.12,3.61)/Conquest (0.82,2.23) by GE DLS (p<0.001), CD (5.88,3.25)/Conquest (1.05,2.06) by SIEMENS. (2) The wasted time (sec) to find some data is as follows: CD ($156{\pm}46$), DVD-RAM ($115{\pm}21$) and Conquest DICOM Server ($13{\pm}6$). 3. There was no data loss (0%) for a year and it was stored 12741 PET/CT studies in 1.81 TB memory. In case of CDs, On the other hand, 14 errors among 500 CDs (2.8%) is generated. Conclusions: We found that MicroPACS could be set up with the open source software and its performance was excellent. The system built with open source proved more efficient and more robust than back-up process using CDs or DVD-RAMs. We believe that the operation of the MicroPACS would be effective data storage device as long as its operators develop and systematize it.

  • PDF

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.109-125
    • /
    • 2019
  • As interest on intelligent search engines increases, various studies have been conducted to extract and utilize the features related to products intelligencely. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to deal with the synonyms of color terms in order to produce accurate results to user's color-related queries. Previous studies have suggested dictionary-based approach to process synonyms for color features. However, the dictionary-based approach has a limitation that it cannot handle unregistered color-related terms in user queries. In order to overcome the limitation of the conventional methods, this research proposes a model which extracts RGB values from an internet search engine in real time, and outputs similar color names based on designated color information. At first, a color term dictionary was constructed which includes color names and R, G, B values of each color from Korean color standard digital palette program and the Wikipedia color list for the basic color search. The dictionary has been made more robust by adding 138 color names converted from English color names to foreign words in Korean, and with corresponding RGB values. Therefore, the fininal color dictionary includes a total of 671 color names and corresponding RGB values. The method proposed in this research starts by searching for a specific color which a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If there exists the color in the dictionary, the RGB values of the color in the dictioanry are used as reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results of the searched color are crawled and average RGB values are extracted in certain middle area of each image. To extract the RGB values in images, a variety of different ways was attempted since there are limits to simply obtain the average of the RGB values of the center area of images. As a result, clustering RGB values in image's certain area and making average value of the cluster with the highest density as the reference values showed the best performance. Based on the reference RGB values of the searched color, the RGB values of all the colors in the color dictionary constructed aforetime are compared. Then a color list is created with colors within the range of ${\pm}50$ for each R value, G value, and B value. Finally, using the Euclidean distance between the above results and the reference RGB values of the searched color, the color with the highest similarity from up to five colors becomes the final outcome. In order to evaluate the usefulness of the proposed method, we performed an experiment. In the experiment, 300 color names and corresponding color RGB values by the questionnaires were obtained. They are used to compare the RGB values obtained from four different methods including the proposed method. The average euclidean distance of CIE-Lab using our method was about 13.85, which showed a relatively low distance compared to 3088 for the case using synonym dictionary only and 30.38 for the case using the dictionary with Korean synonym website WordNet. The case which didn't use clustering method of the proposed method showed 13.88 of average euclidean distance, which implies the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with the real time synonym processing method for new color names. This method enables to get rid of the limit of the dictionary-based approach which is a conventional synonym processing method. This research can contribute to improve the intelligence of e-commerce search systems especially on the color searching feature.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions for the system to continually operate after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based Mongo DB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as the MySQL databases have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes in the case wherein the stored data are distributed to various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. The data models of the NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require a real-time log data analysis are stored in the MySQL module and provided real-time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation is carried out against a log data processing system that uses only MySQL for inserting log data and estimating query performance; this evaluation proves the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.