• Title/Summary/Keyword: Internet tools


Annotation Method based on Face Area for Efficient Interactive Video Authoring (효과적인 인터랙티브 비디오 저작을 위한 얼굴영역 기반의 어노테이션 방법)

  • Yoon, Ui Nyoung;Ga, Myeong Hyeon;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.83-98 / 2015
  • Many TV viewers turn to portal sites to look up information related to a broadcast while they watch TV. However, finding the desired information takes a long time because the current internet returns a great deal of irrelevant material, so this process cannot satisfy users who want to consume information immediately. Interactive video is being actively investigated as a solution to this problem. An interactive video provides clickable objects, areas, or hotspots for users to interact with; when users click an object in the video, they instantly see additional information related to it. Making an interactive video with an authoring tool involves three basic steps: (1) create an augmented object; (2) set the object's area and the time it is displayed in the video; (3) set an interactive action linking the object to pages or hyperlinks. Users of existing authoring tools such as Popcorn Maker and Zentrick spend most of their time on step (2). wireWAX shortens the work of setting an object's location and display time because it uses a vision-based annotation method, but its users must wait while objects are detected and tracked. It is therefore necessary to reduce the time spent on step (2) by combining the benefits of the manual and vision-based annotation methods. This paper proposes a novel annotation method that lets an annotator annotate easily based on face areas. The method comprises a pre-processing step and an annotation step. Pre-processing detects shots so that users can easily find the contents of a video; it proceeds as follows: 1) extract shots from the video frames using a color-histogram-based shot boundary detection method (sketched after this abstract); 2) cluster the shots by similarity and align them into shot sequences; and 3) detect and track faces in every shot of each shot sequence and save the results into the shot sequence metadata. After pre-processing, the user annotates objects as follows: 1) the annotator selects a shot sequence and then a keyframe of a shot in that sequence; 2) the annotator places objects at positions relative to the actor's face in the selected keyframe, and the same objects are annotated automatically through the end of the shot sequence wherever a face area was detected; and 3) the user assigns additional information to the annotated objects. In addition, this paper designs a feedback model to compensate for defects that may remain after object annotation: wrongly aligned shots, wrongly detected faces, and inaccurate object locations. Users can also interpolate the positions of objects deleted during feedback. After feedback, the user can save the annotated object data as interactive object metadata. Finally, this paper presents an interactive video authoring system implemented to verify the performance of the proposed annotation method. The experiments analyze object annotation time and report a user evaluation. On average, the proposed tool annotated objects twice as fast as existing authoring tools, although annotation occasionally took longer when wrong shots were detected during pre-processing.
The usefulness and convenience of the system were measured through a user evaluation aimed at users experienced with interactive video authoring systems. Nineteen recruited experts answered 11 questions drawn from the CSUQ (Computer System Usability Questionnaire), which was designed by IBM for evaluating systems. The evaluation showed the proposed tool to be about 10% more useful for authoring interactive video than the other interactive video authoring systems.
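To make the pre-processing step concrete, the sketch below shows color-histogram shot boundary detection in the spirit of step 1). It is a minimal illustration assuming OpenCV; the HSV histogram, bin count, and correlation threshold are our own illustrative choices, not values from the paper.

```python
# Minimal sketch of color-histogram shot boundary detection.
# Threshold and bin counts are illustrative assumptions.
import cv2

def detect_shot_boundaries(video_path, threshold=0.5, bins=16):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Hue/saturation histogram is robust to small lighting changes
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms -> shot cut
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```

The detected boundary frame indices would then feed the shot clustering and face-tracking stages described above.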

A Study on Development and Prospects of Archival Finding Aids (기록 검색도구의 발전과 전망)

  • Seol, Moon-Won
    • The Korean Journal of Archival Studies / no.23 / pp.3-43 / 2010
  • Finding aids are tools that help users locate and understand archives and records. Traditionally there are two types of archival finding aids: vertical and horizontal. Vertical finding aids such as inventories have multi-level descriptions based on provenance, while horizontal ones such as catalogs and indexes guide users to the vertical finding aids based on subject. In the web environment, traditional finding aids are evolving into more dynamic forms. Respecting the principles of provenance and original order, vertical finding aids are changing into multi-entity structures with the development of ISAD(G), ISAAR(CPF), and ISDF as standards for describing each entity. However, vertical finding aids can be too difficult, complicated, and tedious for many users who are accustomed to the easy and engaging search tools of the internet. Complementing them, new types of finding aids are appearing that provide easy, interesting, and extensive access channels. This study investigates the development and limitations of vertical finding aids and the recent trend of new finding aids complementing them. The study identifies three trends in finding aid development: (i) mixture, (ii) integration, and (iii) openness. Recently, some finding aids have been mixed with stories, and others provide integrated searches across the collections of various heritage institutions. There are also cases of experimenting with user participation in the development of finding aids using Web 2.0 applications. These new types of finding aids can also cause problems, such as decontextualized description and prejudice, especially in the case of mixed finding aids, and quality control of user-contributed annotations and comments. To solve these problems, the present paper suggests strengthening the infrastructure of vertical finding aids, connecting them with the various new ones, and facilitating interaction with users of finding aids. It is hoped that the present paper will provide impetus for archives, including the National Archives of Korea, to set up and evaluate development strategies for archival finding aids.

Analysis of Internet Biology Study Sites and Guidelines for Constructing Educational Homepages (인터넷상의 고등학교 생물 학습사이트 비교분석 및 웹사이트 구축방안에 관한 연구)

  • Kim, Joo-Hyun;Sung, Jung-Hee
    • Journal of The Korean Association For Science Education / v.22 no.4 / pp.779-795 / 2002
  • The Internet, a worldwide network of computers, is considered a sea of information because it allows people to share information beyond the barriers of time and space. However, despite its immense potential, its use in biology education has been extremely limited, mainly because of the scarcity of good biology-related sites. In order to provide useful guidelines for constructing user-friendly study sites that can help high school students of different intellectual levels study biology, comparative studies were performed on selected educational sites. Initially, hundreds of related sites were examined; subsequently, four distinct sites were selected, not only because they are well organized but also because each is unique in its contents. A survey was also carried out among the users of each site. The survey results indicated that high school students regard web-based biology study tools as effective teaching methods, although there might be some bias in the criteria for selecting target sites. In addition to detailed biology topics and related biology information, multimedia data including pictures, animations, and movies were found to be among the important ingredients of a desirable biology study site. Thus, the inclusion of multimedia components should be considered when developing a systematic biology study site. Overall, the role of cyberspace is expected to become more and more important. Since developing user-satisfying, self-guided sites requires interdisciplinary collaboration, extensive communication among teachers, education professionals, and computer engineers should be promoted. Furthermore, teachers' introduction of good biology study sites to their students is also an important factor in successful web-based education.

A Study on E-mail Campaigns and Feedback Analysis as Marketing Tools of Internet Fashion Shopping Malls - With Focus on Specialized Fashion Shopping Malls - (인터넷 패션쇼핑몰의 이메일 마케팅 활용과 반응 - 패션 전문몰을 중심으로 -)

  • Han, Ji-Sook
    • Archives of design research / v.19 no.2 s.64 / pp.53-62 / 2006
  • E-mail has developed from a means of instant communication into an indispensable part of online marketing, so companies need to implement consistent customer management. Communicating with customers and marketing through e-mail is a powerful way to adapt one-to-one marketing strategies to customer trends, habits, and taste preferences. Since accurate targeting is especially important in the fashion industry, e-mail is the most effective way to communicate with customers, and one-to-one marketing constitutes a very important strategy. In this study, I analyze this one-to-one marketing tool, specifically the actual e-mail messages sent by an Internet shopping mall from June 12 to July 30, 2005, examine the effect of these messages on sales growth, and analyze the actual feedback received. Regarding e-mail read rates broken down by age and gender (the computation is sketched after this abstract), I found that females in their late twenties recorded the highest rate, at 21.66%, and that their contribution to sales growth was 3.5%. Actual sales records showed that 28.10% of total sales were attributable to people in their late twenties, indicating that the age group that reads e-mails the most also buys the most. Regarding feedback by e-mail title, e-mails in the 'Casual' category were the most effective, in that most of them were read. According to the feedback analysis by weekday, messages sent on Tuesdays were read the most, and section e-mails were read more often than regular e-mails. Regarding view rate by sending time, messages sent to females in their late twenties at two o'clock in the afternoon were read by 20.93% of recipients, the highest read rate. By offering informative content and practical tips, a shopping mall can attract visitors and generate site traffic. We can therefore conclude that e-mail messages can contribute greatly to sales growth and that e-mail marketing is very effective. To make e-mail campaigns more effective and improve marketing results, the actual results must be analyzed and the findings applied to future campaigns; with this, successful marketing results follow.
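As an illustration of the read-rate breakdown reported above, here is a minimal pandas sketch; the segment labels and counts are hypothetical and only show the shape of the computation, not the study's data.

```python
# Hypothetical sketch of a read-rate breakdown by demographic
# segment; all column names and figures are invented examples.
import pandas as pd

log = pd.DataFrame({
    "segment": ["F-late20s", "F-late20s", "M-early30s", "M-early30s"],
    "sent":    [1200, 900, 800, 700],   # e-mails delivered per campaign
    "opened":  [260, 195, 96, 70],      # e-mails actually read
})

# Aggregate per segment, then compute the read rate as a percentage
grouped = log.groupby("segment")[["opened", "sent"]].sum()
grouped["read_rate_%"] = (100 * grouped["opened"] / grouped["sent"]).round(2)
print(grouped)
```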


Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications have provided a very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content expressing their opinions and interests on social media such as blogs, forums, chat rooms, and discussion boards, and the content is released on the Internet in real time. For that reason, many researchers and marketers regard social media content as a source of information for business analytics, and many studies have reported results on mining business intelligence from it. In particular, opinion mining and sentiment analysis, techniques for extracting, classifying, understanding, and assessing the opinions implicit in text, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we have found weaknesses in these methods: they are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempt to formulate a more comprehensive and practical approach to opinion mining with visual deliverables. We describe the entire cycle of practical opinion mining on social media content, from the initial data gathering stage to the final presentation session. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media; each target medium requires a different means of access, such as an open API, search tools, a DB-to-DB interface, or purchased content. The second phase is pre-processing to generate useful material for meaningful analysis: if garbage data are not removed, the results will not provide meaningful and useful business insights, so natural language processing techniques should be applied to clean the social media data. The next step is the opinion mining phase, where the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also identification information such as creation date, author name, user id, content id, hit counts, reviews or replies, favorites, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is used for reputation analysis (a toy example is sketched after this abstract); there are also various applications, such as stock prediction, product recommendation, and sales forecasting. The last phase is visualization and presentation of the analysis results. The major purpose of this phase is to explain the results and help users comprehend their meaning, so, to the extent possible, the deliverables should be simple, clear, and easy to understand rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company, NS Food, which holds a 66.5% market share and has kept the No. 1 position in the Korean "Ramen" business for several decades.
We collected a total of 11,869 pieces of content, including blog posts, forum posts, and news articles. After collecting the social media content data, we generated instant-noodle-business-specific language resources for data manipulation and analysis using natural language processing, and we classified the content into more detailed categories such as marketing features, environment, and reputation. In these phases, we used free software: the TM, KoNLP, ggplot2, and plyr packages of the R project. As a result, we present several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, a topic word cloud, heat maps, a valence tree map, and other visualized images, providing vivid, full-color examples built with the R project's open library packages. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. The heat map shows the movement of sentiment or volume as the density of color over a category-by-time matrix. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers who want to quickly grasp the "big picture" of the business situation, since a tree map can present buzz volume and sentiment hierarchically for a given period. This case study offers real-world business insights from market sensing and demonstrates to practically minded business users how they can use these kinds of results for timely decision making in response to ongoing changes in the market. We believe our approach provides a practical and reliable guide to opinion mining with visualized results that are immediately useful, not only in the food industry but in other industries as well.
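The analyzing phase can be illustrated with a toy lexicon-based polarity scorer feeding a month-by-sentiment matrix of the kind a heat map would display. The tiny lexicons and record format below are assumptions for the example, not the study's language resources.

```python
# Toy lexicon-based sentiment scoring aggregated per month,
# the kind of matrix a sentiment heat map is drawn from.
# Lexicons and records are illustrative assumptions.
from collections import Counter

POSITIVE = {"good", "tasty", "love"}
NEGATIVE = {"bad", "salty", "disappointed"}

def polarity(text):
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"

def monthly_sentiment(posts):
    # posts: iterable of (created_month, text) pairs
    matrix = Counter()
    for month, text in posts:
        matrix[(month, polarity(text))] += 1
    return matrix  # feed these counts into a heat map plot

print(monthly_sentiment([("2014-01", "so tasty love it"),
                         ("2014-02", "too salty disappointed")]))
```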

A Study on the Meaning and Strategy of Keyword Advertising Marketing

  • Park, Nam Goo
    • Journal of Distribution Science / v.8 no.3 / pp.49-56 / 2010
  • At the initial stage of Internet advertising, banner advertising came into fashion. As the Internet became a central part of daily life and competition in the online advertising market grew fierce, there was not enough space for banner advertising, which rushed only to portal sites, and advertising prices surged. Consequently, the high-cost, low-efficiency problems of banner advertising were raised, which led to the emergence of keyword advertising as a new type of Internet advertising to replace its predecessor. In the early 2000s, when Internet advertising first took off, display advertising, including banner advertising, dominated the Net. However, display advertising gradually declined and registered negative growth in 2009, whereas keyword advertising grew rapidly and began to outdo display advertising as of 2005. Keyword advertising is the technique of exposing relevant advertisements at the top of search sites when a user searches for a keyword. Instead of exposing advertisements to unspecified individuals as banner advertising does, keyword advertising, a targeted advertising technique, shows advertisements only when customers search for a desired keyword, so that only highly prospective customers see them; in this context it is also referred to as search advertising. It is regarded as more aggressive advertising with a higher hit rate than earlier forms in that, instead of the seller discovering customers and running advertisements at them as TV, radio, or banner advertising does, it exposes advertisements to customers who are already visiting. Keyword advertising makes it possible for a company to seek publicity online simply by using a single word, achieving maximum efficiency at minimum cost. Its strong point is that customers can directly reach the products in question, making it more efficient than advertisements in mass media such as TV and radio. Its weak point is that a company must register its advertisement on each and every portal site and finds it hard to exercise substantial supervision over the advertisement, so advertising expenses may exceed profits. Keyword advertising serves as the most appropriate advertising method for the sales and publicity of small and medium enterprises, which need maximum advertising effect at low advertising cost. At present, keyword advertising is divided into CPC advertising and CPM advertising. The former, known as the most efficient technique and also referred to as metered-rate advertising, makes a company pay for the number of clicks on a keyword that users have searched; it is representatively adopted by Overture, Google's AdWords, Naver's Clickchoice, and Daum's Clicks. CPM advertising relies on a flat-rate payment system, making a company pay on the basis of the number of exposures rather than the number of clicks; it fixes the price per 1,000 exposures, and is mainly adopted by Naver's Timechoice, Daum's Speciallink, and Nate's Speedup. At present, the CPC method is most frequently adopted (the two charging models are sketched after this abstract).
The weak point of the CPC method is that advertising costs can rise through repeated clicks from the same IP. If a company makes good use of strategies for maximizing the strong points of keyword advertising and complementing its weak points, it is highly likely to turn visitors into prospective customers. Accordingly, an advertiser should analyze customers' behavior and approach them in a variety of ways, trying hard to find out what they want; with this in mind, he or she should use multiple keywords when running ads. When first running an ad, the advertiser should give priority to keyword selection, considering how many search engine users will click the keyword in question and how much the advertisement will cost. As the popular keywords that search engine users frequently use carry a high unit cost per click, advertisers with small budgets in the initial phase should pay attention to detailed keywords that suit their budget. Detailed keywords, also referred to as peripheral or extension keywords, are combinations of major keywords. Most keyword advertisements are text based. The biggest strength of text-based advertising is that it looks like search results, arousing little antipathy, but it fails to attract much attention precisely because most keyword advertising is text. Image-embedded advertising is easy to notice because of its images, but it is exposed on the lower part of a web page and recognized as an advertisement, which leads to a low click-through rate; however, its prices are lower than those of text-based advertising. If a company owns a logo or a product that people easily recognize, it is well advised to make good use of image-embedded advertising to attract Internet users' attention. Advertisers should analyze their logs and examine customers' responses based on a site's events and product composition, monitoring customer behavior in detail. Keyword advertising also allows them to analyze the advertising effects of exposed keywords through log analysis. Log analysis is a close analysis of the current state of a site based on information about visitors, such as visitor counts, page views, and cookie values. A user's IP address, the pages used, the time of use, and cookie values are stored in the log files generated by each web server. These log files contain a huge amount of data; as direct analysis is practically impossible, they are analyzed using log-analysis solutions. The generic information that log-analysis tools can extract includes total page views, average page views per day, basic page views, page views per visit, total hits, average hits per day, hits per visit, number of visits, average visits per day, net number of visitors, average visitors per day, one-time visitors, returning visitors, and average usage hours.
Such data are useful for analyzing the situation and current status of rival companies as well as for benchmarking. Because keyword advertising exposes advertisements exclusively on search-result pages, competition among advertisers attempting to occupy popular keywords is very fierce. Some portal sites keep giving priority to existing advertisers, whereas others let all advertisers purchase a keyword once the previous advertising contract is over. On sites that give priority to established advertisers, an advertiser relying on seasonal or time-sensitive keywords may as well purchase a vacant advertising slot in advance lest the right timing for advertising be missed. Naver, however, does not give priority to existing advertisers for any keyword advertisements; there, one can secure keywords by entering into a contract after confirming the contract period. This study takes a look at keyword advertising marketing and presents effective strategies for it. At present, the Korean CPC advertising market is virtually monopolized by Overture, whose strengths are its CPC charging model and the fact that its advertisements are registered at the top of the most representative portal sites in Korea; these advantages make it the most appropriate medium for small and medium enterprises. However, Overture's CPC method has weak points too: it is not the one perfect advertising model among search advertisements in the online market. It is therefore essential that small and medium enterprises, including independent shopping malls, complement the weaknesses of the CPC method and make good use of strategies that maximize its strengths, so as to increase their sales and create points of contact with customers.
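A small sketch can make the two charging models described above concrete; the prices and traffic figures below are invented for illustration and carry no relation to actual Overture or portal rates.

```python
# Illustrative comparison of the CPC (pay-per-click) and CPM
# (pay-per-1,000-exposures) charging models; all numbers invented.
def cpc_cost(clicks, price_per_click):
    # Meter-rate model: pay only for clicks on the sponsored keyword
    return clicks * price_per_click

def cpm_cost(impressions, price_per_thousand):
    # Flat-rate model: pay per 1,000 exposures, regardless of clicks
    return impressions / 1000 * price_per_thousand

impressions, ctr = 50_000, 0.02            # assume a 2% click-through rate
clicks = int(impressions * ctr)            # -> 1,000 clicks
print(cpc_cost(clicks, 300))               # e.g. 300 won per click -> 300,000
print(cpm_cost(impressions, 2_000))        # 2,000 won per 1,000 views -> 100,000
```

Which model is cheaper depends on the click-through rate, which is one reason the abstract notes that popular keywords carry a high unit cost per click under CPC.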


Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created while operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive log data of banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing a client's business, a separate log data processing system needs to be established. However, existing computing environments make it difficult to realize the flexible storage expansion needed to process massive amounts of unstructured log data and to execute the many functions needed to categorize and analyze them. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to process with the existing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating block units of the aggregated log data, the proposed system offers automatic restore functions so that it continues to operate after recovering from a malfunction. Finally, by establishing a distributed database using NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for unstructured log data, and their strict schemas cannot expand across nodes when rapidly growing data must be distributed to various nodes. NoSQL does not provide the complex computations that relational databases offer, but it can easily expand the database through node dispersion when data grow rapidly; it is a non-relational database with a structure appropriate for unstructured data. NoSQL data models are usually classified as Key-Value, column-oriented, or document-oriented. Of these, the proposed system uses MongoDB, a representative document-oriented store with a free schema structure. MongoDB is adopted because its flexible schema makes unstructured log data easy to process, it facilitates node expansion when data grow rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module (a sketch of this routing step follows the abstract). The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module, plotted in graphs according to the user's analysis conditions, and parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, measuring log insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through MongoDB log-insert performance evaluations over various chunk sizes.
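A minimal sketch of the log collector's classify-and-route step, assuming pymongo. The collection names and log schema are illustrative; in the paper, real-time logs go to the MySQL module, whereas here both paths are stubbed as MongoDB collections for brevity.

```python
# Sketch of a classify-and-route step for incoming bank logs.
# Collection names and the log schema are assumptions; the paper's
# system sends real-time logs to MySQL rather than a second
# MongoDB collection.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["bank_logs"]

def route_log(entry):
    # Real-time entries would go to the relational store in the
    # paper's design; batch entries go to MongoDB for Hadoop-based
    # parallel-distributed analysis later.
    target = "realtime_logs" if entry.get("realtime") else "batch_logs"
    entry["received_at"] = datetime.utcnow()
    db[target].insert_one(entry)

route_log({"type": "transfer", "client_id": "c001", "realtime": False})
```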

Cloning of a Glutathione S-Transferase Decreasing During Differentiation of HL60 Cell Line (HL60 세포주의 분화 시 감소 특성을 보이는 Glutathione S-Transferase의 클로닝)

  • Kim Jae Chul;Park In Kyu;Lee Kyu Bo;Sohn Sang Kyun;Kim Moo Kyu;Kim Jung Chul
    • Radiation Oncology Journal / v.17 no.2 / pp.151-157 / 1999
  • Purpose: By sequencing the Expressed Sequence Tags of a human dermal papilla cDNA library, we identified a clone, named K872, whose expression decreased during differentiation of the HL60 cell line. Materials and Methods: K872 plasmid DNA was isolated with a QIA plasmid extraction kit (Qiagen GmbH, Germany). Nucleotide sequencing was performed by Sanger's method on the K872 plasmid DNA. The most up-to-date GenBank and EMBL nucleic acid databases were searched over the internet using the BLAST (Basic Local Alignment Search Tool) program. Northern blots were performed using RNA isolated from various human tissues and cancer cell lines. Expression of the fusion protein was achieved with the His-Patch ThioFusion expression system, and the protein product was identified on SDS-PAGE. Results: The K872 clone is 1,006 nucleotides long and has a coding region of 675 nucleotides and a 3' non-coding region of 280 nucleotides. The presumed open reading frame starting at the 5' terminus of K872 encodes 226 amino acids, including the initiating methionine residue. The amino acid sequence deduced from the open reading frame of K872 shares 70% identity with that of rat glutathione S-transferase kappa 1 (rGSTK1). The transcripts were expressed in a variety of human tissues and cancer cells; transcript levels were relatively high in tissues such as heart, skeletal muscle, and peripheral blood leukocytes. It is noteworthy that K872 was abundantly expressed in colorectal cancer and melanoma cell lines. Conclusion: The homology search suggests that the K872 clone is the human homolog of rGSTK1, which is known to be involved in resistance to cytotoxic therapy. We propose that meticulous functional analysis be carried out to confirm this.


User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of companies' efforts to learn about customer needs. Most companies have failed to uncover needs for products or services properly from demographic data such as age, income level, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with business information appropriate to current circumstances. Part of the problem is the increasing regulation of personal data gathering and privacy, which makes demographic and transaction data collection more difficult and is a significant hurdle for traditional recommendation approaches, since those systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper is our strong belief, and evidence, that most customers' requirements for products can be analyzed effectively and efficiently from unstructured textual data such as Internet news text. To derive users' requirements from textual data obtained online, the proposed approach constructs two two-mode networks, a user-news network and a news-issue network, and integrates them into one quasi-network as the input for issue clustering. One contribution of this research is a methodology that utilizes enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. Building multi-layered two-mode networks from news logs requires tools for text mining and topic analysis; we used SAS Enterprise Miner 12.1, which provides text miner and cluster modules for textual data analysis, as well as NetMiner 4 for network visualization and analysis. Our approach to user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network; for simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network (a merging sketch follows this abstract). Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence and compare the outcome with clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build the multi-layer two-mode network. The experimental dataset came from a website-ranking service and the biggest portal site in Korea; the sample contains 150 million transaction logs of 5,000 panelists and 13,652 news articles over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the clustering results from SAS.
In spite of extensive efforts to provide user information through recommendation systems, most projects succeed only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support for decision making in companies because it enriches user-related data from unstructured textual data. To overcome the insufficient-data problem of traditional approaches, our methodology infers customers' real interests from web transaction logs. In addition, we suggest topic analysis and issue clustering as practical means of issue identification.
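The network-merging phase can be sketched as follows, assuming networkx; the node names and weights are illustrative toy data, and the study itself builds the quasi-network in NetMiner rather than in code like this.

```python
# Toy sketch of merging two bipartite (two-mode) networks,
# user-article and article-issue, into a user-issue quasi-network.
# Node names and weights are illustrative assumptions.
import networkx as nx

user_article = [("u1", "a1"), ("u1", "a2"), ("u2", "a2")]
article_issue = [("a1", "economy"), ("a2", "election")]

issue_of = {}
for article, issue in article_issue:
    issue_of.setdefault(article, []).append(issue)

g = nx.Graph()
for user, article in user_article:
    for issue in issue_of.get(article, []):
        # Each time a user reads an article tagged with an issue,
        # strengthen the user-issue tie by one
        w = g.get_edge_data(user, issue, {"weight": 0})["weight"]
        g.add_edge(user, issue, weight=w + 1)

print(g.edges(data=True))   # weighted user-issue quasi-network
```

The resulting weighted user-issue graph is what a clustering step such as PAM would then partition by structural equivalence.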

Development of Soil Erosion Analysis Systems Based on Cloud and HyGIS (클라우드 및 HyGIS기반 토양유실분석 시스템 개발)

  • Kim, Joo-Hun;Kim, Kyung-Tak;Lee, Jin-Won
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.4 / pp.63-76 / 2011
  • This study aims to develop a model for analyzing soil loss in prior disaster impact assessments. The soil loss analysis model is delivered in two forms: an Internet-based soil loss analysis system built on cloud computing, and a standalone version connected with HyGIS. The soil loss analysis system lets users produce a soil loss distribution map without an S/W license and without preparing basic data such as a DEM, soil map, and land cover map; it also helps users draw the distribution map by applying various factors, such as direct rainfall factors (a soil-loss formula sketch follows this abstract). The soil loss analysis tools connected with HyGIS are developed as an add-on to GEOMania GMMap2009 and can also draw the Soil Loss Hazard Map suggested by the OECD. Use of both models shows that they make soil loss analysis very convenient. These models will be improved continuously through further research on analyzing sediment at the watershed outlet and on calculating the R value using data from many rain stations.
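The abstract does not name the underlying equation, but soil-loss distribution maps driven by a rainfall factor R are commonly computed with the Universal Soil Loss Equation, A = R × K × LS × C × P. The sketch below assumes that form; the factor values are illustrative, not taken from the paper.

```python
# Hedged sketch: per-cell soil loss via the Universal Soil Loss
# Equation (USLE), assumed here since the paper mentions the
# rainfall factor R; all factor values are illustrative.
def usle_soil_loss(r, k, ls, c, p):
    """Annual soil loss A (t/ha/yr) from the five USLE factors:
    R rainfall erosivity, K soil erodibility, LS slope length/steepness,
    C cover management, P support practice."""
    return r * k * ls * c * p

# One illustrative grid cell of a soil loss distribution map
print(usle_soil_loss(r=4500, k=0.25, ls=1.8, c=0.08, p=1.0))
```

Applied cell by cell over a DEM-derived grid with soil and land cover maps supplying K, C, and P, this is the kind of computation that yields the soil loss distribution map the system draws.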