A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (Korean title: A New Approach to Automatic Keyword Generation: A Keyword Assignment Method Using the Inverse Vector Space Model)

  • Cho, Won-Chin; Rho, Sang-Kyu; Yun, Ji-Young Agnes; Park, Jin-Soo
    • Asia Pacific Journal of Information Systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users by facilitating the filtering process. A set of keywords is therefore often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents, including Web pages, email messages, news reports, magazine articles, and business papers, do not currently benefit from the use of keywords. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. 
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. Here, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques: keyword extraction algorithms classify the candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, 10% to 36% of author-assigned keywords do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. 
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating the keywords that have high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. First, the IVSM system was implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents grows, we expect that IVSM can be applied to many electronic documents in Web-based communities and digital libraries.
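The five-step IVSM assignment process above can be illustrated with a minimal Python sketch. It assumes each candidate keyword is represented by a sparse term-weight vector (the `KEYWORD_VECTORS` values here are made up, not taken from the paper), uses a naive regex tokenizer and a hypothetical stopword list for preprocessing, ranks keywords by cosine similarity to the target document's term-frequency vector, and computes precision as the share of generated keywords that match the author-assigned ones. The function names (`preprocess`, `assign_keywords`, `precision`) are illustrative, not the authors' implementation.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical stopword list used only for this illustration.
STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to"}


def preprocess(text: str) -> Counter:
    """Steps 2-3: parse the target document into a term-frequency vector."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def norm(vec: dict) -> float:
    """Vector length (Euclidean norm), as computed in steps 1 and 3."""
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine(u: dict, v: dict) -> float:
    """Step 4: cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    denom = norm(u) * norm(v)
    return dot / denom if denom else 0.0


def assign_keywords(text: str, keyword_vectors: dict, k: int = 3) -> list:
    """Step 5: return up to k keywords most similar to the document."""
    doc_vec = preprocess(text)
    scored = [(cosine(vec, doc_vec), kw) for kw, vec in keyword_vectors.items()]
    scored = [(s, kw) for s, kw in scored if s > 0.0]  # drop unrelated keywords
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kw for _, kw in scored[:k]]


def precision(generated: list, author_keywords: list) -> float:
    """Share of generated keywords that match author-assigned keywords."""
    if not generated:
        return 0.0
    return len(set(generated) & set(author_keywords)) / len(generated)


# Hypothetical keyword vectors (term -> weight); in practice these would be
# derived from training documents that already carry each keyword.
KEYWORD_VECTORS = {
    "logistics": {"port": 0.8, "shipping": 0.9, "cargo": 0.7},
    "recommendation": {"user": 0.6, "item": 0.8, "rating": 0.7},
    "text mining": {"document": 0.9, "term": 0.8, "keyword": 0.6},
}

if __name__ == "__main__":
    doc = "Port congestion delays shipping and cargo handling at the port."
    print(assign_keywords(doc, KEYWORD_VECTORS, k=2))  # ['logistics']
```

Discarding zero-similarity keywords keeps the sketch from padding its output with unrelated terms; the abstract does not say how IVSM decides how many keywords to return, so the cutoff `k` is an assumption.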

Emoticon by Emotions: The Development of an Emoticon Recommendation System Based on Consumer Emotions (Korean title: Emoticon by Emotions: Development of an Emoticon Recommendation System Based on Consumer Emotions)

  • Kim, Keon-Woo; Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.227-252 / 2018
  • The evolution of instant communication has mirrored the development of the Internet, and messenger applications are among the most representative manifestations of instant communication technologies. In messenger applications, senders use emoticons to supplement the emotions conveyed in the text of their messages, because the fact that messenger communication is not face-to-face makes it difficult for senders to communicate their emotions to recipients. Emoticons have long been used as symbols that indicate the moods of speakers. At present, however, emoticon use is evolving into a means of conveying the psychological states of consumers who want to express individual characteristics and personality quirks while communicating their emotions to others. The significance of this phenomenon is attested by the fact that companies such as KakaoTalk, Line, and Apple have begun conducting emoticon business and that sales of related content are expected to increase gradually. Nevertheless, despite the development of emoticons themselves and the growth of the emoticon market, no suitable emoticon recommendation system has yet been developed. Even KakaoTalk, a messenger application that commands more than 90% of the domestic market share in South Korea, merely groups emoticons into popularity, most recent, or brief categories. Consequently, consumers face the inconvenience of constantly scrolling to locate the emoticons they want. An emoticon recommendation system would improve consumer convenience and satisfaction and increase the sales revenue of companies that sell emoticons. To recommend appropriate emoticons, it is necessary to quantify the emotions that consumers perceive in emoticons and the emotions they feel. Such quantification will enable us to analyze the characteristics and emotions felt by consumers who used similar emoticons, which, in turn, will facilitate emoticon recommendations for consumers. One way to quantify emoticon use is metadata-ization. 
Metadata-ization is a means of structuring or organizing unstructured and semi-structured data to extract meaning. By structuring unstructured emoticon data through metadata-ization, we can easily classify emoticons based on the emotions consumers want to express. To determine emoticons' precise emotions, we had to consider sub-detail expressions: not only the seven common emotional adjectives but also the metaphorical expressions unique to Korean, as established by previous emotion studies focusing on emoticon characteristics. We therefore collected sub-detail expressions of emotion based on "Shape", "Color", and "Adumbration". Moreover, to design a highly accurate recommendation system, we considered both emoticon-technical indexes and emoticon-emotional indexes. We then identified 14 emoticon-technical features and selected 36 emotional adjectives. The 36 emotional adjectives formed contrasting pairs, which we reduced to 18, and we measured the 18 emotional adjectives using 40 emoticon sets randomly selected from the top-ranked emoticons in the KakaoTalk shop. We surveyed 277 consumers in their mid-twenties who had experience purchasing emoticons; we recruited them online and asked them to evaluate five different emoticon sets. After data acquisition, we conducted a factor analysis of emoticon-emotional factors and extracted four factors, which we named "Comic", "Softness", "Modernity", and "Transparency". We analyzed both the relationship between the indexes and consumer attitude and the relationship between emoticon-technical indexes and emoticon-emotional factors. Through this process, we confirmed that the emoticon-technical indexes did not directly affect consumer attitudes but affected them indirectly, with emoticon-emotional factors acting as mediators. 
The results of the analysis revealed the mechanism consumers use to evaluate emoticons: emoticon-technical indexes affected emoticon-emotional factors, and the emoticon-emotional factors affected consumer satisfaction. We therefore designed the emoticon recommendation system using only the four emoticon-emotional factors, with a recommendation method that calculates the Euclidean distance between emoticons based on their scores on each factor. To increase the accuracy of the emoticon recommendation system, we compared the emotional patterns of selected emoticons with those of the recommended emoticons; the patterns corresponded in principle. We verified the emoticon recommendation system by testing prediction accuracy: the predictions were 81.02% accurate in the first test, 76.64% in the second, and 81.63% in the third. This study developed a methodology that can be used in various fields, both academically and practically. We expect that the novel emoticon recommendation system will increase emoticon sales for companies that conduct business in this domain and make consumer experiences more convenient. In addition, this study is an important first step toward an intelligent emoticon recommendation system. The emotional factors proposed in this study could be collected in an emotional library that serves as an emotion index for evaluating newly released emoticons. Moreover, by combining the accumulated emotional library with company sales data, sales information, and consumer data, companies could develop hybrid recommendation systems that bolster convenience for consumers and serve as intellectual assets that companies can deploy strategically.
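A minimal sketch of distance-based recommendation over the four emoticon-emotional factors follows. It assumes each emoticon set has already been scored on Comic, Softness, Modernity, and Transparency (the `FACTOR_SCORES` values are hypothetical, not survey results) and recommends the sets with the smallest Euclidean distance to the selected set's factor profile; `recommend` is an illustrative name, not the authors' code.

```python
import math

# The four emoticon-emotional factors named in the abstract; score tuples
# below list one value per factor in this order.
FACTORS = ("Comic", "Softness", "Modernity", "Transparency")

# Hypothetical factor scores per emoticon set. Real scores would come from
# the survey-based factor analysis.
FACTOR_SCORES = {
    "set_a": (0.8, 0.2, 0.5, 0.1),
    "set_b": (0.7, 0.3, 0.4, 0.2),
    "set_c": (-0.5, 0.9, 0.1, 0.6),
    "set_d": (0.1, -0.4, 0.9, -0.3),
}


def recommend(selected: str, scores: dict, n: int = 3) -> list:
    """Return the n emoticon sets closest to `selected` in factor space."""
    target = scores[selected]
    candidates = [
        (math.dist(target, vec), name)  # Euclidean distance over all factors
        for name, vec in scores.items()
        if name != selected
    ]
    candidates.sort()
    return [name for _, name in candidates[:n]]


if __name__ == "__main__":
    print(recommend("set_a", FACTOR_SCORES, n=2))  # ['set_b', 'set_d']
```

Because every factor contributes equally to the distance, the sketch treats the four factors as equally important; the abstract does not say whether the original system weights them.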

Field Studies of In-situ Aerobic Cometabolism of Chlorinated Aliphatic Hydrocarbons

  • Semprini, Lewis
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2004.04a / pp.3-4 / 2004
  • Results will be presented from two field studies that evaluated the in-situ treatment of chlorinated aliphatic hydrocarbons (CAHs) using aerobic cometabolism. In the first study, a cometabolic air sparging (CAS) demonstration was conducted at McClellan Air Force Base (AFB), California, to treat CAHs in groundwater using propane as the cometabolic substrate. A propane-biostimulated zone was sparged with a propane/air mixture, and a control zone was sparged with air alone. Propane utilizers were effectively stimulated in the saturated zone with repeated intermittent sparging of propane and air. Propane delivery, however, was not uniform: propane was mainly observed in down-gradient observation wells. Trichloroethene (TCE), cis-1,2-dichloroethene (c-DCE), and dissolved oxygen (DO) concentrations decreased in proportion to propane usage, with c-DCE decreasing more rapidly than TCE. The more rapid removal of c-DCE indicated biotransformation and not just physical removal by stripping. Propane utilization rates and rates of CAH removal slowed after three to four months of repeated propane additions, which coincided with the depletion of nitrogen (as nitrate). Ammonia was then added to the propane/air mixture as a nitrogen source. After a six-month period between propane additions, rapid propane utilization was observed. Nitrate was present due to groundwater flow into the treatment zone and/or the oxidation of the previously injected ammonia. In the propane-stimulated zone, c-DCE concentrations decreased below the detection limit (1 $\mu$g/L), and TCE concentrations ranged from less than 5 $\mu$g/L to 30 $\mu$g/L, representing removals of 90 to 97%. In the air-sparged control zone, TCE was removed at only the two monitoring locations nearest the sparge well, to concentrations of 15 $\mu$g/L and 60 $\mu$g/L. 
The responses indicate that stripping as well as biological treatment was responsible for the removal of contaminants in the biostimulated zone, with biostimulation enhancing removals to lower contaminant levels. As part of that study, bacterial population shifts that occurred in the groundwater during CAS and the air sparging control were evaluated by length heterogeneity polymerase chain reaction (LH-PCR) fragment analysis. The results showed that the organism(s) with a fragment size of 385 base pairs (385 bp) were positively correlated with propane removal rates. The 385 bp fragment made up as much as 83% of the total fragments in the analysis when propane removal rates peaked. A 16S rRNA clone library made from bacteria sampled in propane-sparged groundwater included clones of a TM7 division bacterium that had a 385 bp LH-PCR fragment; no other bacterial species with this fragment size were detected. Both propane removal rates and the 385 bp LH-PCR fragment decreased as nitrate levels in the groundwater decreased. In the second study, the potential for bioaugmentation with a butane culture was evaluated in a series of field tests conducted at the Moffett Field Air Station in California. A butane-utilizing mixed culture that was effective in transforming 1,1-dichloroethene (1,1-DCE), 1,1,1-trichloroethane (1,1,1-TCA), and 1,1-dichloroethane (1,1-DCA) was added to the saturated zone at the test site. This mixture of contaminants was evaluated because these compounds are often present together as the result of 1,1,1-TCA contamination and the abiotic and biotic transformation of 1,1,1-TCA to 1,1-DCE and 1,1-DCA. Model simulations were performed prior to the initiation of the field study, using a transport code that included processes for in-situ cometabolism, including microbial growth and decay, substrate and oxygen utilization, and the cometabolism of dual contaminants (1,1-DCE and 1,1,1-TCA). 
Based on the results of detailed kinetic studies with the culture, the cometabolic transformation kinetics incorporated mixed inhibition of 1,1-DCE and 1,1,1-TCA transformation by butane, and competitive inhibition of butane utilization by 1,1-DCE and 1,1,1-TCA. A transformation capacity term, which accounts for cell loss due to contaminant transformation, was also included in the model formulation. Parameters for the model simulations were determined independently in kinetic studies with the butane-utilizing culture and through batch microcosm tests with groundwater and aquifer solids from the field test zone, to which the butane-utilizing culture was added. In the microcosm tests, the model simulated well the repeated utilization of butane and the cometabolism of 1,1,1-TCA and 1,1-DCE, as well as the transformation of 1,1-DCE as it was repeatedly transformed at increased aqueous concentrations. Model simulations were then performed under the transport conditions of the field test to explore the effects of the bioaugmentation dose and the response of the system to biostimulation with alternating pulses of dissolved butane and oxygen in the presence of 1,1-DCE (50 $\mu$g/L) and 1,1,1-TCA (250 $\mu$g/L). A uniform aquifer bioaugmentation dose of 0.5 mg/L of cells resulted in complete utilization of the butane 2 m downgradient of the injection well within 200 h of bioaugmentation and butane addition. 1,1-DCE was transformed much more rapidly than 1,1,1-TCA, and efficient 1,1,1-TCA removal occurred only after 1,1-DCE and butane concentrations had decreased. The simulations demonstrated the strong inhibition of 1,1,1-TCA transformation by both 1,1-DCE and butane, as well as the faster 1,1-DCE transformation kinetics. Results of the field demonstration indicated that bioaugmentation was successfully implemented; however, it was difficult to maintain effective treatment for long periods of time (50 days or more). 
The demonstration showed that the bioaugmented experimental leg effectively transformed 1,1-DCE and 1,1-DCA and was somewhat effective in transforming 1,1,1-TCA. The indigenous experimental leg, treated in the same way as the bioaugmented leg, was much less effective in treating the contaminant mixture. The best operating performance was achieved in the bioaugmented leg, with removals of over 90%, 80%, and 60% for 1,1-DCE, 1,1-DCA, and 1,1,1-TCA, respectively. Molecular methods were used to track the bioaugmented culture in the test zone, and real-time PCR analysis was used to enumerate it. The results show that higher numbers of the bioaugmented microorganisms were present in the treatment zone groundwater when the contaminants were being effectively transformed. A decrease in these numbers was associated with a reduction in treatment performance. The results of the field tests indicated that although bioaugmentation can be successfully implemented, competition for the growth substrate (butane) by the indigenous microorganisms likely led to the decrease in long-term performance.
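The kinetic ingredients described above (growth-substrate utilization, cometabolic transformation, cross-inhibition between substrate and contaminant, and a transformation capacity term that removes cells) can be sketched as a simple batch simulation. This is a simplified illustration, not the study's transport model: it tracks a single contaminant rather than 1,1-DCE and 1,1,1-TCA together, uses competitive inhibition in both directions instead of the mixed inhibition by butane reported above, ignores oxygen and transport, and uses hypothetical parameter values in `Params`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical kinetic parameters; not the values used in the study."""
    k_s: float = 2.0   # max butane utilization rate (mg butane / mg cells / day)
    K_s: float = 0.5   # butane half-saturation constant (mg/L)
    k_c: float = 0.5   # max contaminant transformation rate (mg / mg cells / day)
    K_c: float = 0.2   # contaminant half-saturation constant (mg/L)
    Y: float = 0.5     # cell yield on butane (mg cells / mg butane)
    b: float = 0.1     # endogenous decay rate (1/day)
    T_c: float = 0.05  # transformation capacity (mg contaminant / mg cells)


def simulate(S0, C0, X0, p=Params(), dt=1e-3, t_end=10.0):
    """Explicit-Euler batch run for butane (S), one contaminant (C), cells (X), all in mg/L."""
    S, C, X = S0, C0, X0
    for _ in range(int(round(t_end / dt))):
        # Butane utilization, competitively inhibited by the contaminant.
        r_s = p.k_s * X * S / (p.K_s * (1.0 + C / p.K_c) + S)
        # Contaminant cometabolism, competitively inhibited by butane.
        r_c = p.k_c * X * C / (p.K_c * (1.0 + S / p.K_s) + C)
        # Growth on butane, minus decay, minus cells lost to transformation (1/T_c).
        dX = p.Y * r_s - p.b * X - r_c / p.T_c
        S = max(S - r_s * dt, 0.0)
        C = max(C - r_c * dt, 0.0)
        X = max(X + dX * dt, 0.0)
    return S, C, X


if __name__ == "__main__":
    # 2 mg/L butane, 0.05 mg/L (50 ug/L) contaminant, 0.5 mg/L added cells.
    print(simulate(2.0, 0.05, 0.5))
```

Under these made-up parameters, a run with no butane transforms more contaminant over the first 0.1 day than a run starting with 2 mg/L butane, which is the kind of substrate inhibition of contaminant transformation described above. Explicit Euler with a small step (`dt`) is used only to keep the sketch dependency-free.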
