• Title/Summary/Keyword: tag indexing

Search Result 29

Greedy Document Gathering Method Using Links and Clustering (Link와 Clustering을 이용한 적극적 문서 수집 기법)

  • 김원우;변영태
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.393-398 / 2001
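  • We are developing an information agent that provides users with relevant information on a specific domain. The information agent consists of an Agent Manager that handles user queries, a KB Manager that manages the knowledge base, and a Web Manager that retrieves documents relevant to the domain from the Web. The Web Manager collects URLs to visit and performs relevance evaluation and indexing of the corresponding documents. It collects URLs either through search engines or from the links in visited documents, but these URL collection techniques miss many relevant documents. To solve this problem, we propose a greedy document gathering technique that follows links to collect documents from sites related to the domain, clusters the documents by a document format derived from their tag patterns, and identifies groups of relevant documents. Experimental results show that using links and clustering gathers more relevant documents, more effectively, than the existing approach.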
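The abstract does not say how tag patterns are represented or which clustering algorithm is used. As a minimal sketch, assuming a document's format is the set of adjacent start-tag pairs (tag bigrams) and that documents are grouped by Jaccard similarity with single-pass leader clustering, the idea might look like this; `tag_pattern`, `cluster_by_format`, and the 0.5 threshold are hypothetical, not the authors' method.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects the sequence of start-tag names in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_pattern(html: str) -> set[tuple[str, str]]:
    """Hypothetical format signature: the set of adjacent start-tag pairs."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    tags = parser.tags
    return set(zip(tags, tags[1:]))


def jaccard(a: set, b: set) -> float:
    """Overlap of two tag patterns, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_format(docs: dict[str, str], threshold: float = 0.5) -> list[list[str]]:
    """Leader clustering: each document joins the first cluster whose leader's
    pattern is similar enough; otherwise it starts a new cluster."""
    clusters: list[tuple[set, list[str]]] = []
    for url, html in docs.items():
        pattern = tag_pattern(html)
        for leader, members in clusters:
            if jaccard(pattern, leader) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((pattern, [url]))
    return [members for _, members in clusters]
```

Two pages sharing an `h1`-then-paragraphs layout land in one group, while a table-based page starts its own; a real crawler would then judge relevance per group rather than per page.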


A Study on the online of PDF Electronic Documents System (인터넷 원거리출판의 응용과 PDF의 인쇄활용에 관한 연구)

  • 유영수;강영립;김병현;이광수
    • Proceedings of the Korean Printing Society Conference / 2001.06a / pp.63-77 / 2001
  • PDF (Portable Document Format) is a file format that Adobe developed by advancing PostScript technology; it is used to manage document information and for electronic publishing (Internet, CD-ROM, DVD). PDF is a document type designed to be readable and printable anywhere, independent of operating system, printer type, resolution, and type of computer. Because it includes compression, documents can be transferred as small files over the Internet or an intranet. The format also offers other advantages, such as sharing information and transferring documents in online and offline environments. In this paper, we developed an electronic document system using the PDF format. The system consists of a filter, automatic indexing, a special search system, and a web server. The information used in this paper is a database built with Zwon's DocuCom. The filter recognizes various kinds of document structure and, according to the properties of each document, produces ASCII output. In addition to processing documents in various formats, the filter can extract keywords from MS Word, Excel, PowerPoint, PDF, CAD, and other documents. The filter uses the structure of the Windows printer driver and can extract text, page, font type, and font size information from a document. The automatic indexer recognizes the formatting tags of the document in the ASCII text produced by the filter and extracts keywords suited to the document's structure and properties. The PDF electronic document system proposed in this paper can be used over the Internet and PC communication services. Users can select and read electronic documents in two ways. First, they can choose and read books through the PDF electronic document homepage. Second, they can use the PDF integrated search system, entering a keyword and choosing a reference field and data type before searching. However, Adobe's PDF products currently cannot support Korean characters. If this problem is resolved, we expect PDF application systems to become more active.
Although Zwon DocuCom, which was used in this study, has limited functions, we think there is no great difficulty in building electronic documents and digital databases with it.
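The abstract describes the automatic indexer only at a high level: it reads formatting tags in the filter's ASCII output and picks keywords according to document structure. The sketch below assumes a hypothetical `<FIELD>...</FIELD>` markup, made-up field weights, and a toy stopword list; it is not DocuCom's output format or the authors' algorithm, only one way to weight terms by structure.

```python
import re
from collections import Counter

# Hypothetical weights: a term in a title counts more than the same term in body text.
FIELD_WEIGHTS = {"TITLE": 3.0, "HEADING": 2.0, "BODY": 1.0}
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is"}
# Hypothetical filter markup, e.g. "<TITLE>PDF Systems</TITLE>"; opening and closing names must match.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>")


def extract_keywords(ascii_text: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Score each word by the summed weights of the fields it appears in."""
    scores: Counter = Counter()
    for match in FIELD_RE.finditer(ascii_text):
        field, content = match.group(1).upper(), match.group(2)
        weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count like body text
        for word in re.findall(r"[a-z0-9]+", content.lower()):
            if word not in STOPWORDS and len(word) > 1:
                scores[word] += weight
    # Highest scores first; ties keep first-seen order.
    return scores.most_common(top_k)
```

For input with the title "PDF Electronic Documents" and the body "PDF files and electronic publishing", "pdf" and "electronic" come out on top because they occur in both fields.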


Content Description on a Mobile Image Sharing Service: Hashtags on Instagram

  • Dorsch, Isabelle
    • Journal of Information Science Theory and Practice / v.6 no.2 / pp.46-61 / 2018
  • The mobile social networking application Instagram is a well-known platform for sharing photos and videos. Because it is folksonomy-oriented, it enables image indexing and knowledge representation through hashtags assigned to posted content. The purpose of this study is to analyze how Instagram users tag their pictures with respect to different picture and hashtag categories. For this content analysis, a distinction is made between the image categories Food, Pets, Selfies, Friends, Activity, Art, Fashion, Quotes (captioned photos), Landscape, and Architecture, and the hashtag categories Content-relatedness (ofness, aboutness, and iconology), Emotiveness, Isness, Performativeness, Fakeness, "Insta"-Tags, and Sentences. Altogether, 14,649 hashtags of 1,000 Instagram images (100 pictures per image category) were analyzed intellectually. The research questions are as follows. RQ1: Are there any differences in the relative frequencies of hashtags across the picture categories? On average, a picture has 15 hashtags. The lowest averages were found for the categories Selfie (10.9 tags per picture) and Friends (11.7 tags per picture); the highest for Pet (18.6 tags), Fashion (17.6 tags), and Landscape (16.8 tags). RQ2: Given a picture category, what is the distribution of hashtag categories; and given a hashtag category, what is the distribution of picture categories? 60.20% of all hashtags were classified into the category Content-relatedness. The categories Emotiveness (about 4.38%) and Sentences (0.99%) were less frequent. RQ3: Is there any association between image categories and hashtag categories? A chi-square test of independence shows a statistically significant association between hashtag categories and image categories on Instagram.
Unlike previous studies, this study provides a first broad overview of the tagging behavior of Instagram users and is not limited to a specific hashtag or picture motif.
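The RQ3 result rests on a chi-square test of independence over an image-category by hashtag-category contingency table. A minimal sketch of the Pearson statistic and its degrees of freedom follows; the function name is illustrative, and the example tables are made-up counts, not the study's data.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Rows could be image categories, columns hashtag categories, and each cell
    the number of hashtags of that kind on pictures of that kind.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

The statistic is then compared with the chi-square critical value for `dof` at the chosen significance level; a value above it rejects independence, which is the kind of conclusion the abstract reports.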

Investigating the End-User Tagging Behavior and its Implications in Flickr (플리커 이미지 자료에 대한 이용자 태깅 행태 분석과 활용 방안)

  • Kim, Hyun-Hee;Kim, Min-Kyung
    • Journal of Information Management / v.40 no.2 / pp.71-94 / 2009
  • Indexing images with traditional methods such as taxonomies is not always efficient because of their visual content. This study examined how to apply folksonomies to image retrieval. First, we developed a category model for image tags found in Flickr; the model includes five categories and seventeen subcategories. Second, to evaluate how well the model represents various image tags and to investigate end-user tagging behavior, three researchers classified the sampled image tags (141 most popular tags, 105 tags from three individual tag clouds, and 3,848 tags assigned to 156 images) according to the model. Finally, based on the results, we proposed three methods for efficient image retrieval: extending folksonomies by combining them with ontologies; improving image retrieval efficiency using visual content and folksonomies; and updating taxonomies using folksonomies.

Range Stabbing Technique for Continuous Queries on RFID Streaming Data (RFID 스트리밍 데이타의 연속질의를 위한 영역 스태빙 기법)

  • Park, Jae-Kwan;Hong, Bong-Hee;Lee, Ki-Han
    • Journal of KIISE:Databases / v.36 no.2 / pp.112-122 / 2009
  • EPCglobal, which leads the development of RFID standards, proposed the Event Cycle Specification (ECSpec) and Event Cycle Reports (ECReports) as the standard RFID middleware interface. An ECSpec is a specification for filtering and collecting RFID tag data and is treated as a continuous query (CQ) that is processed repeatedly over fixed time intervals. ECReports is a specification describing the results after an ECSpec is processed. It is therefore efficient to apply query indexing, a technique designed for continuous query processing. For efficiency, this query index treats ECSpecs as data and tag events as queries. In logistics environments, similar or identical products are transported together, so when the RFID tags attached to them are read, acquisition events occur massively within a short period. Given these properties, processing the events one by one is inefficient. In this paper, we propose a technique that reduces repeated similar searches by treating the tag events collected during an ECSpec's report period as a range query. For this group processing, we propose a queuing method that collects tag events efficiently and a structure for generating range queries from the queues. Experiments show that the proposed methods improve performance.
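The core idea, treating ECSpecs as stored intervals and a report period's tag events as one range query instead of many point queries, can be sketched as below. This assumes a hypothetical 1-D encoding of tag IDs and a `max_gap` grouping rule; the linear candidate scan stands in for a real query index, and none of the names come from the paper.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ECSpecInterval:
    name: str
    low: int   # lowest tag ID the spec's filter accepts (hypothetical 1-D encoding)
    high: int  # highest tag ID accepted, inclusive


def build_range_queries(tag_ids: list[int], max_gap: int) -> list[tuple[int, int]]:
    """Merge the tag IDs queued in one report period into ranges of nearby IDs."""
    ranges: list[tuple[int, int]] = []
    for tid in sorted(set(tag_ids)):
        if ranges and tid - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], tid)  # extend the current range
        else:
            ranges.append((tid, tid))          # start a new range
    return ranges


def range_stab(specs: list[ECSpecInterval], tag_ids: list[int],
               max_gap: int = 1) -> dict[str, list[int]]:
    """Match events to specs with one candidate search per range, not per event."""
    events = sorted(set(tag_ids))
    result: dict[str, list[int]] = {spec.name: [] for spec in specs}
    for low, high in build_range_queries(events, max_gap):
        # Candidate search: specs whose interval overlaps the whole range.
        candidates = [s for s in specs if s.low <= high and low <= s.high]
        # Refinement: check only the candidates against the events in this range.
        start, stop = bisect_left(events, low), bisect_right(events, high)
        for spec in candidates:
            result[spec.name].extend(e for e in events[start:stop] if spec.low <= e <= spec.high)
    return result
```

Because products moving together tend to carry consecutive IDs, a pallet of reads collapses into a few ranges, and the index is searched once per range.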

A Query Indexing Method for Filtering Event Data in RFID Middleware Systems (RFID 미들웨어에서 이벤트 필터링을 위한 질의 색인 기법)

  • Seok, Su-Wook;Park, Jae-Kwan;Hong, Bong-Hee
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.19-21 / 2005
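  • EPCglobal leads standardization in various RFID-related fields and, as an application standard, has proposed the ALE Specification, a middleware standard for handling tag information. The ALE ECSpec, which applications register with the middleware for event filtering, behaves like a continuous query executed repeatedly over a fixed period. When an ECSpec is converted into a continuous query, the predicate in the query's WHERE clause becomes a very long interval (Long Interval). This property degrades the insertion and search performance of existing query indexes. In this paper, we convert ECSpecs into continuous queries and propose the TLC-Index, a new query index structure that reflects the characteristics of the queries' predicates, which are 2D intervals. The index is a hybrid structure that combines a grid-style partitioning into large cells with a virtual partitioning into line segments. In the index, a Long Interval is an interval whose length is greater than or equal to the cell length. The proposed index reduces storage consumption and improves insertion performance by splitting Long Intervals across the large cells. It also improves search performance by splitting Short Intervals across short virtual partitions, which removes the partial overlaps a grid scheme can suffer from.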
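The abstract outlines the hybrid idea without giving the structure. The sketch below is a simplified one-dimensional illustration, not the TLC-Index itself (the paper's predicates are 2D intervals): it assumes interval length is `high - low`, stores Long Intervals in coarse cells, stores Short Intervals in finer virtual segments, and answers a stabbing query by checking one bucket of each kind. `HybridIntervalIndex` and its parameters are hypothetical.

```python
from collections import defaultdict


class HybridIntervalIndex:
    """Simplified 1-D sketch: coarse cells for long intervals, fine segments for short ones."""

    def __init__(self, cell_size: int, segment_size: int) -> None:
        self.cell_size = cell_size
        self.segment_size = segment_size
        self.cells = defaultdict(list)     # cell number -> long intervals overlapping it
        self.segments = defaultdict(list)  # segment number -> short intervals overlapping it

    def insert(self, name: str, low: int, high: int) -> None:
        if high - low >= self.cell_size:   # Long Interval: at least one cell long
            buckets, size = self.cells, self.cell_size
        else:                              # Short Interval: use the finer partitions
            buckets, size = self.segments, self.segment_size
        for b in range(low // size, high // size + 1):
            buckets[b].append((name, low, high))

    def stab(self, point: int) -> set[str]:
        """Names of all intervals containing `point`."""
        long_bucket = self.cells.get(point // self.cell_size, [])
        short_bucket = self.segments.get(point // self.segment_size, [])
        hits: set[str] = set()
        for name, low, high in long_bucket + short_bucket:
            if low <= point <= high:  # refine: a bucket may hold intervals that miss the point
                hits.add(name)
        return hits
```

The trade-off mirrors the abstract's claims: an interval of length 450 is copied into 5 cells of size 100 instead of 46 segments of size 10, which saves space and insertion work. Meanwhile, short intervals sit in small buckets, so a query checks few intervals that do not contain the point.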


A Study on Creation and Development of Folksonomy Tags on LibraryThing (폭소노미 태그의 생성과 성장에 관한 연구 - LibraryThing을 중심으로 -)

  • Kim, Dong-Suk;Chung, Yeon-Kyoung
    • Journal of the Korean Society for Library and Information Science / v.44 no.4 / pp.203-230 / 2010
  • This study analyzed the development and growth of folksonomy by examining the tags associated with 40 bestsellers on LibraryThing.com at 6-month intervals. It was found that tags do not decrease in value but grow in quantity and quality. Accordingly, we examined the main meanings of the tags and their potential use as an expression of subjects. Our findings were as follows. First, the motivations for tagging can be categorized as personal information for search purposes; self-fulfillment, such as a sense of achievement, display of emotion, and sharing one's experience with others; or an altruistic objective that emphasizes sociality, with the desire that one's actions might provide social benefits. According to our analysis, 74.12% of tags had a social motivation. Second, the total number of tags and their frequency of use increased over time. Third, the categories that showed a high increase in tag usage were dates of publication and reading, key words, main characters, and book reviews; tags related to subjects had the highest ratio. Fourth, with respect to the Library of Congress Subject Headings (LCSH), multiple genres, key words, and main characters were assigned to books as tags, and specific key words and other properties were added as time progressed. There was also a slight increase in the number of tags consistent with LCSH. Fifth, key tags could serve as a vocabulary that reflects the knowledge base of the corresponding era. Thus, the quantitative and qualitative development of folksonomy tags should be monitored continuously so that their formative weaknesses can be improved and their internal semantic significance identified, and folksonomy should be actively used together with taxonomy as a flexible vocabulary that incorporates the history of a specific era.

Effective Streaming of XML Data for Wireless Broadcasting (무선 방송을 위한 효과적인 XML 스트리밍)

  • Park, Jun-Pyo;Park, Chang-Sup;Chung, Yon-Dohn
    • Journal of KIISE:Databases / v.36 no.1 / pp.50-62 / 2009
  • In wireless and mobile environments, data broadcasting is recognized as an effective way to disseminate data because of its bandwidth efficiency, energy efficiency, and scalability. In this paper, we address the problem of delayed query processing caused by tree-based index structures in wireless broadcast environments, which increases the access time of mobile clients. We propose a novel distributed index structure and a clustering strategy for streaming XML data that enable energy- and latency-efficient broadcasting of XML data. We first define the DIX node structure to implement a fully distributed index; each node contains the tag name, attributes, and text content of an element as well as its corresponding indices. By exploiting the index information in the DIX node stream, a mobile client can access the wireless stream with shorter latency. We also propose a method of clustering DIX nodes in the stream, which further enhances query processing performance for mobile clients. Extensive performance experiments demonstrate that our approach is effective for wireless broadcasting of XML data and outperforms previous methods.
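The abstract says what a DIX node contains but not what its indices point to. The sketch below makes a hypothetical assumption: each node carries the stream position just past its own subtree. Under that assumption, a client evaluating a simple path query can skip (doze through) any subtree whose root does not match, and `reads` counts the nodes it actually had to listen to. `StreamNode`, `to_stream`, and `match_path` are illustrative names, not the paper's.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class StreamNode:
    tag: str
    attrib: dict
    text: str
    depth: int
    skip: int = 0  # hypothetical index: stream position just past this element's subtree


def to_stream(root: ET.Element) -> list[StreamNode]:
    """Serialize the tree in document (preorder) order, filling in skip positions."""
    stream: list[StreamNode] = []

    def visit(elem: ET.Element, depth: int) -> None:
        node = StreamNode(elem.tag, dict(elem.attrib), (elem.text or "").strip(), depth)
        stream.append(node)
        for child in elem:
            visit(child, depth + 1)
        node.skip = len(stream)  # everything before this position is in the subtree

    visit(root, 0)
    return stream


def match_path(stream: list[StreamNode], path: list[str]) -> tuple[list[str], int]:
    """Evaluate a child-axis path such as ['catalog', 'book', 'title'].

    Returns the matched texts and the number of nodes the client had to read.
    """
    results: list[str] = []
    reads, pos = 0, 0
    while pos < len(stream):
        node = stream[pos]
        reads += 1
        if node.depth < len(path) and node.tag == path[node.depth]:
            if node.depth == len(path) - 1:
                results.append(node.text)
                pos = node.skip  # full match: nothing deeper is needed
            else:
                pos += 1         # partial match: descend into the children
        else:
            pos = node.skip      # mismatch: skip the whole subtree unread
    return results, reads
```

For a catalog with two `book` elements and one `magazine`, the query `catalog/book/title` reads 7 of the 9 nodes, because the magazine's children are skipped.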

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.143-156 / 2012
  • People readily believe that news and the stock index are closely related. They think that obtaining news before anyone else can help them forecast stock prices and earn large profits, or at least seize investment opportunities. However, it is no easy feat to determine to what extent the two are related, to make investment decisions based on news, or to verify that such investment information is valid. If the significance of news and its impact on the stock market could be analyzed, it would be possible to extract information that supports investment decisions. In reality, however, the world is flooded with a massive stream of news in real time, and news is unstructured text. This study proposes a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes, and analyzes news to produce investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was analyzed empirically using statistics. The mining steps that convert news into information for investment decision making are as follows. First, news is obtained from a provider that collects it in real time, and its information is indexed: not only the news content but also attributes such as the media outlet, time, and news type are collected and classified, and then turned into variables from which investment decisions can be inferred. Second, the text of each article is split into morphemes to derive words whose polarity can be judged, and each word is tagged as positive or negative by comparing it against a sentiment dictionary. Third, the positive/negative polarity of each article is determined using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria.
For this study, the KOSPI index and its fluctuation range were collected from the Korea Exchange for the 63 trading days in the three months from July to September 2011. News data were collected by parsing 766 articles from economic news outlet M that appeared under stock information > news > main news on the portal site Naver.com. Over the three months, the stock price index rose on 33 days and fell on 30 days; the news comprised 197 articles published before the market opened, 385 during the trading session, and 184 after the market closed. Mining the collected news and comparing the results with stock prices showed that the positive/negative opinion of news content was significantly related to stock prices. It also showed that changes in the stock price index were better explained when news opinion was expressed as a positive/negative ratio rather than as a simple positive or negative judgment. To check whether news affected stock price fluctuations, or at least preceded them, stock price changes were also compared only with news published before the market opened, and this relationship was likewise statistically significant. In addition, because the news covered various types of information, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and current market conditions, the influence on the stock market or the strength of the relationship was expected to differ by news type. Each news type was therefore compared with stock price fluctuations, and the results showed that market condition, outlook, and overseas news were the most useful for explaining stock price fluctuations.
On the contrary, news about individual companies was not statistically significant, but its opinion mining values tended to move opposite to the stock price; a possible reason is the appearance of promotional and planned news intended to keep stock prices from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive an investment decision function from the relationship between the positive/negative opinion of news and stock prices. The regression equation using the market condition, outlook, and overseas news variables from before the market opened was statistically significant, and the classification accuracy of the logistic regression was 70.0% for rising stock prices, 78.8% for falling stock prices, and 74.6% on average. This study first analyzed the relationship between news and stock prices by analyzing and quantifying the sentiment of unstructured news content with opinion mining, one of the big data analysis techniques. Furthermore, it proposed and verified a smart investment decision-making model that systematically carries out opinion mining and derives investment information to support decisions. This shows that news can be used as a variable to predict the stock price index for investment, and the model is expected to be usable as a real investment support system if it is implemented as a system and verified in the future.
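The dictionary-based polarity steps and the finding that a positive/negative ratio explains index changes better than a binary label can be sketched as follows. Everything here is an assumption for illustration: the tiny English `SENTIMENT` dictionary, the regex tokenizer standing in for Korean morpheme analysis, and the 0.5 threshold in the hypothetical `decide` rule are not the study's dictionary or scoring criteria.

```python
import re
from typing import Optional

# Hypothetical sentiment dictionary: +1 positive, -1 negative.
SENTIMENT = {"rise": 1, "gain": 1, "strong": 1, "record": 1,
             "fall": -1, "loss": -1, "weak": -1, "concern": -1}


def polarity_counts(article: str) -> tuple[int, int]:
    """Count positive and negative dictionary words in one article.

    Lowercase regex tokens stand in for the morpheme analysis used on Korean text.
    """
    pos = neg = 0
    for token in re.findall(r"[a-z]+", article.lower()):
        score = SENTIMENT.get(token, 0)
        if score > 0:
            pos += 1
        elif score < 0:
            neg += 1
    return pos, neg


def daily_positive_ratio(articles: list[str]) -> Optional[float]:
    """Positive share of all polarity words in a day's articles (None if there are none)."""
    pos = neg = 0
    for article in articles:
        p, n = polarity_counts(article)
        pos, neg = pos + p, neg + n
    total = pos + neg
    return pos / total if total else None


def decide(ratio: Optional[float], threshold: float = 0.5) -> str:
    """Hypothetical daily rule: predict a rise when the positive ratio exceeds the threshold."""
    if ratio is None:
        return "hold"
    return "rise" if ratio > threshold else "fall"
```

Keeping the ratio, rather than collapsing each day to positive or negative, leaves a continuous variable that a regression or logistic model can use directly, which is the form the abstract reports as more explanatory.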