• Title/Summary/Keyword: Text Collection

Search Result 298, Processing Time 0.027 seconds

A Novel Statistical Feature Selection Approach for Text Categorization

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems
    • /
    • v.13 no.5
    • /
    • pp.1397-1409
    • /
    • 2017
  • For text categorization task, distinctive text features selection is important due to feature space high dimensionality. It is important to decrease the feature space dimension to decrease processing time and increase accuracy. In the current study, for text categorization task, we introduce a novel statistical feature selection approach. This approach measures the term distribution in all collection documents, the term distribution in a certain category and the term distribution in a certain class relative to other classes. The proposed method results show its superiority over the traditional feature selection methods.

Automatic In-Text Keyword Tagging based on Information Retrieval

  • Kim, Jin-Suk;Jin, Du-Seok;Kim, Kwang-Young;Choe, Ho-Seop
    • Journal of Information Processing Systems
    • /
    • v.5 no.3
    • /
    • pp.159-166
    • /
    • 2009
  • As shown in Wikipedia, tagging or cross-linking through major keywords in a document collection improves not only the readability of documents but also responsive and adaptive navigation among related documents. In recent years, the Semantic Web has increased the importance of social tagging as a key feature of the Web 2.0 and, as its crucial phenotype, Tag Cloud has emerged to the public. In this paper we provide an efficient method of automated in-text keyword tagging based on large-scale controlled term collection or keyword dictionary, where the computational complexity of O(mN) - if a pattern matching algorithm is used - can be reduced to O(mlogN) - if an Information Retrieval technique is adopted - while m is the length of target document and N is the total number of candidate terms to be tagged. The result shows that automatic in-text tagging with keywords filtered by Information Retrieval speeds up to about 6 $\sim$ 40 times compared with the fastest pattern matching algorithm.

Big Data Analysis of News on Purchasing Second-hand Clothing and Second-hand Luxury Goods: Identification of Social Perception and Current Situation Using Text Mining (중고의류와 중고명품 구매 관련 언론 보도 빅데이터 분석: 텍스트마이닝을 활용한 사회적 인식과 현황 파악)

  • Hwa-Sook Yoo
    • Human Ecology Research
    • /
    • v.61 no.4
    • /
    • pp.687-707
    • /
    • 2023
  • This study was conducted to obtain useful information on the development of the future second-hand fashion market by obtaining information on the current situation through unstructured text data distributed as news articles related to 'purchase of second-hand clothing' and 'purchase of second-hand luxury goods'. Text-based unstructured data was collected on a daily basis from Naver news from January 1st to December 31st, 2022, using 'purchase of second-hand clothing' and 'purchase of second-hand luxury goods' as collection keywords. This was analyzed using text mining, and the results are as follows. First, looking at the frequency, the collection data related to the purchase of second-hand luxury goods almost quadrupled compared to the data related to the purchase of second-hand clothing, indicating that the purchase of second-hand luxury goods is receiving more social attention. Second, there were common words between the data obtained by the two collection keywords, but they had different words. Regarding second-hand clothing, words related to donations, sharing, and compensation sales were mainly mentioned, indicating that the purchase of second-hand clothing tends to be recognized as an eco-friendly transaction. In second-hand luxury goods, resale and genuine controversy related to the transaction of second-hand luxury goods, second-hand trading platforms, and luxury brands were frequently mentioned. Third, as a result of clustering, data related to the purchase of second-hand clothing were divided into five groups, and data related to the purchase of second-hand luxury goods were divided into six groups.

A Quantitative Approach to a Similarity Analysis on the Culinary Manuscripts in the Chosun Periods (계량적 접근에 의한 조선시대 필사본 조리서의 유사성 분석)

  • Lee, Ki-Hwang;Lee, Jae-Yun;Paek, Doo-Hyun
    • Language and Information
    • /
    • v.14 no.2
    • /
    • pp.131-157
    • /
    • 2010
  • This article reports an attempt to perform a similarity analysis on a collection of 25 culinary manuscripts in Chosun periods using a set of quantitative text analysis methods. Historical culinary texts are valuable resources for linguistic, historic, and cultural studies. We consider the similarity of two texts as the distributional similarities of the functional components of the texts. In the case of culinary texts, text elements such as food names, cooking methods, and ingredients are regarded as functional components. We derive the similarity information from the distributional characteristics of the two key functional components, cooking methods and ingredients. The results are also quantified and visualized to achieve a better understanding of the properties of the individual texts and the collection of the texts as a whole.

  • PDF

Development of technology to improve information accessibility of information vulnerable class using crawling & clipping

  • Jeong, Seong-Bae;Kim, Kyung-Shin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.2
    • /
    • pp.99-107
    • /
    • 2018
  • This study started from the public interest purpose to help accessibility for the information acquisition of the vulnerable groups due to visual difficulties such as the elderly and the visually impaired. In this study, the server resources are minimized and implemented in most of the user smart phones. In addition, we implement a method to gather necessary information by collecting only pattern information by utilizing crawl & clipping without having to visit the site of the information of the various sites having the data necessary for the user, and to have it in the server. Especially, we applied the TTS(Text-To-Speech) service composed of smart phone apps and tried to develop a unified customized information collection service based on voice-based information collection method.