• Title/Summary/Keyword: Text Collection

A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods (용어 가중치부여 기법을 이용한 로치오 분류기의 성능 향상에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for Information Management / v.25 no.1 / pp.211-233 / 2008
  • This study examines various term weighting methods for improving the performance of automatic classification based on the Rocchio algorithm on two collections (LISA and Reuters-21578). First, three weighting factors were identified (document factor, document set factor, and category factor), and the performance of the single weighting schemes based on each factor was investigated. Second, the performance of combined weighting methods built from the single schemes was examined. Among the single schemes, category-factor-based schemes showed the best performance, document-set-factor-based schemes the second best, and document-factor-based schemes the worst. Among the combined weighting schemes, the schemes that combine the document set factor with the category factor (idf*cat) performed better than the schemes that combine the document factor with the category factor (tf*cat or ltf*cat), as well as the common schemes that combine the document factor with the document set factor (tfidf or ltfidf). However, when single and combined weighting schemes are compared per collection, category-factor-based schemes (cat only) perform best on LISA, while the combined schemes (idf*cat) perform best on Reuters-21578. Therefore, practical application of these weighting methods requires careful consideration of the categories in the collection to be classified.
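
As a rough illustration of how a Rocchio-style classifier interacts with term weighting, the sketch below builds one centroid per category from ltf*idf vectors and assigns a new document to the category with the highest dot-product score. It is a minimal sketch under stated assumptions, not the paper's implementation: only positive examples are used for each centroid, the idf smoothing is an arbitrary choice, the category factor (cat) is not implemented because the abstract does not define it, and all function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def weight(tokens, idf):
    """Return a unit-length ltf*idf vector (dict) for one tokenized document."""
    tf = Counter(tokens)
    # ltf = 1 + log(tf); terms unseen in training get idf 0 (assumption).
    vec = {t: (1.0 + math.log(c)) * idf.get(t, 0.0) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (weighted vectors, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf so terms occurring in every document keep a small weight (assumption).
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [weight(doc, idf) for doc in docs], idf


def train_rocchio(vectors, labels):
    """Average the vectors of each category into a centroid (positive examples only)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {
        label: {t: w / counts[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def classify(tokens, centroids, idf):
    """Assign the category whose centroid has the largest dot product with the document."""
    vec = weight(tokens, idf)

    def score(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda label: score(centroids[label]))


# Toy data invented for illustration only.
docs = [
    ["library", "catalog", "index"],
    ["library", "index", "search"],
    ["oil", "price", "market"],
    ["market", "trade", "oil"],
]
labels = ["lis", "lis", "econ", "econ"]
vectors, idf = tfidf_vectors(docs)
centroids = train_rocchio(vectors, labels)
print(classify(["index", "library"], centroids, idf))
```

Swapping the body of `weight` is the natural place to try a different scheme (for example tf only, or idf only) while leaving the centroid training unchanged.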

A Study about Characteristics of literature of acupuncture and moxibustion in "Chimgujasaenggyeong(鍼灸資生經)" ("침구자생경(鍼灸資生經)"의 침구 문헌적 특징에 관한 연구)

  • Park, Hyun-Guk;Kim, Ki-Wook
    • Journal of Korean Medical classics / v.21 no.4 / pp.61-74 / 2008
  • The documentary characteristics of "Chimgujasaenggyeong" with respect to acupuncture and moxibustion can be summarized in the following six points. 1. "Chimgujasaenggyeong" was written around 1180-1195, during the Southern Song period, and consists of 7 volumes. The acupuncture points and their variations in volume 1 were all recorded directly from Wang-yuil(王維一)'s "Dong-insuhyeolchimgudogyeong(銅人腧穴鍼灸圖經)", and 11 points were added from volumes 99 and 100 of "Taepyeongseonghyebang(太平聖惠方)", bringing the total to 365 points, which differs from the commonly known 360. Volume 2 is the actual collection of essays on acupuncture and moxibustion by Wangjipjung(王執中) and shows his unique views on basic problems of acupuncture and moxibustion such as selection of points[取穴], application of moxa[施灸], aftercare of moxibustion[灸後護理], and acupuncture and moxibustion contraindications[鍼灸禁忌]. Volumes 3-7 mostly arrange the indications(主治) from "Dong-insuhyeolchimgudogyeong", "Taepyeongseonghyebang", and "Cheon-geumyobang(千金要方)" into chapters by disease. 2. Of the extant editions, the 'Cheonryeok Guanggeunseodang Inbon(天曆 廣勤書堂 印本)' of the Won dynasty is the earliest; the Jeongtong(正統) new edition is a reprint based on the Cheonryeok(天曆) edition; and the Jeongtong edition reprinted in the 9th year of Guanmun(寬文) in Japan has many missing and wrong characters compared to the original. 3. The large characters[大字] below the line in the current editions are all postscripts[按語] by Wangjipjung, and the 5 verses in 'Heoson(虛損)', the first chapter of volume 3, that are quoted from other books without attribution and have the character of treatment rules were put together by Wang. 4. The small-print annotations in the current edition of "Jasaenggyeong" include annotations added by Wisegeol(衛世傑) in addition to Wangjipjung's original ones. 5. Some of the many pre-Song medical books quoted in "Jasaenggyeong" have otherwise been lost completely and survive only in this important text. 6. The quotations said to be from 'Myeongdanggyeong(明堂經)' (or 'Myeongdang(明堂)', 'Myeong(明)') in "Jasaenggyeong" come directly from volume 77 'Chimgyeong(鍼經)' and volume 100 'Myeongdang' of "Taepyeongseonghyebang", not from another book. The quotations from 'Myeongdang' in acupuncture and moxibustion books after the Song dynasty were copied directly or indirectly from "Jasaenggyeong".

Analysis of Waterpark Status and Recognition Using Big Data Analysis (빅데이터 분석을 활용한 워터파크 현황 및 인식 분석)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • Journal of Digital Convergence / v.15 no.10 / pp.525-535 / 2017
  • This study examines consumer perception and the current status of water parks. Naver and Daum were used as data collection channels, and the keyword 'water park' was used for data retrieval. The analysis period covered two years, from January 1, 2015 to December 31, 2016. First, in the frequency analysis, the top 5 terms were hidden cameras, Lotte water park, arrests, suspects, and Gimhae in 2015, and Lotte water park, swimming, summer, opening, and admission ticket in 2016. Second, in the degree centrality analysis, the top 5 terms were hidden camera, arrest, suspect, female, and shower room in 2015, and swimming, Lotte water park, summer, One Mount, and admission ticket in 2016. Third, in the N-gram network graph, the top 5 pairs were water park/hidden camera, hidden camera/hidden camera, suspect/arrest, Gimhae/Lotte water park, and water park/suspect in 2015, and One Mount/water park, Gimhae/Lotte water park, water park/admission ticket, water park/water park, and water park/opening in 2016. Fourth, the CONCOR analysis formed three groups in 2015 and two groups in 2016.
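
The first three analyses named above (term frequency, degree centrality, and N-gram pairs) can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not say which tool or tokenizer was used, the co-occurrence network here links any two terms that appear in the same document (an assumption), and the toy documents are invented and do not reproduce the study's data or results.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_terms(docs, k=5):
    """Most frequent terms across all tokenized documents."""
    return Counter(term for doc in docs for term in doc).most_common(k)


def bigram_counts(docs):
    """Counts of adjacent term pairs (2-grams) within each document."""
    return Counter(pair for doc in docs for pair in zip(doc, doc[1:]))


def degree_centrality(docs):
    """Degree centrality in a co-occurrence network: distinct neighbours / (n - 1)."""
    neighbours = defaultdict(set)
    for doc in docs:
        for a, b in combinations(set(doc), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {term: len(adj) / (n - 1) for term, adj in neighbours.items()}


# Toy documents invented for illustration only.
docs = [
    ["water park", "summer", "opening"],
    ["water park", "admission ticket"],
    ["summer", "swimming"],
]
print(top_terms(docs, k=2))
print(bigram_counts(docs).most_common(3))
print(degree_centrality(docs))
```

CONCOR, the fourth analysis, groups terms by iterated correlations of their network profiles and is omitted here because it would need a matrix library.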

Unstructured Data Analysis using Equipment Check Ledger: A Case Study in Telecom Domain (장비점검 일지의 비정형 데이터분석을 통한 고장 대응 효율화 사례 연구)

  • Ju, Yeonjin;Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services / v.21 no.1 / pp.127-135 / 2020
  • As the use and analysis of big data grow in importance, interest is increasing in natural language processing techniques for unstructured data such as news articles and comments. In particular, as the collection of big data becomes feasible, data mining techniques capable of pre-processing and analyzing such data are emerging. In this case study with a telecom company, we propose a methodology for structuring unstructured data using text mining. The domain is equipment failure, and the data consist of about 2.2 million equipment check ledger records. About 800,000 equipment failure records accumulate in the equipment check ledger each year. The ledger contains both structured and unstructured data. Structured data can easily be used for analysis, whereas unstructured data is difficult to analyze directly. However, unstructured data is highly likely to contain important information that is not recorded in structured form. Therefore, this study develops a digital transformation method for the unstructured data in the equipment check ledger.

An Ethnography of the Concept of Illness by the Elderly (노인의 질병 관념에 관한 문화기술적 연구)

  • Cho, Myoung Ok
    • Korean Journal of Adult Nursing / v.12 no.4 / pp.690-705 / 2000
  • This ethnography was based on Kleinman's explanatory model of a health care system. It was conducted to provide a thick description of the illness concepts of the elderly in a sociocultural context. The basic assumptions were as follows. 1) A health care system is a cultural system, and as with any other cultural system, it is a system of symbolic meanings anchored in a particular arrangement of social institutions and patterns of interpersonal relationships; 2) In all societies health care activities are more or less interrelated. Therefore, they need to be studied in a holistic manner as socially organized responses to disease that constitute a special cultural system: the health care system; 3) Health and illness experiences are the natural process of disease. Individuals who recognize a poor state of health, together with their family, neighbors, and communities, define the state, search for causes of the health problem, and respond to it. Accordingly, they proceed to search for healing strategies. Thus, understanding the illness experience is the starting point for health care. The study participants were 12 elders aged 60 or older. The fieldwork was conducted in an agricultural clan village in Namwon city. Data collection and analysis were cyclic, proceeding through descriptive observation, domain analysis, focused observation, taxonomic analysis, selected observation, componential analysis, and finally the analysis of cultural themes. Proxemic and text analysis techniques were used according to the characteristics of the data. Data on the sociocultural context and descriptive data were collected from 1990 to 1992. Information on illness concepts was collected during 1994 using focused observation. Confirmatory and contrast observations were conducted between 1997 and 1999. The illness concepts of the elderly were classified into supernatural causes, non-supernatural causes, immediate causes, and ultimate causes. The supernatural causes were ancestors, the god of the home, the god of the village, and ghosts such as 'sal(evil force of dead man)' and 'gagqui(ghost of beggar)'. The non-supernatural causes were Ki, natural phenomena, natural objects, foods, humans, and human behaviors. Immediate causes were insufficiency and overflow, discretion and consolidation, disorder and out of order, cloudiness and contamination, and fluctuation and stagnation of the supernatural and non-supernatural causes. Ultimate causes were intrusion and loss of the supernatural and non-supernatural causes. The cultural themes of the illness concepts of the elderly are: 1) illness concepts are based not on the principle of causality but on the principle of reciprocity; 2) illness concepts are affected by the social level and characteristics of the patients; 3) the causes of disease are recognized as having both positive and negative effects on health, depending on the interpretation of the individuals; 4) illness concepts reflect principles of everyday life of the society's members, such as hierarchical structure and group cohesiveness; 5) illness concepts are governed by the principles of reciprocity and spread; 6) illness concepts are interrelated with the physical environment of the participants. Based on these results, it can be concluded that the illness concepts of the elderly in a traditional clan village are a component of the health care system as a cultural system. These results can be a useful basis for gerontological nursing practice and education.

Performance Improvement of Web Information Retrieval Using Sentence-Query Similarity (문장-질의 유사성을 이용한 웹 정보 검색의 성능 향상)

  • Park Eui-Kyu;Ra Dong-Yul;Jang Myung-Gil
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.406-415 / 2005
  • The growth of the Internet has led to a web containing a huge number of documents. Increasing importance is therefore given to web information retrieval technology that can provide users with documents containing exactly the information they want. This paper proposes several techniques that are effective for improving web information retrieval. Similarity between a document and the query is a major source of information exploited by conventional systems. However, we suggest a technique that makes use of similarity between a sentence and the query. We introduce a technique to compute an approximate score of the sentence-query similarity even without mature natural language processing technology. The amount of computation for this task was shown to be linear in the number of documents in the total collection, which implies that practical systems can make use of this technique. The next important technique proposed in this paper is to use stratification of documents when re-ranking the documents to output, which was shown to lead to a significant improvement in performance. We furthermore showed that using hyperlinks, anchor texts, and titles can enhance performance. To justify the proposed techniques, we developed a large-scale web information retrieval system and used it for experiments.
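
To make the idea of sentence-query similarity concrete, the sketch below scores each document by its best-matching sentence, where a sentence's score is the fraction of query terms it contains. This is only one plausible approximation that needs no linguistic analysis; the abstract does not give the paper's actual formula, so the scoring rule, the regex-based sentence splitting, and every function name here are assumptions.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_query_score(document, query):
    """Best fraction of query terms found together in a single sentence."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    best = 0.0
    # Split on sentence-final punctuation; no parsing or tagging is needed.
    for sentence in re.split(r"[.!?]+", document):
        overlap = query_terms & set(tokenize(sentence))
        best = max(best, len(overlap) / len(query_terms))
    return best


def rerank(documents, query):
    """Order documents by their sentence-query score, highest first."""
    return sorted(documents, key=lambda d: sentence_query_score(d, query), reverse=True)


# Toy documents invented for illustration only.
doc_a = "Rocchio is a classifier. Term weighting matters."
doc_b = "Rocchio was a researcher. A classifier assigns labels."
print(rerank([doc_b, doc_a], "rocchio classifier")[0])
```

Each document is scanned once, so the cost grows linearly with the number of documents scored, in line with the linearity the abstract reports. Both toy documents contain both query terms, but only `doc_a` has them in the same sentence, which is the distinction a document-level score would miss.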

Research on the Usage of Electronic Information Resources of the Humanities Scholars in Korea (인문학자의 전자정보원 이용행태에 관한 연구)

  • Yoon, Cheong-Ok
    • Journal of the Korean Society for Library and Information Science / v.43 no.2 / pp.5-28 / 2009
  • The purpose of this study is to investigate the use of electronic information resources by humanities scholars in Korea and to propose a plan for academic library and information services that serve their needs. To collect data, a postal survey was conducted from November 2007 through January 2008. Out of 799 humanities scholars sampled from 25 universities, 132 responded, a completion rate of 16%. The major findings of this study are as follows: Firstly, the majority of humanities scholars distribute their time equally between research and education, and conduct independent research. Secondly, they use, to a certain degree, electronic information resources largely in text format, and depend upon the electronic collection of their academic libraries. Thirdly, with the exception of a couple of sources of electronic journal resources, the electronic resources that these humanities scholars regularly use vary so widely that none could be considered a common resource. Fourthly, they value the convenience of accessing and using electronic resources, but worry about the quality and scope of the contents. It is suggested that academic libraries (1) become the gateway for the electronic information that is available both inside and outside the library, (2) provide an integrated search feature for and 'single sign on' access to electronic resources, and (3) plan customized user education for specific subject fields in the humanities.

A Study on the School Library Research Trends Using Topic Modeling (토픽모델링을 활용한 학교도서관 연구동향 분석)

  • Jung, Young-Joo;Kim, Hea-Jin
    • Journal of Korean Library and Information Science Society / v.51 no.3 / pp.103-121 / 2020
  • This study analyzes research trends on school libraries from 1990 to July 2020. To this end, LDA topic modeling was applied to the abstracts of domestic articles related to school libraries. The corpus comprises 498 papers published in the four major domestic journals in Library and Information Science. The log-likelihood estimate criterion was used to determine the number of topics for topic modeling. As a result, 27 topics were discovered, and they were then categorized into eight subject areas: general, institutional system, building/equipment, operation/management, data organization, service, education, and others. The most popular research topics were library utilization classes (T27) and information utilization (T2). More than 20 studies were found for each of evaluation index development (T13), school librarian placement (T24), learning information media utilization (T3), community public library (T7), library cooperation (T9), library use (T17), library research (T11), reading education (T4), collection development (T5), and education effects/teaching methods (T18).

Research for the Element to Analyze the Performance of Modern-Web-Browser Based Applications (모던 웹 브라우저(Modern-Web-Browser) 기반 애플리케이션 성능분석을 위한 요소 연구)

  • Park, Jin-tae;Kim, Hyun-gook;Moon, Il-young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.278-281 / 2018
  • Early Web technology displayed text information through a browser. As web technology has advanced, however, it has become possible to present large amounts of multimedia data through browsers. Web technologies are being applied in a variety of fields such as sensor networks, hardware control, and data collection and analysis for big data and AI services. As a result, standards have been prepared for the Internet of Things, in which a web browser installed on the device interface typically controls sensors via HTTP communication and provides information to users. In addition, the recent development of WebAssembly has made it possible to run, through native languages of the C family, 3D objects and virtual/augmented reality content that previously could not run in web browsers. Factors used to evaluate the performance of existing Web applications include performance, network resources, and security. However, since web applications are now applied in many areas, it is time to revisit and review these factors. In this paper, we analyze the factors used to assess the performance of web applications. We intend to establish indicators for the development of web-based applications by reviewing the analysis of each factor, its main points, and the aspects that need to be supplemented.

Establishment of Strategy for Management of Technology Using Data Mining Technique (데이터 마이닝을 통한 기술경영 전략 수립에 관한 연구)

  • Lee, Junseok;Lee, Joonhyuck;Kim, Gabjo;Park, Sangsung;Jang, Dongsik
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.2 / pp.126-132 / 2015
  • Technology forecasting is about understanding the future status of a specific technology based on current data about it. It is useful when planning technology management strategies. These days, it is common for countries, companies, and researchers to establish R&D directions and strategies by drawing on experts' opinions. However, this qualitative method of technology forecasting is costly and time consuming, since it requires collecting a variety of opinions and analyses from many experts. To address these limitations, quantitative methods of technology forecasting are being studied to secure objective forecast results and support the R&D decision-making process. This paper suggests a technology forecasting methodology based on quantitative analysis. The methodology consists of data collection, principal component analysis, and technology forecasting by logistic regression, which is one of the data mining techniques. In this research, patent documents related to autonomous vehicles are collected. Then, text is extracted from the patent documents using text mining techniques and converted into a form appropriate for analysis. After principal component analysis, logistic regression is performed using the principal component scores. On the basis of this result, it is possible to analyze the state of R&D and to forecast technology.