• Title/Summary/Keyword: 텍스트 연구 (text research)

Search Result 3,492

Analysis of the Unstructured Traffic Report from Traffic Broadcasting Network by Adapting the Text Mining Methodology (텍스트 마이닝을 적용한 한국교통방송제보 비정형데이터의 분석)

  • Roh, You Jin;Bae, Sang Hoon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.3 / pp.87-97 / 2018
  • The traffic accident reports submitted to the Traffic Broadcasting Network (TBN) are unstructured data. However, they are valuable as a kind of real-time traffic information, reported from the viewpoint of drivers and pedestrians who were on the road at the time and place of an accident, rather than from that of the offender or victim who caused it. Despite being big data, these reports have rarely been used in traffic accident analysis or other traffic-related research. By adopting text-mining techniques, this study provides a starting point for using the reports to assess the impacts of traffic accidents. Seven years of traffic reports were analyzed. The analysis made it possible to identify road names, accident locations, and times, as well as the factors through which traffic accidents most strongly affect other drivers. For further study, the authors plan to combine unstructured accident data with traffic reports.

Combining Multiple Sources of Evidence to Enhance Web Search Performance

  • Yang, Kiduk
    • Journal of Korean Library and Information Science Society / v.45 no.3 / pp.5-36 / 2014
  • The Web is rich in sources of information that go beyond document contents, such as hyperlinks and manually classified Web directories like Yahoo. This research extends past fusion IR studies, which have repeatedly shown that combining multiple sources of evidence (i.e., fusion) can improve retrieval performance, by investigating the effects of combining three distinct retrieval approaches for Web IR: the text-based approach, which leverages document texts; the link-based approach, which leverages hyperlinks; and the classification-based approach, which leverages Yahoo categories. Retrieval results of the text-, link-, and classification-based methods were combined using variations of the linear combination formula to produce fusion results, which were compared with individual retrieval results using traditional retrieval evaluation metrics. Fusion results were also examined to ascertain the significance of overlap (i.e., the number of systems that retrieve a document) in fusion. The analysis suggests that the solution spaces of text-, link-, and classification-based retrieval methods are diverse enough for fusion to be beneficial, and it reveals important characteristics of the fusion environment, such as the effects of system parameters and the relationship among overlap, document ranking, and relevance.
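The linear-combination fusion described above can be illustrated with a minimal Python sketch. It assumes each retrieval method returns a mapping from document ID to score, rescales each run with min-max normalization, and sums the weighted scores. The normalization step, the weights, and the function names `min_max` and `linear_fusion` are illustrative assumptions, not the specific formula variations evaluated in the study. The sketch also counts overlap, i.e., the number of systems that retrieve each document.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one system's scores to [0, 1] so that runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as a full match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(
    runs: dict[str, dict[str, float]], weights: dict[str, float]
) -> list[tuple[str, float, int]]:
    """Fuse runs as the weighted sum of normalized scores.

    Returns (doc_id, fused_score, overlap) sorted by descending score,
    where overlap is the number of systems that retrieved the document.
    Systems without an explicit weight default to 1.0.
    """
    fused: dict[str, float] = {}
    overlap: dict[str, int] = {}
    for system, scores in runs.items():
        w = weights.get(system, 1.0)
        for doc, s in min_max(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
            overlap[doc] = overlap.get(doc, 0) + 1
    # Ties are broken by document ID so the ranking is deterministic.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return [(doc, score, overlap[doc]) for doc, score in ranked]


if __name__ == "__main__":
    # Hypothetical runs from text-, link-, and classification-based retrieval.
    runs = {
        "text": {"d1": 12.0, "d2": 8.0, "d3": 4.0},
        "link": {"d2": 0.9, "d4": 0.3},
        "class": {"d1": 0.5, "d2": 0.5},
    }
    weights = {"text": 0.5, "link": 0.3, "class": 0.2}
    print([(doc, n) for doc, _, n in linear_fusion(runs, weights)])
    # → [('d2', 3), ('d1', 2), ('d3', 1), ('d4', 1)]
```

In this toy run, `d2` is retrieved by all three systems and ranks first. The per-system weights are hypothetical and would need tuning in practice.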

Content Analysis of Webzine for Gist-based Health Message Design (핵심정보 중심 건강 메시지 디자인을 위한 웹진 내용분석)

  • Cho, Young Hoan;Choi, Hyoseon;You, Myoung Soon
    • The Journal of the Korea Contents Association / v.14 no.2 / pp.192-204 / 2014
  • Gist-based message design is essential on the Internet, where large numbers of health messages are constantly created and shared. This study aimed to identify the characteristics of health messages in a webzine and to explore ways of designing gist-based health messages. A total of 72 webzine articles published by the Korean Ministry of Food and Drug Safety were selected, and their text and visual messages were analyzed in terms of content type, the frequency and position of gists, and intuitive expression. The articles were also categorized by the characteristics of their health messages through cluster analysis. The study found that the text of most health articles consisted of facts and methods, whereas most visual messages represented concepts. In addition, both text and visual messages were limited in how effectively they presented a gist. The findings also point to a need for effective ways to support intuitive understanding of jargon and quantitative information in health messages. Based on these findings, this study offers suggestions for designing gist-based health messages on the Internet.

A Study on the Extraction and Utilization of Index from Bibliographic MARC Database (서지마크 데이터베이스로부터의 색인어 추출과 색인어의 검색 활용에 관한 연구 - 경북대학교 도서관 학술정보시스템 사례를 중심으로 -)

  • Park Mi-Sung
    • Journal of Korean Library and Information Science Society / v.36 no.2 / pp.327-348 / 2005
  • The purpose of this study is to emphasize the importance of index definition and to lay the groundwork for optimal indexing in bibliographic retrieval systems. To this end, the study examined the theory of index extraction, covering index tag definition and index normalization, from a bibliographic MARC database and analyzed how often the extracted index terms were used in retrieval. In the experiment, the 29,219,853 index terms generated from 2,200,488 bibliographic records were divided into text-type and code-type indexes, and utilization rates were analyzed by comparing index types with the index terms found in web logs. The results show that text-type indexes such as title, author, publication, and subject had high utilization rates, whereas code-type indexes had low utilization rates. The study therefore suggests removing unused indexes from the index definition to optimize indexing.

A Semiotic In-depth Analysis of the Children's Picture Book : "Who Pooped on My Head?" (그림책 "누가 내 머리에 똥 쌌어?" 의 심층의미 분석 - Greimas기호학을 중심으로 -)

  • Chang, Sung Dan;Lee, Ju Lie
    • Korean Journal of Childcare and Education / v.9 no.2 / pp.263-283 / 2013
  • This study analyzed the picture book "Who Pooped on My Head?" using semiotic theory and examined the meanings embedded in the text. Because a picture book should be in harmony with the characteristics of children's development, an in-depth understanding of its text is necessary. The analysis shows that the text supports children's psychological stability and carries meanings and values that guide children toward social values and norms. An analysis using Greimas' semiotic square revealed an embedded meaning: following the standard for bowel movements is socially established behavior. Giving up instinct and accepting the standard appear to be part of a process that broadens thought. In conclusion, this study confirmed that the picture book "Who Pooped on My Head?" contains meanings and values that support cognitive and emotional development.

Examining the Intellectual Structure of Records Management & Archival Science in Korea with Text Mining (텍스트 마이닝을 이용한 국내 기록관리학 분야 지적구조 분석)

  • Lee, Jae-Yun;Moon, Ju-Young;Kim, Hee-Jung
    • Journal of the Korean Society for Library and Information Science / v.41 no.1 / pp.345-372 / 2007
  • In this study, the intellectual structure of Records Management & Archival Science in Korea was analyzed using document clustering, a widely used text-mining method, and document similarity network analysis. The data were 145 articles on Records Management & Archival Science, published from 2001 to 2006 in five major representative journals in the field of Library & Information Science in Korea. The cluster analysis shows that the core subject areas are "electronic records management and digital preservation," "records management policy and institution," "records description and catalogues," and "records management domain and education." The document analysis, which is more detailed than the cluster analysis, shows that "digital archiving," a specialized subject within digital preservation, plays a central role. The serial analysis, which follows a timeline, shows the emergence of "archival services" as a new subject area.
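A document similarity network of the kind mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: documents are represented as raw term-count vectors from whitespace tokenization, cosine similarity is the similarity measure, an edge is drawn when similarity reaches a hypothetical threshold, and degree (number of neighbors) stands in for centrality. The abstract does not specify the study's actual term weighting, threshold, or network measures.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_network(docs: dict[str, str], threshold: float) -> dict[str, set[str]]:
    """Connect two documents when the cosine similarity of their term counts
    is at least `threshold`; returns an adjacency list."""
    vectors = {name: Counter(text.lower().split()) for name, text in docs.items()}
    graph: dict[str, set[str]] = {name: set() for name in docs}
    for u, v in combinations(docs, 2):
        if cosine(vectors[u], vectors[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def most_central(graph: dict[str, set[str]]) -> str:
    """Node with the highest degree; ties go to the alphabetically first name."""
    return max(sorted(graph), key=lambda node: len(graph[node]))


if __name__ == "__main__":
    # Hypothetical article snippets standing in for full abstracts.
    docs = {
        "a1": "digital archiving digital preservation",
        "a2": "digital archiving metadata",
        "a3": "digital preservation policy",
        "a4": "records management education",
    }
    network = similarity_network(docs, threshold=0.5)
    print({name: len(nbrs) for name, nbrs in network.items()})
    # → {'a1': 2, 'a2': 1, 'a3': 1, 'a4': 0}
    print(most_central(network))
    # → a1
```

Degree is the simplest centrality measure; other measures such as betweenness could be substituted without changing how the network is built.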

BERT-based Classification Model for Korean Documents (한국어 기술문서 분석을 위한 BERT 기반의 분류모델)

  • Hwang, Sangheum;Kim, Dohyun
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.203-214 / 2020
  • Technical documents such as patents and R&D project reports need to be classified in order to understand trends in technology convergence, interdisciplinary joint research, technology development, and so on. Text mining techniques have mainly been used to classify such technical documents. However, classifying technical documents with text mining algorithms has the disadvantage that the features representing the documents must be extracted directly. In this study, we propose a BERT-based document classification model that automatically extracts document features from the text of national R&D projects and classifies them. We then verify the applicability and classification performance of the proposed model.

Analysis of the abstracts of research articles in food related to climate change using a text-mining algorithm (텍스트 마이닝 기법을 활용한 기후변화관련 식품분야 논문초록 분석)

  • Bae, Kyu Yong;Park, Ju-Hyun;Kim, Jeong Seon;Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1429-1437 / 2013
  • Research articles on food related to climate change were analyzed with a text-mining algorithm, one of the tools for unstructured data analysis in big data analytics, focusing on the frequencies of terms appearing in the abstracts. First, a term-document matrix was constructed; then a hierarchical clustering algorithm, based on dissimilarities among the selected terms, was applied together with domain expertise to classify the documents into a few labeled groups. This research identified important topics in the field of climate-change-related food research and their trends over the past years. The results are expected to be used in future research toward systematic responses and adaptation to climate change.
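The two steps in the abstract (building a term-document matrix, then hierarchical clustering on dissimilarities between terms) can be sketched in plain Python. Assumptions: whitespace tokenization, raw counts, dissimilarity defined as 1 minus the cosine similarity between term rows, average linkage, and a hypothetical cut at `k` clusters. The study's exact dissimilarity and linkage are not stated, and the expert labeling the authors combined with clustering is not modeled.

```python
import math
from itertools import combinations


def term_document_matrix(abstracts: list[str], terms: list[str]) -> list[list[int]]:
    """Rows are the selected terms, columns are documents, cells are raw counts."""
    tokenized = [text.lower().split() for text in abstracts]
    return [[tokens.count(term) for tokens in tokenized] for term in terms]


def dissimilarity(u: list[int], v: list[int]) -> float:
    """1 - cosine similarity between two term rows (1.0 if a row is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if not norm_u or not norm_v:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(rows: list[list[int]], k: int) -> list[list[int]]:
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise dissimilarity until k clusters remain.
    Clusters are returned as lists of row indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None  # (mean dissimilarity, index a, index b)
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dissimilarity(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
            mean = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean < best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical abstracts and a hand-picked term list.
    abstracts = [
        "crop yield temperature drought",
        "crop yield drought adaptation",
        "food safety pathogen temperature",
        "food safety pathogen",
    ]
    terms = ["crop", "yield", "drought", "food", "safety", "pathogen"]
    matrix = term_document_matrix(abstracts, terms)
    for cluster in average_linkage(matrix, k=2):
        print([terms[i] for i in cluster])
    # → ['crop', 'yield', 'drought']
    # → ['food', 'safety', 'pathogen']
```

This brute-force merge loop is fine for a handful of terms; for realistic vocabularies a library implementation of hierarchical clustering would be the practical choice.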

A Study on an Automatic Summarization System Using Verb-Based Sentence Patterns (술어기반 문형정보를 이용한 자동요약시스템에 관한 연구)

  • 최인숙;정영미
    • Journal of the Korean Society for Information Management / v.18 no.4 / pp.37-55 / 2001
  • The purpose of this study is to present a text summarization system using a knowledge base that contains information about verbs and their arguments, obtained statistically from a subject domain. The system consists of two modules: a training module and a summarization module. The training module extracts cue verbs and their basic sentence patterns by counting the frequencies of verbs and case markers, respectively, and the summarization module substantiates the basic sentence patterns and generates summaries. Basic sentence patterns are substantiated by applying substantiation rules to the syntactic structure of sentences. A summary is then produced by connecting the simple sentences generated by substantiating the basic sentence patterns. Articles about ‘robbery’ in daily newspapers were selected as the test collection. By combining cue verbs with their essential arguments, the system generates natural summaries without losing any essential information. In addition, the use of statistical techniques allows the system to be applied to other subject domains through its learning capability.

Virtual Culture Production of Game-Focus on 'Machinima' (게임의 가상문화생산 - '머시니마'를 중심으로)

  • Han, Se-Young;Sung, Jung-Hwan
    • Journal of Korea Game Society / v.10 no.1 / pp.93-103 / 2010
  • The game industry has grown alongside the rise of online games since the 1990s and, by focusing on commercial business, has expanded mainly on the consumption side. Games, however, have also shown that they can produce cyber culture through Machinima, a kind of film made with a game engine. The non-linear activity of games, which is closely connected with the online environment, therefore deserves discussion. Accordingly, this study proceeds in three steps. First, it examines the characteristics of the game environment that produce non-linear effects in relation to hypertext space. Second, it discusses Machinima as an extension of hypertext literature. Finally, although games are usually regarded as mere consumption through play, it argues that games can take on the role of creating virtual culture through the production of diverse Machinima. Consequently, as the dichotomous, linear relationship between consumption and production gives way to a non-linear situation in hypertext space, the study highlights the positive and active features of games that have brought the activity of virtual reality into multimedia literature.