• Title/Summary/Keyword: 텍스트 연구

Search Result 3,471, Processing Time 0.057 seconds

Real-time Printed Text Detection System using Deep Learning Model (딥러닝 모델을 활용한 실시간 인쇄물 문자 탐지 시스템)

  • Ye-Jun Choi;Song-Won Kim;Mi-Kyeong Moon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.3
    • /
    • pp.523-530
    • /
    • 2024
  • Online, such as web pages and digital documents, have the ability to search for specific words or specific phrases that users want to search in real time. Printed materials such as printed books and reference books often have difficulty finding specific words or specific phrases in real time. This paper describes the development of a deep learning model for detecting text and a real-time character detection system using OCR for recognizing text. This study proposes a method of detecting text using the EAST model, a method of recognizing the detected text using EasyOCR, and a method of expressing the recognized text as a bounding box by comparing a specific word or specific phrase that the user wants to search for. Through this system, users expect to find specific words or phrases they want to search in real time in print, such as books and reference books, and find necessary information easily and quickly.

Text Mining Driven Content Analysis of Ebola on News Media and Scientific Publications (텍스트 마이닝을 이용한 매체별 에볼라 주제 분석 - 바이오 분야 연구논문과 뉴스 텍스트 데이터를 이용하여 -)

  • An, Juyoung;Ahn, Kyubin;Song, Min
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.50 no.2
    • /
    • pp.289-307
    • /
    • 2016
  • Infectious diseases such as Ebola virus disease become a social issue and draw public attention to be a major topic on news or research. As a result, there have been a lot of studies on infectious diseases using text-mining techniques. However, there is no research on content analysis of two media channels that have distinct characteristics. Accordingly, in this study, we conduct topic analysis between news (representing a social perspective) and academic research paper (representing perspectives of bio-professionals). As text-mining techniques, topic modeling is applied to extract various topics according to the materials, and the word co-occurrence map based on selected bio entities is used to compare the perspectives of the materials specifically. For network analysis, topic map is built by using Gephi. Aforementioned approaches uncovered the difference of topics between two materials and the characteristics of the two materials. In terms of the word co-occurrence map, however, most of entities are shared in both materials. These results indicate that there are differences and commonalties between social and academic materials.

Research Trend Analysis on Living Lab Using Text Mining (텍스트 마이닝을 이용한 리빙랩 연구동향 분석)

  • Kim, SeongMook;Kim, YoungJun
    • Journal of Digital Convergence
    • /
    • v.18 no.8
    • /
    • pp.37-48
    • /
    • 2020
  • This study aimed at understanding trends of living lab studies and deriving implications for directions of the studies by utilizing text mining. The study included network analysis and topic modelling based on keywords and abstracts from total 166 thesis published between 2011 and November 2019. Centrality analysis showed that living lab studies had been conducted focusing on keywords like innovation, society, technology, development, user and so on. From the topic modelling, 5 topics such as "regional innovation and user support", "social policy program of government", "smart city platform building", "technology innovation model of company" and "participation in system transformation" were extracted. Since the foundation of KNoLL in 2017, the diversification of living lab study subjects has been made. Quantitative analysis using text mining provides useful results for development of living lab studies.

Analysis of Research Trends Using Text Mining (텍스트 마이닝을 활용한 연구 동향 분석)

  • Shim, Jaekwoun
    • Journal of Creative Information Culture
    • /
    • v.6 no.1
    • /
    • pp.23-30
    • /
    • 2020
  • This study used the text mining method to analyze the research trend of the Journal of Creative Information Culture(JCIC) which is the journal of convergence. The existing research trend analysis method has a limitation in that the researcher's personality is reflected using the traditional content analysis method. In order to complement the limitations of existing research trend analysis, this study used topic modeling. The English abstract of the paper was analyzed from 2015 to 2019 of the JCIC. As a result, the word that appeared most in the JCIC was "education," and eight research topics were drawn. The derived subjects were analyzed by educational subject, educational evaluation, learner's competence, software education and maker culture, information education and computer education, future education, creativity, teaching and learning methods. This study is meaningful in that it analyzes the research trend of the JCIC using text mining.

Conversion of linear, paper-based documents into Hypertext (선형문서를 하이퍼텍스트문서로 자동변환시키기 위한 연구 및 구현)

  • Kim, Jin-Soo;Park, Dong-won
    • The Journal of Natural Sciences
    • /
    • v.8 no.1
    • /
    • pp.101-107
    • /
    • 1995
  • The purpose of this work is to develop automatic techniques for converting linear, paper-based documents to a non-linear format suitable for use in hypertext systems. The selected document was partially converted to hypertext manually, and a prototype was created using the rules derived from the manual conversion process. The full conversion was divided into three passes: correcting the electronic linear form of the document, generating a listing of the links in the document, and creating the hypertext document. Passes 2 and 3 were entirely automatic. From this study, it may be concluded that many classes of paper-based documents can be automatically converted to hypertext.

  • PDF

A Quality Value Algorithm based on Text/Non-text Features in Q&A Documents (텍스트/비텍스트 특성기반 질의답변문서의 품질지수 알고리즘)

  • Kim, Deok-Ju;Park, Keon-Woo;Lee, Sang-Hun
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2010.06c
    • /
    • pp.105-108
    • /
    • 2010
  • 쌍방향으로 질문과 답변을 하는 커뮤니티 기반의 지식검색서비스에서는 질의를 통해 원하는 답변을 얻을 수 있지만, 수많은 사용자들이 참여함에 따라 방대한 문서 속에서 검증된 문서를 찾아내는 것은 점점 더 어려워지고 있다. 지식검색서비스에서 기존 연구는 사용자들이 생성한 데이터 즉 추천수, 조회수 등의 비텍스트 정보를 이용하거나 답변의 길이, 자료첨부, 연결어 등의 텍스트 정보 이용하여 전문가를 식별하거나 문서의 품질을 평가하고, 이를 검색에 반영하여 검색성능을 향상시키는 데 활용했다. 그러나 비텍스트 정보는 질의/응답의 초기에 사용자들에 의해 충분한 정보를 확보할 수 없는 단점이 제기 되며, 텍스트 정보는 전체의 문서를 답변의 길이, 자료 첨부등과 같은 일부요인으로 판단해야하기 때문에 품질평가의 한계가 있다고 볼 수 있겠다. 본 논문에서는 이러한 비텍스트 정보와 텍스트 정보의 문제점을 개선하기 위한 품질평가 알고리즘을 제안한다. 제안된 알고리즘을 통한 품질지수는 텍스트/비텍스트 정보와 소셜 네트워크 사용자 중앙성을 고려하여 질문에 적합하고 신뢰성 있는 답변을 랭킹화 함으로써 지식검색문서를 분별하는 지표가 되며, 이는 지식검색서비스의 성능향상에 기여를 할 수 있을 것으로 기대된다.

  • PDF

Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model (텍스트 마이닝 기반의 그래프 모델을 이용한 미발견 공공 지식 추론)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.31 no.1
    • /
    • pp.231-250
    • /
    • 2014
  • Due to the recent development of Information and Communication Technologies (ICT), the amount of research publications has increased exponentially. In response to this rapid growth, the demand of automated text processing methods has risen to deal with massive amount of text data. Biomedical text mining discovering hidden biological meanings and treatments from biomedical literatures becomes a pivotal methodology and it helps medical disciplines reduce the time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches either require intensive manual process of during the procedures or a semi-automatic procedure to find and select biomedical entities. In addition, they had limitations of showing one dimension that is, the cause-and-effect relationship between two concepts. Thus;this study proposed a novel approach to discover various relationships among source and target concepts and their intermediate concepts by expanding intermediate concepts to multi-levels. This study provided distinct perspectives for literature-based discovery by not only discovering the meaningful relationship among concepts in biomedical literature through graph-based path interference but also being able to generate feasible new hypotheses.

A Machine Learning Based Facility Error Pattern Extraction Framework for Smart Manufacturing (스마트제조를 위한 머신러닝 기반의 설비 오류 발생 패턴 도출 프레임워크)

  • Yun, Joonseo;An, Hyeontae;Choi, Yerim
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.2
    • /
    • pp.97-110
    • /
    • 2018
  • With the advent of the 4-th industrial revolution, manufacturing companies have increasing interests in the realization of smart manufacturing by utilizing their accumulated facilities data. However, most previous research dealt with the structured data such as sensor signals, and only a little focused on the unstructured data such as text, which actually comprises a large portion of the accumulated data. Therefore, we propose an association rule mining based facility error pattern extraction framework, where text data written by operators are analyzed. Specifically, phrases were extracted and utilized as a unit for text data analysis since a word, which normally used as a unit for text data analysis, is unable to deliver the technical meanings of facility errors. Performances of the proposed framework were evaluated by addressing a real-world case, and it is expected that the productivity of manufacturing companies will be enhanced by adopting the proposed framework.

Analysis of Educational Issues through Topic Modeling of National Petitions Text (국민청원글의 토픽 모델링을 통한 교육이슈 분석)

  • Shim, Jaekwoun
    • Journal of The Korean Association of Information Education
    • /
    • v.25 no.4
    • /
    • pp.633-640
    • /
    • 2021
  • Education related issues are social problems in which various groups and situations are intricately linked to each other. It is difficult to find issues by analyzing social phenomena related to education. Korean based text analysis can be analyzed in a quantitative. With the development of text analysis techniques, research results have been recently achieved, and it can be fully utilized to derive educational issues from text data in Korean. In this study, petition articles in the field of childcare/education were collected on the online-board of the Blue House National Petition website, and text analysis was used to derive issues in the education world. The analysis derived 6 topics through Latent Dirichlet Allocation(LDA) among topic modeling techniques. The association rules of major keywords were analyzed and visualized as graphs. In addition to deriving educational issues through the existing questionnaire, it can provide implications for future research directions and policies in that issues can be sufficiently discovered through text-based analysis methods.

Employee's Discontent Text Analysis on Anonymous Company Review Web and Suggestions for Discontent Resolve (기업 리뷰 웹 사이트 텍스트 분석을 통한 직원 불만 표현 추출과 불만 원인 도출 및 해소 방안)

  • Baek, HyeYeon;Park, Yongsuk
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.4
    • /
    • pp.357-364
    • /
    • 2019
  • As industrial information disclosure by insider's rate is around 80%, most of relevant researches explain briefly its causes are discontent of salary or human resources system. This paper scrapes texts on Jobplanet, an anonymous company review website and analyzes discontent keyword by 7 related area and their contexts to find out more details on brief causes referred above. After drawing LGG (Local Grammar Graph) by each areas with related dictionary list, this paper shows an example of concordance as a proof and several ways for human resources leakage prevention. Finally, text analysis results are compared with previous researches based on survey with limited questions and answers. This study is meaningful to expand the scope of employee discontent analysis with company review text and provide more specific, granular and honest discontent vocabularies.