• Title/Summary/Keyword: unstructured data analysis (비정형데이터분석)

Search Result 405, Processing Time 0.03 seconds

Comparison of Term-Weighting Schemes for Environmental Big Data Analysis (환경 빅데이터 이슈 분석을 위한 용어 가중치 기법 비교)

  • Kim, JungJin;Jeong, Hanseok
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.236-236 / 2021
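  • As unstructured data such as text is generated at a rapidly increasing rate, the need for technologies to analyze it is growing. Text mining is one technique that uses natural language processing to structure unstructured text and extract valuable information from documents. Text mining generally represents the importance of terms with a document-term frequency matrix, which records how often each term is used in each document, and it is applied in a variety of research fields. However, the raw frequencies in a document-term matrix do not readily capture how documents differ or how important each term is as a result, so applying term weights to characterize documents is essential. Various term-weighting methods have been developed and applied, but in the environmental field, studies evaluating the effectiveness of term-weighting schemes are lacking. Moreover, in environmental issue analysis, reflecting the characteristics of each document along its temporal distribution matters more than simply identifying document characteristics and classifying the given documents. This study therefore applied text mining to environmental news data for the Seoul area from 2015 to 2020 and compared term-weighting schemes suitable for environmental issue analysis. The schemes applied were TF-IDF (term frequency-inverse document frequency), BM25, TF-IGM (TF-inverse gravity moment), and TF-IDF-ICSDF (TF-IDF-inverse class space density frequency). Through this study, we aim to present an optimized term-weighting scheme for classifying environmental documents and entities, and to provide keyword extraction information related to environmental issues in the Seoul area.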
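As a rough illustration of two of the schemes named in this abstract, the Python sketch below computes TF-IDF and BM25 term weights on a toy corpus. The corpus, the function names, the unsmoothed TF-IDF form, the non-negative BM25 IDF variant, and the parameters `k1=1.5` and `b=0.75` are all assumptions made for illustration; the abstract does not say which variants the authors used. TF-IGM and TF-IDF-ICSDF require class labels and are not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using raw TF * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]

def bm25(docs, k1=1.5, b=0.75):
    """Return one {term: weight} dict per document using BM25 term weights."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        weights.append({
            t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * tf * (k1 + 1) / (tf + norm)
            for t, tf in Counter(doc).items()
        })
    return weights

# Toy, made-up corpus of tokenized headlines (not the study's data).
docs = [
    ["fine", "dust", "warning", "seoul"],
    ["han", "river", "water", "quality", "seoul"],
    ["fine", "dust", "fine", "dust", "reduction", "seoul"],
]
tfidf_w = tf_idf(docs)
bm25_w = bm25(docs)
```

In this toy corpus "seoul" appears in every document, so its TF-IDF weight is zero, while BM25 with this IDF variant still gives it a small positive weight. Differences of this kind are what a comparison of weighting schemes would need to examine.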


Big Data Platform for Public Library Users: Focusing on the Cultural Programs and Community Service (이용자를 위한 공공도서관 빅데이터 플랫폼 구축 방안 연구 - 문화프로그램 및 커뮤니티 서비스 정보를 중심으로 -)

  • Yoon, SoYoung
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.33 no.3 / pp.347-370 / 2022
  • Most public library websites publish cultural program data in unstructured form, so it cannot be produced and used as systematically as bibliographic information. Such data is underused in existing library big data research and practice, and it risks disappearing when a website is reorganized or the person in charge changes. This study collected and analyzed cultural program and community service data produced in unstructured form and developed a data schema that can be used in conjunction with bibliographic data. It also proposes sharing and using public library cultural program and community service data, and establishing a library big data platform that can serve as an information channel among the librarians who plan cultural programs. Through the platform, program data posted on library websites can be integrated and managed, which secures continuity of work and allows the specialized service history of individual libraries to be systematically managed and preserved.

Methodology of Local Government Policy Issues Through Big Data Analysis (빅데이터 분석을 통한 지방자치단체 정책이슈 도출 방법론)

  • Kim, Yong-Jin;Kim, Do-Young
    • The Journal of the Korea Contents Association / v.18 no.10 / pp.229-235 / 2018
  • The purpose of this study is to propose a method that uses big data analysis to identify the policy issues of local governments, at a time when big data is becoming increasingly important for efficient and effective policy making. To this end, the study analyzed 180,000 articles about Suwon city from the past three years, identified policy issues, and evaluated policy priorities through IPA analysis. The results showed that analyzing semi-structured big data from newspaper articles is effective in deriving policy issues specific to a local government, as distinct from the main national issues. The methodology suggested in this study therefore allows local governments to identify policy issues effectively and to understand their residents. In addition, the methodology is expected to be applicable to deriving policy issues from various semi-structured and unstructured big data, such as a local government's online civil complaint data and residents' SNS posts.
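The priority evaluation step can be sketched as follows, assuming that "IPA" here refers to importance-performance analysis, in which each item is placed in one of four quadrants by comparing its importance and performance scores against the mean of each axis. The issue names, the scores, and the function name `ipa_quadrants` are hypothetical; the abstract does not say how importance and performance were derived from the articles.

```python
from statistics import mean

def ipa_quadrants(scores):
    """Classify items into IPA quadrants, splitting each axis at its mean.

    scores maps item -> (importance, performance).
    """
    imp_cut = mean(i for i, _ in scores.values())
    perf_cut = mean(p for _, p in scores.values())
    labels = {}
    for item, (imp, perf) in scores.items():
        if imp >= imp_cut:
            labels[item] = "keep up the good work" if perf >= perf_cut else "concentrate here"
        else:
            labels[item] = "possible overkill" if perf >= perf_cut else "low priority"
    return labels

# Hypothetical scores; not taken from the study.
scores = {
    "transport": (0.9, 0.3),
    "welfare": (0.8, 0.8),
    "culture": (0.2, 0.7),
    "parks": (0.1, 0.2),
}
quadrants = ipa_quadrants(scores)
```

Under these made-up scores, "transport" falls in the "concentrate here" quadrant (high importance, low performance), which is the quadrant normally read as the top policy priority.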

A Meta Analysis of the Edible Insects (식용곤충 연구 메타 분석)

  • Yu, Ok-Kyeong;Jin, Chan-Yong;Nam, Soo-Tai;Lee, Hyun-Chang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.182-183 / 2018
  • Big data analysis is the process of discovering meaningful correlations, patterns, and trends in large data sets stored in existing data warehouse management tools, and of creating new value from them. It also refers to the technology that extracts new value from large volumes of structured and unstructured data and analyzes the results. Most big data analysis techniques, such as data mining, machine learning, natural language processing, and pattern recognition, come from existing statistics and computer science. Global research institutes have identified big data as the most notable new technology since 2011.


Extraction of Highlight Scenes in Soccer Videos Using Statistical Threshold (통계적 임계값을 이용한 축구경기의 하이라이트 장면 검출)

  • 한지석;박기태;이종설;이석필;문영식
    • Proceedings of the Korean Information Science Society Conference / 2003.10b / pp.607-609 / 2003
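  • Automatic video analysis is the first step toward content-based indexing and retrieval of video data. Unlike news, which has a regular structure, soccer video is unstructured. This paper proposes a method that detects highlight scenes in soccer video by analyzing its structure using the visual features of the highlight scenes that interest viewers. Building on previous work, we propose an algorithm for finding highlight candidate scenes, based on the ground area and the presence of a goalpost, that is not affected by thresholds that vary from match to match. Experiments confirmed that the proposed method performs well in highlight analysis across various kinds of soccer matches.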
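One plausible reading of a threshold that does not depend on the match is a cutoff computed from each video's own statistics instead of a fixed constant. The Python sketch below illustrates that idea only; the `mean - k * std` rule, the use of a per-frame ground-pixel ratio as the feature, the assumption that a low ratio marks a candidate scene, and all names and numbers are assumptions, not the paper's algorithm.

```python
from statistics import mean, pstdev

def adaptive_threshold(values, k=1.0):
    """Threshold derived from this video's own statistics: mean - k * std."""
    return mean(values) - k * pstdev(values)

def candidate_frames(ground_ratios, k=1.0):
    """Indices of frames whose ground ratio falls below the per-video threshold."""
    thr = adaptive_threshold(ground_ratios, k)
    return [i for i, r in enumerate(ground_ratios) if r < thr]

# Hypothetical per-frame ground-pixel ratios for two matches filmed differently.
match_a = [0.80, 0.82, 0.79, 0.40, 0.81, 0.78]
match_b = [0.60, 0.61, 0.59, 0.25, 0.62, 0.60]

candidates_a = candidate_frames(match_a)
candidates_b = candidate_frames(match_b)
```

Because the cutoff is recomputed per video, the same code flags the outlying frame in both matches even though their typical ground ratios differ, whereas a fixed cutoff tuned for one match could fail on the other.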


Telemetering Service in OpenStack (오픈스택 텔레메터링 서비스(Ceilometer))

  • Baek, D.M.;Lee, B.C.
    • Electronics and Telecommunications Trends / v.29 no.6 / pp.102-112 / 2014
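  • A telemetering service that monitors and meters the individual components of an OpenStack cloud for billing, benchmarking, scalability, and statistical purposes was recently added as an official project under the code name Ceilometer. It began by monitoring only the elements essential for billing and is evolving into a multipurpose service that watches system state to support orchestration functions such as autoscaling of cloud resources. In particular, it provides important hints for data analysis such as big data analysis. Based on source code analysis, this article introduces Ceilometer's data collection architecture, the core keywords of Ceilometer monitoring, MongoDB as a database for unstructured data, and the external interfaces, namely the API (Application Interface) and the CLI (Command Line Interface) commands. The conclusion gives the authors' own assessment of Ceilometer's overall architecture.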


A Study on the Trends of Construction Safety Accident in Unstructured Text Using Topic Modeling (비정형 텍스트 기반의 토픽 모델링을 이용한 건설 안전사고 동향 분석)

  • Lee, Sang-Gyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.10 / pp.176-182 / 2018
  • To understand and track trends in construction safety accidents, this study examines topic trends using an LDA (Latent Dirichlet Allocation)-based topic modeling method. In particular, it identifies the main issues in construction safety accidents through unstructured data analysis based on topic modeling, rather than through the various kinds of structured data analysis used to prevent safety accidents in the construction industry. To apply this methodology, I randomly collected 540 news articles about construction accidents published from January 2017 to February 2018. Applying LDA-based topic modeling to this unstructured data, I found 10 topics and identified key issues through 10 keywords in each of the 10 topics. I forecast topic issues related to construction safety accidents based on a time-series trend analysis of the news data over the same period. This research thus suggests ways of using unstructured news article data to anticipate safety policy and research directions and to respond to construction safety issues in the future.
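For readers unfamiliar with LDA, the sketch below fits a small topic model with collapsed Gibbs sampling, one standard inference method, using only the Python standard library. The abstract does not state which inference method or tooling was used, so the sampler, the hyperparameters `alpha` and `beta`, the toy articles, and the function names are all assumptions. The study reports 10 topics with 10 keywords each; the toy example uses 2 topics and 3 keywords.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (topic-word counts, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * V for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][wid[w]] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = wid[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Sample a new topic from the collapsed conditional distribution.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return topic_word, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Toy tokenized "articles" (made up; not the study's data).
docs = [
    ["crane", "collapse", "worker", "injury"],
    ["scaffold", "fall", "worker", "injury"],
    ["crane", "collapse", "inspection"],
    ["fire", "welding", "explosion"],
    ["fire", "explosion", "evacuation"],
    ["welding", "fire", "inspection"],
]
topic_word, vocab = lda_gibbs(docs, n_topics=2)
topics = top_words(topic_word, vocab)
```

Tracking how the share of tokens assigned to each topic changes by publication month would give the kind of time-series trend the abstract describes; that step is not shown here.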

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong soo
    • The Journal of Society for e-Business Studies / v.21 no.2 / pp.151-161 / 2016
  • In the era of Web 2.0, characterized by openness, sharing, and participation, internet users can easily produce and share data. The amount of unstructured data, which makes up most of the digital world's data, has increased exponentially. Online product reviews written by individuals are one kind of unstructured data, and they are valuable both to the companies that produce the products and to potential customers interested in them. Extracting useful information from large amounts of scattered review data requires collecting, storing, preprocessing, and analyzing the data, and then drawing conclusions. We therefore introduce a text-mining methodology that applies natural language processing to text data such as product reviews and extracts structured data using R. We also introduce data mining to derive purpose-specific customized information from the structured review information produced by the text mining.
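The step from free-text reviews to structured data can be sketched as building a document-term matrix. The authors worked in R; the sketch below uses Python for illustration, and the stopword list, the sample reviews, and the function names are assumptions, not the paper's pipeline.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "is", "a", "and", "but", "it", "was", "very"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def document_term_matrix(texts):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in texts[i]."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]

# Made-up reviews for illustration.
reviews = [
    "The battery is great and the screen is great",
    "Battery life was poor but the price is fair",
]
vocab, dtm = document_term_matrix(reviews)
```

Each row of `dtm` is a fixed-length numeric vector for one review, which is the structured form that downstream data-mining methods can consume.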

Survey-based unstructured data analysis to predict flipped learning performance (플립드러닝 성과를 예측하기 위한 설문조사 기반의 비정형 데이터 분석)

  • Chayoung Kim;Yoon Kim
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.519-524 / 2023
  • This study summarizes experience applying flipped learning to various IT-related liberal arts subjects and proposes a specific application method. So far, most studies have analyzed various strategies for flipped learning and learners' responses to them. It is now time for teachers, who are the main operators of flipped learning classes, to study how to obtain immediate feedback and apply it while the courses are running. Studies on this are gradually appearing. In general, most studies share reference material based on results obtained after applying various strategies, such as instructors developing the class structure themselves, combining it with discussion classes, or developing various content. This study proposes a method for analyzing how various strategies can be applied in a subject and for obtaining results while the class is running, by analyzing unstructured data from surveys that provide immediate feedback.

Relationship between Big Data and Analysis Prediction (빅데이터와 분석예측의 관계)

  • Kang, Sun-Kyoung;Lee, Hyun-Chang;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.167-168 / 2017
  • In this paper, we discuss the importance of deciding what to analyze and what to predict using big data. How and where to apply the large amounts of data that accumulate in daily life without our being aware of them is a very important question. There are many kinds of tasks that specify what to predict and how to use these data. Finding the most appropriate one is the way to increase the likelihood of an accurate prediction. In addition, analyzed and predicted data become meaningful only when they are useful in real life.
