• Title/Summary/Keyword: Document Frequency

Comparison of Term-Weighting Schemes for Environmental Big Data Analysis (환경 빅데이터 이슈 분석을 위한 용어 가중치 기법 비교)

  • Kim, JungJin;Jeong, Hanseok
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.236-236 / 2021
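  • As unstructured data such as text has been generated at a sharply increasing rate in recent years, the need for techniques to analyze it has grown. Text mining is one such technique: it uses natural language processing to structure unstructured text and extract valuable information from documents. Text mining typically represents the importance of terms with a document-term frequency matrix, which records how often each term is used in each document, and this approach is used across many research fields. However, the raw term frequencies in a document-term matrix struggle to capture how documents differ from one another and, therefore, how important each term is, so applying term weighting to characterize and classify documents is essential. Although various term-weighting methods have been developed and applied, few studies in the environmental field have evaluated their efficiency. Moreover, environmental issue analysis should do more than identify document features and classify the given documents; reflecting each document's characteristics according to its temporal distribution is also relatively important. Therefore, this study applied text mining to Seoul-area environmental news data from 2015-2020 to compare term-weighting techniques suitable for environmental issue analysis. The techniques applied were TF-IDF (term frequency-inverse document frequency), BM25, TF-IGM (TF-inverse gravity moment), and TF-IDF-ICSDF (TF-IDF-inverse class space density frequency). Through this study, we aim to propose an optimized term-weighting technique for classifying environmental documents and entities and to provide key-term extraction results related to environmental issues in the Seoul area.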
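
To make the term-weighting schemes compared in this study concrete, the following is a minimal Python sketch of TF-IDF, Okapi BM25, and TF-IGM on pre-tokenized documents. It is not the authors' implementation: the BM25 parameters `k1=1.2` and `b=0.75`, the TF-IGM coefficient `lam=7.0`, and the toy documents and class labels are assumptions, and TF-IDF-ICSDF is omitted. Note that TF-IGM is supervised and needs a class label for every document.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Raw term count times log(N / df) for each term in each document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def bm25(docs, k1=1.2, b=0.75):
    """Okapi BM25 weights: saturating term frequency plus document-length normalization."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    avgdl = sum(len(doc) for doc in docs) / n
    weights = []
    for doc in docs:
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        w = {}
        for t, c in Counter(doc).items():
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)  # non-negative IDF variant
            w[t] = idf * c * (k1 + 1) / (c + norm)
        weights.append(w)
    return weights


def igm(term, docs, labels):
    """Inverse gravity moment: f_1 / sum_r(f_r * r), class document frequencies sorted descending."""
    freqs = sorted(
        (sum(1 for d, lab in zip(docs, labels) if lab == cls and term in d) for cls in set(labels)),
        reverse=True,
    )
    denom = sum(f * r for r, f in enumerate(freqs, start=1))
    return freqs[0] / denom if denom else 0.0


def tf_igm(docs, labels, lam=7.0):
    """TF-IGM weight: tf * (1 + lam * igm(term)); lam = 7.0 is an assumed setting."""
    vocab = {t for doc in docs for t in doc}
    g = {t: igm(t, docs, labels) for t in vocab}
    return [{t: c * (1 + lam * g[t]) for t, c in Counter(doc).items()} for doc in docs]


docs = [
    ["air", "fine", "dust", "dust"],
    ["water", "river", "quality"],
    ["fine", "dust", "policy"],
    ["water", "policy"],
]
labels = ["air", "water", "air", "water"]  # hypothetical issue categories
print(tf_idf(docs)[0])
print(bm25(docs)[0])
print(tf_igm(docs, labels)[0])
```

In the first document, TF-IDF gives "dust" exactly twice the weight of "fine", BM25 dampens that repeated count, and TF-IGM boosts terms concentrated in one class ("dust") over terms spread across classes ("policy").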

Convolutional Neural Network-based Malware Classification Method utilizing Local Feature-based Global Image (로컬 특징 기반 글로벌 이미지를 사용한 CNN 기반의 악성코드 분류 방법)

  • Jang, Sejun;Sung, Yunsick
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.222-223 / 2020
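  • Damage caused by malware has recently been increasing. Because the appropriate response differs depending on the type of malware, research on classifying malware by type is also important. Existing approaches classify malware by type using a global image of the malware produced by a malware visualization process; the global image is generated from binary information extracted from the malware. However, classifying malware using only the global image lowers classification accuracy because it does not consider the features that are important for each malware type. This paper proposes a malware classification method that uses a local feature-based global image, which embeds type-specific malware features into the global image. First, binaries are extracted from the malware and used to generate a global image. Second, local features are extracted from the malware, and the key local features for each malware type are selected using the term frequency-inverse document frequency (TF-IDF) algorithm. Third, the key features of each malware family are pixelized and applied to the generated global image. Fourth, a convolutional model is trained on the resulting local feature-based global images, and the trained model is used to classify malware by type.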
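
The TF-IDF-based key-feature selection step can be illustrated by treating all local features of one malware family as a single document and scoring them with TF-IDF. This is a sketch under assumptions: the feature names (Windows API calls), the length-normalized TF, the top-`k` cutoff, and the function name `top_family_features` are hypothetical, and the pixelization and CNN steps are not shown.

```python
import math
from collections import Counter


def top_family_features(family_features, k=2):
    """Pick the k highest TF-IDF local features for each malware family.

    family_features maps a family name to the list of local features
    (hypothetical API-call names here) pooled from all of its samples;
    each family acts as one 'document' when computing document frequency.
    """
    n = len(family_features)
    df = Counter(f for feats in family_features.values() for f in set(feats))
    selected = {}
    for family, feats in family_features.items():
        tf = Counter(feats)
        total = len(feats)
        # length-normalized TF times IDF; features shared by every family score 0
        scores = {f: (c / total) * math.log(n / df[f]) for f, c in tf.items()}
        # highest score first, ties broken alphabetically for determinism
        ranked = sorted(scores, key=lambda f: (-scores[f], f))
        selected[family] = ranked[:k]
    return selected


families = {
    "ransom": ["CryptEncrypt", "CryptEncrypt", "CreateFileW", "ReadFile"],
    "worm": ["socket", "connect", "CreateFileW", "ReadFile"],
    "spy": ["GetAsyncKeyState", "socket", "ReadFile"],
}
print(top_family_features(families, k=2))
```

A feature such as `ReadFile` that every family uses gets an IDF of zero and is never selected, which is the intended effect of using TF-IDF to find family-specific features.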

A Study on Social Issues for Hydrogen Industry Using News Big Data (뉴스 빅데이터를 활용한 수소 이슈 탐색)

  • CHOI, ILYOUNG;KIM, HYEA-KYEONG
    • Journal of Hydrogen and New Energy / v.33 no.2 / pp.121-129 / 2022
  • With the advent of the post-2020 climate regime, the hydrogen industry is growing rapidly around the world. To build a hydrogen economy, it is important to identify social issues related to hydrogen and prepare countermeasures for them. Accordingly, this study conducted a semantic network analysis of hydrogen-related news from NAVER. The analysis shows that the number of hydrogen news articles in 2020 increased 4.5-fold compared with 2016, and that from 2018 onward the hydrogen issue shifted from an environmental to an economic focus. In addition, although the hydrogen industry, initially led by the government, is expanding into privately led mobility fields such as fuel cell electric vehicles and hydrogen fuel, terms expressing safety concerns, such as explosions, keep appearing. Thus, it is necessary not only to expand the hydrogen ecosystem through the participation of private companies but also to promote hydrogen safety.
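
Semantic network analysis of news is commonly built on keyword co-occurrence. The sketch below assumes that each article has already been reduced to a keyword list and that an edge's weight is the number of articles in which its two keywords co-occur; it computes edge weights and weighted degree. The study's exact network construction and centrality measures may differ, and the sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles, min_weight=1):
    """Undirected edge weights: number of articles in which both keywords appear."""
    edges = Counter()
    for keywords in articles:
        # sorted unique keywords so each pair is counted once, in a canonical order
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def weighted_degree(edges):
    """Weighted degree centrality: sum of the weights of a node's edges."""
    deg = Counter()
    for (a, b), w in edges.items():
        deg[a] += w
        deg[b] += w
    return deg


articles = [
    ["hydrogen", "vehicle", "safety"],
    ["hydrogen", "vehicle"],
    ["hydrogen", "explosion", "safety"],
]
edges = cooccurrence_edges(articles)
deg = weighted_degree(edges)
print(edges)
print(deg.most_common(3))
```

Raising `min_weight` prunes rare pairs, which keeps the network readable when the corpus spans several years of articles.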

Analysis of News Articles on Child Welfare Policies in South Korea: K-Means Clustering (대한민국 정권별 아동복지정책 관련 뉴스 기사 분석: K-평균 군집 분석)

  • Kim, Eun Joo;Kim, Seong Kwang;Park, Bit Na
    • Journal of East-West Nursing Research / v.29 no.2 / pp.185-195 / 2023
  • Purpose: The purpose of this study is to analyze changes in child welfare policies and provide insights based on the collection and classification of newspaper articles. Methods: Articles related to child welfare policies were collected from 1990, during the Kim, Young-sam administration, to May 9, 2022, under the Moon, Jae-in administration. K-Means clustering and keyword term frequency-inverse document frequency (TF-IDF) analysis were used to cluster and analyze newspaper articles with similar themes. Results: The articles from the Kim, Young-sam, Kim, Dae-jung, Roh, Moo-hyun, and Park, Geun-hye administrations were classified into two clusters, and those from the Lee, Myung-bak and Moon, Jae-in administrations into three clusters. Conclusion: South Korea's child welfare policies have focused on ensuring the safety and healthy development of children through diverse policy initiatives over the years. However, challenges related to child protection and child abuse persist, and addressing them requires additional resources and budget allocation. It is important to establish a comprehensive support system for children and families, including comprehensive nursing support.
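
Clustering articles with K-Means over TF-IDF vectors, as described in the Methods, can be sketched in plain Python. The L2 normalization, the naive initialization from the first `k` documents, the fixed iteration count, and the toy keyword lists are assumptions; in practice a library implementation such as scikit-learn's `KMeans` (k-means++ initialization) would normally be used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Dense, L2-normalized TF-IDF vectors over a sorted vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vecs = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vecs


def kmeans(vecs, k, iters=20):
    """Lloyd's algorithm with naive initialization from the first k vectors."""
    centroids = [list(v) for v in vecs[:k]]
    labels = [0] * len(vecs)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vecs
        ]
        # update step: mean of assigned vectors; an empty cluster keeps its old centroid
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


articles = [
    ["abuse", "protection", "abuse"],
    ["budget", "care", "support"],
    ["abuse", "protection"],
    ["care", "support", "budget"],
]
print(kmeans(tfidf_vectors(articles), k=2))
```

Normalizing the vectors makes Euclidean K-Means behave much like clustering by cosine similarity, so long articles do not dominate a cluster simply by having larger term counts.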

MB-OFDM UWB modem SoC design (MB-OFDM 방식 UWB 모뎀의 SoC칩 설계)

  • Kim, Do-Hoon;Lee, Hyeon-Seok;Cho, Jin-Woong;Seo, Kyeung-Hak
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.8C / pp.806-813 / 2009
  • This paper presents a modem chip design for high-speed wireless communications. Among high-speed communication technologies, we design a UWB (Ultra-Wideband) modem SoC (System-on-Chip) based on the MB-OFDM scheme, which uses a wide frequency band and causes little interference to other communication services. The baseband system of the modem SoC is designed according to the standard published by WiMedia. The SoC consists of an FFT/IFFT (Fast Fourier Transform/Inverse Fast Fourier Transform) block, transmitter, receiver, symbol synchronizer, frequency offset estimator, Viterbi decoder, and other receiver components. The chip is designed in a 90 nm CMOS (Complementary Metal-Oxide-Semiconductor) process. It measures about 5 mm x 5 mm and was fabricated (fab-out) on July 20, 2009.

A Comparative Study of the Impacts among Patent Assignees in Pharmaceutical Research based on Bibliometric Analyses (계량서지학적 분석을 통한 약물연구분야 특허출원인 간 영향력 비교)

  • Kim, Heeyoung;Park, Ji-Hong
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.1-15 / 2022
  • This study analyzes the citation relationships in patent data to understand knowledge transfer and impact among patent documents in pharmaceutical research. Patent data were collected from Google Patents, and the top 25 assignees were selected by searching for patent documents related to pharmaceutical research. We identify the citation relationships between assignees, then calculate and compare the h-index and derived indicators using the number of citations and the rank of each document for each assignee. In pharmaceutical research, assignees such as Pfizer, MIT, and Abbott show high impact. Among the five bibliometric indicators, the g-index and hS-index show similar results, and these indicators are most closely related to the rankings by Total Citation Frequency, Cites per Patents, and Maximum Citation Frequency. In addition, the five indicators are related to these measures in the order of Total Citation Frequency, Cites per Patents, and Maximum Citation Frequency. In some cases, Cites per Patents alone, which was previously known to indicate the technological influence of patent assignees, does not allow an accurate comparison.
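
The h-index and g-index used to compare assignees can be computed from each assignee's per-patent citation counts. This is a minimal sketch of the standard definitions; the hS-index and the other derived indicators in the study are not reproduced, and this g-index variant caps g at the number of patents rather than padding with zero-citation entries.

```python
def h_index(citations):
    """Largest h such that h documents each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g documents together have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g, total = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c  # cumulative citations of the top `rank` documents
        if total >= rank * rank:
            g = rank
    return g


assignee_citations = [10, 8, 5, 4, 3]  # hypothetical per-patent citation counts
print(h_index(assignee_citations), g_index(assignee_citations))
```

Because the g-index credits highly cited documents beyond the h threshold, it is always at least as large as the h-index for the same citation list.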

A Study on the Archival Information Services of Economic Policy Using Text Mining Methods: Focusing on Economic Policy Directions (텍스트 마이닝을 활용한 경제정책기록서비스 연구: 경제정책방향을 중심으로)

  • Yeon, Jihyun;Kim, Sungwon
    • Journal of Korean Society of Archives and Records Management / v.22 no.2 / pp.117-133 / 2022
  • Archival content listed arbitrarily makes it difficult for users to access the records of major economic policies efficiently, especially since users consult it without understanding the relevant period and context. Applying text mining techniques to 30 years of economic policy directions, from 1991 to 2021, this paper derives the economy-related keywords the government mainly dealt with and how they changed. It collects and preprocesses the background, main content, and body text of major economic policies and conducts word frequency, term frequency-inverse document frequency (TF-IDF), network, and time series analyses. These analyses show that the most frequent words, in order, are "job(일자리)," "competitive(경쟁력)," and "restructuring(구조조정)." In addition, the relative ratios of "job(일자리)," "real estate(부동산)," and "corporation(기업)" by year were analyzed in chronological order, and the major keywords mentioned by each government are presented. Based on the results, this study presents implications for developing and broadening archival information services related to economic policies.
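
The per-year relative ratio of tracked keywords might be computed as below. The definition used here (each keyword's share of all mentions of the tracked keywords in a given year) is an assumption rather than the paper's stated formula, and the English tokens standing in for 일자리, 부동산, and 기업 are placeholders.

```python
from collections import Counter


def yearly_relative_ratio(docs_by_year, keywords):
    """Share of each tracked keyword among all tracked-keyword mentions, per year.

    docs_by_year maps a year to a list of tokenized documents. The ratio
    definition is an assumption, not necessarily the one used in the study.
    """
    result = {}
    for year, docs in sorted(docs_by_year.items()):
        counts = Counter(t for doc in docs for t in doc if t in keywords)
        total = sum(counts.values())
        result[year] = {k: (counts[k] / total if total else 0.0) for k in keywords}
    return result


tracked = ("job", "real_estate", "corporation")
docs_by_year = {
    2020: [["job", "job", "real_estate"], ["corporation"]],
    2021: [["real_estate", "real_estate"]],
}
print(yearly_relative_ratio(docs_by_year, tracked))
```

Normalizing within each year lets the shares be compared across years even when the number of policy documents per year varies.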

A Study on the Implication of Volume Contract Clause under Rotterdam Rules (로테르담 규칙상 수량계약조항의 시사점에 관한 연구)

  • Han, Nak-Hyun
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.49 / pp.325-358 / 2011
  • This study aims to analyse the implications of the volume contract clause under the Rotterdam Rules. The Hague-Visby Rules have been in force in this jurisdiction for over 30 years. In those three decades they have performed valiant service, both for the development of maritime law in this country and for the countless parties from around the world who have chosen courts and arbitral tribunals in London to resolve disputes arising under bills of lading or under charterparties incorporating the Hague-Visby Rules. While the Hague-Visby Rules apply only to bills of lading or other similar documents of title, so that all other contracts of carriage fall outside the current regime, this is not the case for the Rotterdam Rules, which, broadly speaking, apply to contracts of carriage whether or not a shipping document or electronic transport record is issued. To preserve freedom of contract where necessary, however, a number of significant concessions were made, and Article 80, on volume contracts, is one of the most controversial. The provision lends itself to abuse under each of its elements, as there is no minimum quantity, period of time or frequency, and the minimum number of shipments is clearly just two. This means that important contracts of affreightment concluded pursuant to, for example, oil supply agreements have the same right to be excluded from the scope of application of the Rotterdam Rules. Because a volume contract may incorporate by reference the carrier's public schedule of services and the transport document or other similar documents as terms of the contract, a carefully drafted booking note for consecutive shipments could also become a potential volume contract.

Wrapper-based Economy Data Collection System Design And Implementation (래퍼 기반 경제 데이터 수집 시스템 설계 및 구현)

  • Piao, Zhegao;Gu, Yeong Hyeon;Yoo, Seong Joon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.227-230 / 2015
  • To analyze and predict economic trends, it is necessary to collect specific economic news and stock data. A typical Web crawler analyzes page content, collects documents, and extracts URLs automatically; other crawlers collect only documents on a particular topic. To collect economic news from a particular Web site, we need a crawler that directly analyzes the site's structure and gathers data from it, which calls for a wrapper-based Web crawler design. In this paper, we design a wrapper-based crawler for a big data-based economic news analysis system and implement it to collect data. With this crawler, we collect stock data and sales data from the US auto market since 2000, as well as economic news from the USA and South Korea. We determine how often each site updates its data and update our collection periodically. We then remove duplicate data and noise data, such as advertising and public relations content, and build a structured data set for subsequent analysis.
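
The duplicate-removal and noise-filtering steps can be sketched as follows, assuming each collected article is a dict with `title` and `body` fields. Hashing whitespace- and case-normalized text catches only exact re-posts, and the noise markers are hypothetical; the paper's wrapper and its actual filtering rules are not shown.

```python
import hashlib


def dedupe_articles(articles):
    """Drop articles whose normalized title + body has already been seen."""
    seen = set()
    unique = []
    for art in articles:
        # collapse whitespace and lowercase so trivial formatting differences match
        key_text = " ".join((art["title"] + " " + art["body"]).split()).lower()
        digest = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(art)
    return unique


def drop_noise(articles, noise_markers=("[ad]", "sponsored")):
    """Drop articles containing any (hypothetical) advertising/PR marker."""
    return [
        a for a in articles
        if not any(m in (a["title"] + " " + a["body"]).lower() for m in noise_markers)
    ]


crawled = [
    {"title": "Auto sales rise", "body": "US auto sales rose."},
    {"title": "Auto  sales rise", "body": "US auto sales rose. "},
    {"title": "[AD] Buy now", "body": "Sponsored content"},
]
print(drop_noise(dedupe_articles(crawled)))
```

Near-duplicates with edited wording would need a fuzzier comparison, such as shingling or MinHash, which this sketch does not attempt.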

Automatic Generation of the Local Level Knowledge Structure of a Single Document Using Clustering Methods (클러스터링 기법을 이용한 개별문서의 지식구조 자동 생성에 관한 연구)

  • Han, Seung-Hee;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.21 no.3 / pp.251-267 / 2004
  • The purpose of this study is to generate the local-level knowledge structure of a single document, similar to the end-of-book index and table of contents of printed material, through term clustering and cluster representative term selection. It also aims to analyze the functionalities of this knowledge structure and to confirm the applicability of these methods in user-friendly information services. In the term clustering experiment, Ward's method outperformed the fuzzy K-means clustering method. In the cluster representative term selection experiment, using the term with the highest passage frequency as the representative yielded the best performance. Finally, the results of user task-based functionality tests show that the automatically generated knowledge structure functions similarly to the local-level knowledge structure presented in printed material.
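
Selecting each cluster's representative term by highest passage frequency, the best-performing option in this study, can be sketched as below. Passages are assumed to be pre-tokenized paragraphs, ties are broken alphabetically, and the Ward's clustering step that produces the term clusters is assumed to have already run; the sample data are hypothetical.

```python
from collections import Counter


def passage_frequency(passages):
    """Number of passages (e.g. paragraphs) in which each term occurs."""
    return Counter(t for p in passages for t in set(p))


def cluster_representatives(clusters, passages):
    """For each term cluster, pick the member with the highest passage frequency."""
    pf = passage_frequency(passages)
    # max() returns the first maximal item, so sorting first breaks ties alphabetically
    return [max(sorted(cluster), key=lambda t: pf[t]) for cluster in clusters]


passages = [
    ["index", "term", "cluster"],
    ["cluster", "ward"],
    ["cluster", "term"],
    ["index"],
]
clusters = [["ward", "cluster"], ["term", "index"]]
print(cluster_representatives(clusters, passages))
```

Passage frequency favors terms spread across many parts of the document over terms repeated within a single paragraph, which suits labels meant to summarize a whole cluster.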