• Title/Abstract/Keywords: BIG4

Search results: 3,612

Korean End-to-End Coreference Resolution with BERT for Long Document (긴 문서를 위한 BERT 기반의 End-to-End 한국어 상호참조해결)

  • Jo, Kyeongbin;Jung, Youngjun;Lee, Changki;Ryu, Jihee;Lim, Joonho
    • Annual Conference on Human and Language Technology / 2021.10a / pp.259-263 / 2021
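  • Coreference resolution is a natural language processing task that identifies the mentions in a given document that are targets of coreference resolution, then finds and groups the mentions that refer to the same entity. Recent coreference resolution research has mainly studied end-to-end models that obtain contextual word representations with BERT and then perform mention detection and coreference resolution jointly. However, to handle long documents of 512 tokens or more, these models split the document into segments of at most 512 tokens, so coreference resolution performance drops on long documents. In this paper, we propose a BERT-based end-to-end coreference resolution model for long documents of 512 tokens or more. The model splits a long document into segments of at most 512 tokens and obtains first-stage contextual word representations from the existing BERT; it then concatenates these representations, adds a Global Positional Encoding or Embedding for the long document, and passes the result through a Global BERT layer to obtain the final contextual word representations, to which the end-to-end coreference resolution model is applied. Experimental results show that the proposed model performs comparably to the existing model (a 0.16% improvement on the test set) while reducing GPU memory usage by a factor of 1.4 and running 2.1 times faster.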

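The long-document pipeline summarized in the coreference abstract above (split into segments of at most 512 tokens, encode each segment locally, concatenate, add a document-level position signal, then apply a Global BERT layer) can be sketched as follows. This is a minimal illustration, not the authors' code: the first-stage BERT outputs are replaced by hypothetical zero vectors, the global encoding is assumed to be the standard Transformer sinusoidal form (the abstract also mentions a learned embedding variant), the Global BERT layer itself is omitted, and all function names are hypothetical.

```python
import math

def split_into_segments(token_ids, max_len=512):
    """Split a token sequence into consecutive chunks of at most max_len tokens."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

def global_positional_encoding(num_positions, d_model):
    """Sinusoidal encoding indexed by position in the WHOLE document (assumed variant)."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_global_positions(segment_outputs, d_model):
    """Concatenate per-segment token vectors and add a document-level position vector.

    segment_outputs: list of segments, each a list of d_model-sized vectors
    (stand-ins for the first-stage BERT outputs).
    """
    flat = [vec for segment in segment_outputs for vec in segment]
    pe = global_positional_encoding(len(flat), d_model)
    return [[x + p for x, p in zip(vec, pe_row)] for vec, pe_row in zip(flat, pe)]

# Usage with a hypothetical 1,200-token document and 8-dimensional vectors.
doc = list(range(1200))
segments = split_into_segments(doc)
print([len(s) for s in segments])  # [512, 512, 176]

# Zero vectors stand in for the local BERT outputs of each segment.
local_outputs = [[[0.0] * 8 for _ in seg] for seg in segments]
global_inputs = add_global_positions(local_outputs, 8)  # would feed the Global BERT layer
print(len(global_inputs))  # 1200
```

Because position 512 receives its own encoding, the first token of the second segment no longer shares a position signal with the first token of the document, which is what allows a subsequent global layer to relate tokens across segment boundaries.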

Generating Label Word Set based on Maximal Marginal Relevance for Few-shot Name Entity Recognition (퓨샷 개체명 인식을 위한 Maximal Marginal Relevance 기반의 라벨 단어 집합 생성)

  • HyoRim Choi;Hyunsun Hwang;Changki Lee
    • Annual Conference on Human and Language Technology / 2023.10a / pp.664-671 / 2023
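  • With the recent development of various large language models (LLMs), a wide range of research on prompt engineering is under way. In this paper, to improve named entity recognition performance in a few-shot learning setting, we use the previously proposed Template-free Prompt Tuning method and improve its label word set generation step by applying the Maximal Marginal Relevance algorithm, so that more diverse and specific label word sets are generated for each entity type. Experimental results show that, for all entity types except 'LOC', performance improved by 0.60 percentage points for 'PER', 4.98 percentage points for 'ORG', and 1.38 percentage points for 'MISC', and overall named entity recognition performance improved by 1.26 percentage points. This shows that the proposed label word set generation technique helps improve named entity recognition performance.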

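The label word selection step in the few-shot NER abstract above applies Maximal Marginal Relevance (MMR), which greedily picks the candidate most relevant to a target while penalizing similarity to words already picked. The sketch below uses the usual MMR objective with cosine similarity; the toy 3-dimensional vectors, the candidate words, the use of an entity-type vector as the "query", and the values of `lam` are all hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(query_vec, candidates, k, lam=0.5):
    """Greedy MMR over candidates (dict: word -> vector).

    Each step picks the word maximizing
        lam * sim(word, query) - (1 - lam) * max_{s in selected} sim(word, s)
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def mmr_score(word):
            relevance = cosine(candidates[word], query_vec)
            redundancy = max((cosine(candidates[word], candidates[s]) for s in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical embeddings; the "query" stands in for an entity-type (e.g. ORG) vector.
query = [1.0, 0.0, 0.0]
cands = {
    "company": [0.9, 0.4, 0.0],
    "corporation": [0.88, 0.45, 0.0],
    "university": [0.8, 0.0, 0.6],
    "banana": [0.0, 0.0, 1.0],
}
print(mmr_select(query, cands, k=2, lam=1.0))  # ['company', 'corporation']
print(mmr_select(query, cands, k=2, lam=0.5))  # ['company', 'university']
```

With `lam=1.0` the redundancy term vanishes and the two near-synonyms win; with `lam=0.5`, 'university' displaces 'corporation' because the latter is almost identical to 'company'. Lowering `lam` favours diversity more strongly.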

Trends Analysis on Research Articles of the Sharing Economy through a Meta Study Based on Big Data Analytics (빅데이터 분석 기반의 메타스터디를 통해 본 공유경제에 대한 학술연구 동향 분석)

  • Kim, Ki-youn
    • Journal of Internet Computing and Services / v.21 no.4 / pp.97-107 / 2020
  • This study conducts a comprehensive meta-study, from the perspective of content analysis, to explore trends in Korean academic research on the sharing economy using big data analytics. A comprehensive meta-analysis examines the entire body of research results historically and as a whole to illuminate the tendencies and properties of the overall research trend. Academic research related to the sharing economy first appeared in 2008, the year Professor Lawrence Lessig introduced the concept of the sharing economy to the world, but research began in earnest in 2013. In particular, between 2006 and 2008, research improved dramatically. To grasp the overall flow of domestic academic research trends, papers from the eight years from 2013 to the present were selected for analysis, focusing on titles, keywords, and abstracts drawn from electronic journal databases. Big data analysis was performed in the order of cleaning, analysis, and visualization of the collected data to derive research trends and insights by year and type of literature. Python 3.7 and Textom were used for data preprocessing, text mining, and metrics frequency analysis for keyword extraction; N-gram charts, centrality and social network analysis, and CONCOR clustering visualization were produced with UCINET6/NetDraw and Textom. The keywords, clustered into eight groups, were used to derive a typology of research trends. The outcomes of this study will provide useful theoretical insights and guidelines for future studies.
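The keyword frequency and centrality steps mentioned in the sharing-economy abstract above can be illustrated with a keyword co-occurrence network. The sketch below assumes each paper is reduced to a list of keywords, links two keywords when they appear in the same paper, and computes normalized degree centrality; the keyword lists are invented, and Textom and UCINET may define edge weights and centrality options differently.

```python
from collections import Counter
from itertools import combinations

def keyword_frequencies(docs):
    """Total count of each keyword over all documents."""
    return Counter(word for doc in docs for word in doc)

def cooccurrence_edges(docs):
    """Undirected edge weight = number of documents containing both keywords."""
    edges = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1
    return edges

def degree_centrality(edges):
    """Normalized degree: distinct neighbours / (number of nodes - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {w: 0.0 for w in neighbours}
    return {w: len(nb) / (n - 1) for w, nb in neighbours.items()}

# Hypothetical keyword lists, one per paper.
docs = [
    ["sharing economy", "platform", "airbnb"],
    ["sharing economy", "platform", "regulation"],
    ["sharing economy", "uber", "regulation"],
]
freq = keyword_frequencies(docs)
edges = cooccurrence_edges(docs)
centrality = degree_centrality(edges)
print(freq.most_common(2))            # [('sharing economy', 3), ('platform', 2)]
print(centrality["sharing economy"])  # 1.0
```

A keyword linked to every other keyword, like 'sharing economy' here, reaches the maximum normalized degree of 1.0.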

Big Data Analysis of Busan Civil Affairs Using the LDA Topic Modeling Technique (LDA 토픽모델링 기법을 활용한 부산시 민원 빅데이터 분석)

  • Park, Ju-Seop;Lee, Sae-Mi
    • Informatization Policy / v.27 no.2 / pp.66-83 / 2020
  • Local issues that arise in cities typically draw great public attention. While local governments strive to resolve these issues, it is often difficult to eliminate them all effectively, which leads to complaints. In tackling these issues, it is imperative for local governments to use big data to identify the nature of complaints and proactively provide solutions. This study applies the LDA topic modeling technique to analyze trends and patterns in complaints filed online. To this end, 9,625 online complaints submitted to the city of Busan from 2015 to 2017 were analyzed, and 20 topics were identified. Key topics were then singled out, and analysis of their quarterly weighting trends highlighted four "hot" topics (Bus stops, Taxi drivers, Praises, and Administrative handling) and four "cold" topics (CCTV installation, Bus routes, Park facilities including parking, and Festivities issues). The study applies big data analysis to identify trends and patterns in civil affairs and contributes academically by encouraging follow-up research. Moreover, the text mining technique used for complaint analysis can be applied to other projects requiring big data processing.
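The hot/cold distinction in the Busan complaints abstract above comes from quarterly topic-weight trends. One common way to operationalize this, assumed here because the abstract does not give the exact criterion, is to fit a least-squares line to each topic's mean weight per quarter and rank topics by slope. The topic names and weights below are hypothetical; in practice the per-quarter weights would come from the fitted LDA model's document-topic distributions.

```python
def ols_slope(values):
    """Least-squares slope of values regressed on time index 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time points")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den

def hot_and_cold(topic_series, top_n=1):
    """Rank topics by trend slope; steepest rise = hot, steepest fall = cold."""
    ranked = sorted(topic_series, key=lambda name: ols_slope(topic_series[name]))
    return ranked[::-1][:top_n], ranked[:top_n]

# Hypothetical mean topic weights for four consecutive quarters.
series = {
    "topic_A": [0.04, 0.05, 0.06, 0.07],
    "topic_B": [0.08, 0.07, 0.05, 0.04],
    "topic_C": [0.05, 0.05, 0.06, 0.06],
}
hot, cold = hot_and_cold(series)
print(hot, cold)  # ['topic_A'] ['topic_B']
```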

An Analysis of News Report Characteristics on Archives & Records Management for the Press in Korea: Based on 1999~2018 News Big Data (뉴스 빅데이터를 이용한 우리나라 언론의 기록관리 분야 보도 특성 분석: 1999~2018 뉴스를 중심으로)

  • Han, Seunghee
    • Journal of the Korean Society for Information Management / v.35 no.3 / pp.41-75 / 2018
  • The purpose of this study is to analyze the characteristics of Korean media coverage of archives & records management through time-series analysis. From January 1999 to June 2018, 4,680 news articles on archives & records management were extracted from BigKinds. To examine the characteristics of media coverage of the topic, this study analyzed differences in press coverage by period, subject, and media type. In addition, word-frequency-based content analysis and semantic network analysis were conducted to investigate the content characteristics of the coverage. Based on these results, differences in media coverage by period, subject, and media type were compared. The results show that the amount and content of news coverage in the field of records management differed by period, subject, and media type. The amount of coverage began to increase after the Presidential Records Management Act was enacted in 2007, and the largest amount of news was reported in 2013. Daily newspapers and financial newspapers reported the largest amount of news. Analysis of the news reports shows that, during the first 10 years after 1999, news topics formed around issues arising from the application and diffusion of the concept of archives & records management. Since the enactment of the Presidential Records Management Act, however, archives & records management has become a major factor in political and social issues, and a large amount of political and social news has been reported.

COVID-19 News Analysis Using News Big Data : Focusing on Topic Modeling Analysis (뉴스 빅데이터를 활용한 코로나19 언론보도 분석 :토픽모델링 분석을 중심으로)

  • Kim, Tae-Jong
    • The Journal of the Korea Contents Association / v.20 no.5 / pp.457-466 / 2020
  • The purpose of this study is to identify the main social agendas formed through the media, and how they change, using news big data on the recently spreading COVID-19, and to suggest directions for future reporting. To this end, 47,816 news items reported from December 31, 2019 to March 11, 2020 were divided into four periods according to the four levels of the infectious disease crisis alert, and a total of 20 topics were derived. Based on the results of the topic modeling analysis, this study proposes the following. First, reporting should refrain from provocative expressions such as "anxiety" and "fear" and use neutral and objective terms. Second, more in-depth and contextual news production is required, breaking away from simple event reporting. Third, detailed crisis communication manuals should be prepared for each situation related to infectious diseases. Fourth, reports are needed that focus on citizen-led efforts to overcome the crisis. This research is academically significant as the first paper to analyze news big data on COVID-19 using topic modeling, and it has policy significance as a basis for developing national crisis communication policy.

The Analysis of Public Awareness about Literary Therapy by Utilizing Big Data Analysis - The aspects of convergence literature and statistics (빅데이터 분석을 통한 문학치료의 대중적 인지도 분석 - 국문학과 통계학의 융합적 측면)

  • Choi, Kyoung-Ho;Park, Jeong-Hye
    • Journal of Digital Convergence / v.13 no.4 / pp.395-404 / 2015
  • This study explores objective awareness of literary therapy by examining popular perceptions of it through big data analysis. Its purpose is to derive meaningful information by analyzing, from a big data perspective, online social network service (SNS) content about 'literary therapy'. Accordingly, the main research method was content analysis of keywords linked to literary therapy, using opinion mining, a text mining technique. The study focused mainly on 'literary therapy' and analyzed 'bibliotherapy' for comparison. The study period ran from October 10 to November 10, 2014 (30 days), and the search covered SNS such as blogs and Twitter. The analysis leads to the conclusion that literary therapy needs a wider spread of its therapeutic prospects, structural harmony within the field, and a more solid perceptual axis. This study is worthwhile because it investigates popular awareness of literary therapy and suggests alternatives for invigorating the field.

MORPHOLOGICAL VIEW ON BIG INDIVIDUALS APPEARED IN THE SAME AGE GROUP OF ZOEA LARVA, MACROBRACHIUM JAPONICUM (DE HAAN) (담수산 새우 Macrobrachium japonicum (De Haan)의 Zoea 유생기에 출현하는 동일영기군 속의 개체변이체에 대한 형태학적 고찰)

  • KWON Chin Soo
    • Korean Journal of Fisheries and Aquatic Sciences / v.7 no.3 / pp.126-144 / 1974
  • Adult female prawns, Macrobrachium japonicum, used in this work were collected from the Simanto River, Shikoku, Japan, and transferred to the laboratory. Newly hatched larvae from an adult were reared in recirculating filtered aquaria controlled at a water temperature of $26\pm0.3^{\circ}\mathrm{C}$, chlorinity $6.21-6.45\%$ Cl, pH 8.0-8.1, and illumination of 3000 lux, and were fed sufficient Artemia salina nauplii. During the rearing of zoea larvae, big individuals (individuals whose bodies vary from the standard bodies of the same age group) occasionally appeared from the fifth zoeal stage onward, and their occurrence is tentatively attributed to trophic conditions. This paper presents a comparative morphological study of big individuals against the standard type of the same age group, addressing (1) whether there are stages at which big individuals are more likely to occur, (2) the rate of development of the various appendages within an individual, and (3) whether stage skipping actually occurs as newly hatched zoea larvae develop to the post-larva stage. The results are as follows: (1) in this species, big individuals occur most readily at the fifth and seventh stages; (2) even within the same individual, the growth rate of the appendages differs somewhat from part to part; and (3) no evidence of stage skipping during development from zoea to post-larva could be confirmed.


Strategies for the Development of Watermelon Industry Using Unstructured Big Data Analysis

  • LEE, Seung-In;SON, Chansoo;SHIM, Joonyong;LEE, Hyerim;LEE, Hye-Jin;CHO, Yongbeen
    • The Journal of Industrial Distribution & Business / v.12 no.1 / pp.47-62 / 2021
  • Purpose: The purpose of this study was to examine strategies for developing the watermelon industry using unstructured big data analysis. Specifically, this study examined changes in issues and consumer perceptions regarding watermelon using big data and social network analysis, and on that basis investigated ways to strengthen the competitiveness of the watermelon industry. Methodology: Data were collected from Naver (blogs, news) and Daum (blogs, news) with TEXTOM 4.5, and the analysis was divided into three periods (2015-2016, 2017-2018, and 2019-2020) to capture changes in issues and consumer perceptions of watermelon and the watermelon industry. TEXTOM 4.5 was used for keyword frequency analysis, word cloud analysis, and extraction of metrics data. UCINET 6.0 and its NetDraw function were used to find the connection structure of words, visualize the network relations, and cluster the words. Results: The extracted watermelon-related keywords were 'the stalk end of a watermelon', 'E-mart', 'Haman', 'Gochang', and 'Lotte Mart' (news: 2015-2016); 'apple watermelon', 'Haman', 'E-mart', 'Gochang', and 'Mudeungsan watermelon' (news: 2017-2018); 'E-mart', 'apple watermelon', 'household', 'chobok', and 'donation' (news: 2019-2020); 'watermelon salad', 'taste', 'the heat', 'baby', and 'effect' (blog: 2015-2016); 'taste', 'watermelon juice', 'method', 'watermelon salad', and 'baby' (blog: 2017-2018); and 'taste', 'effect', 'watermelon juice', 'method', and 'apple watermelon' (blog: 2019-2020). The results of the frequency and TF-IDF analyses are presented, and the CONCOR analysis yielded four types in each case. Conclusions: Based on the results, the authors discuss strategies and policies for boosting the watermelon industry, the limitations of this study, and future research directions.
The results of this study will help prioritize strategies and policies for boosting watermelon consumption and contribute to improving the competitiveness of the watermelon industry in Korea. This study is also expected to serve as an important basis for future agricultural big data studies and to offer watermelon producers and policy-makers practical points for crafting tailor-made marketing strategies.
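The TF-IDF analysis mentioned in the watermelon abstract above weights a term by how often it appears in a document and down-weights terms that appear in many documents. The following is a minimal sketch using one common variant (relative term frequency and idf = log(N/df)); TEXTOM may use a different formula, and the three token lists are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    tf  = count of term in document / document length
    idf = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights

# Hypothetical token lists standing in for three blog posts.
docs = [
    ["watermelon", "salad", "taste"],
    ["watermelon", "juice", "taste"],
    ["watermelon", "salad", "baby"],
]
w = tf_idf(docs)
print(w[1])
```

A term present in every document, such as 'watermelon' here, gets idf = log(1) = 0 and therefore carries no discriminating weight, while 'juice', which appears in only one post, receives the highest weight in that post.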

Urban Vitality Assessment Using Spatial Big Data and Nighttime Light Satellite Image: A Case Study of Daegu (공간 빅데이터와 야간 위성영상을 활용한 도시 활력 평가: 대구시를 사례로)

  • JEONG, Si-Yun;JUN, Byong-Woon
    • Journal of the Korean Association of Geographic Information Studies / v.23 no.4 / pp.217-233 / 2020
  • This study evaluated the urban vitality of Daegu Metropolitan City in 2018 using emerging geographic data such as spatial big data, Wi-Fi APs (access points), and nighttime light satellite images. These emerging geographic data were used to quantify human activities in the city more directly at various spatial and temporal scales. Three types of spatial big data (mobile phone data, credit card data, and public transport smart card data) were employed to reflect the social, economic, and mobility aspects of urban vitality, while public Wi-Fi APs and nighttime light satellite images were included to capture its virtual and physical aspects. Using PCA (principal component analysis), the five indicators were integrated and transformed into an urban vitality index at the census output area level for each time slot. The results identify five clusters of high urban vitality: downtown Daegu, the Daegu Bank and Beomeo intersections, Seongseo, Dongdaegu Station, and Chilgok 3 District. The results further show that the urban vitality index varied across time slots within the same urban space. This study demonstrates the potential of integrating spatial big data, Wi-Fi APs, and nighttime light satellite images as proxies for measuring urban vitality.
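The index construction in the Daegu abstract above combines five standardized indicators with PCA. The sketch below assumes the index is the score on the first principal component of the indicators' correlation matrix, computed with power iteration to stay dependency-free; the paper may retain and weight more components, the indicator values are made up, and no indicator may be constant across areas (a constant column has zero standard deviation).

```python
import math

def standardize(columns):
    """Z-score each indicator column (population SD); assumes no constant column."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / sd for x in col])
    return out

def first_principal_component(z_cols, iters=200):
    """Leading eigenvector of the correlation matrix via power iteration."""
    k, n = len(z_cols), len(z_cols[0])
    corr = [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / n for j in range(k)]
            for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # eigenvector sign is arbitrary; make higher indicators raise the index
        v = [-x for x in v]
    return v

def vitality_index(columns):
    """Score each spatial unit on the first PC of the standardized indicators."""
    z = standardize(columns)
    loadings = first_principal_component(z)
    n_units = len(columns[0])
    scores = [sum(loadings[i] * z[i][r] for i in range(len(z))) for r in range(n_units)]
    return scores, loadings

# Five hypothetical indicators (one list per indicator) for four areas.
indicators = [
    [10, 20, 30, 40],   # mobile phone activity
    [1, 2, 3, 5],       # credit card transactions
    [5, 6, 8, 9],       # smart card boardings
    [2, 2, 3, 4],       # Wi-Fi AP connections
    [50, 55, 60, 70],   # nighttime light radiance
]
scores, loadings = vitality_index(indicators)
print([round(s, 2) for s in scores])
```

Because all five made-up indicators rise together, every loading is positive and the index increases from the first area to the last; with real data, the loadings show how much each indicator contributes to the combined index.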