• Title/Summary/Keyword: 텍스트 연구

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Channels such as social media and SNS now generate enormous amounts of data, and the share of unstructured data represented as text has grown geometrically. Because it is difficult to read all of this text, it is important to access it quickly and grasp its key points. To meet this need for efficient understanding, many text summarization studies have been proposed for handling and using very large amounts of text data. In particular, many recent methods use machine learning and artificial intelligence algorithms to generate summaries objectively and effectively, an approach called "automatic summarization". However, almost all text summarization methods proposed to date construct summaries based on the frequency of contents in the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often in the original text. If a summary covers only the major subjects, bias occurs and information is lost, making it hard to ascertain every subject the documents contain. To avoid this bias, one can summarize with attention to the balance between the topics in a document so that all subjects can be ascertained, but an imbalance in the distribution of those subjects still remains. To retain a balance of subjects in the summary, it is necessary to consider the proportion of every subject in the original documents and also to allocate the portions of subjects equally, so that sentences on minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance between all subjects and minimizes the omission of low-frequency subjects. For the subject-balanced summary, we use two summary evaluation metrics: "completeness" and "succinctness".
Completeness means that the summary should fully include the contents of the original documents, and succinctness means that the summary has minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, the terms highly related to each topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method they are called "seed terms". However, the seed terms are too few to explain each subject sufficiently, so enough terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion, finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and the similarity between all terms can be derived from those vectors using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them is taken to be. Terms with high similarity to the seed terms of each subject are therefore selected, and the subject dictionary is finally constructed by filtering the expanded terms. The second phase allocates subjects to every sentence in the original documents. To grasp the contents of all sentences, frequency analysis is first conducted with the terms that make up the subject dictionaries. The TF-IDF weight of each subject is calculated after the frequency analysis, which shows how much each sentence explains each subject. However, TF-IDF weights are unbounded above, so the weights of every subject for each sentence are normalized to values between 0 and 1.
Each sentence is then allocated to the subject with its maximum TF-IDF weight, and a sentence group is finally constructed for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences, and a similarity matrix is formed. By selecting sentences repeatedly, it is possible to generate a summary that fully includes the contents of the original documents and minimizes duplication within the summary itself. For evaluation of the proposed method, 50,000 TripAdvisor reviews are used to construct the subject dictionaries and 23,087 reviews are used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verifies that summaries from the proposed method better retain the balance of all subjects that the documents originally have.
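
The second phase described in this abstract (normalizing per-subject weights and allocating each sentence to its highest-weight subject), together with the cosine similarity used for term expansion, might be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not say which normalization is used, so min-max scaling is an assumption, and the function names and the sample weights are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def min_max_normalize(weights):
    """Rescale one subject's per-sentence weights to [0, 1].

    Min-max scaling is an assumption; the abstract only says values end up in 0..1.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


def allocate_subjects(scores):
    """Assign each sentence to the subject with its largest normalized weight.

    scores maps a subject name to a list of per-sentence weights (all lists same length).
    """
    normalized = {subject: min_max_normalize(w) for subject, w in scores.items()}
    n_sentences = len(next(iter(scores.values())))
    return [
        max(normalized, key=lambda subject: normalized[subject][i])
        for i in range(n_sentences)
    ]


# Hypothetical TF-IDF-style weights for three sentences and two subjects.
sample_scores = {"room": [3.0, 0.0, 1.0], "food": [0.5, 2.0, 0.0]}
allocation = allocate_subjects(sample_scores)
```

With the sample weights above, the first and third sentences fall into the "room" group and the second into the "food" group; each group would then feed the summary generation phase.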

Information types and characteristics within the Wireless Emergency Alert in COVID-19: Focusing on Wireless Emergency Alerts in Seoul (코로나 19 하에서 재난문자 내의 정보유형 및 특성: 서울특별시 재난문자를 중심으로)

  • Yoon, Sungwook;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.45-68 / 2022
  • The central and local governments of the Republic of Korea provided information necessary for disaster response through wireless emergency alerts (WEAs) to help overcome the rapid spread of COVID-19. Among all channels for delivering disaster information, the wireless emergency alert is the most efficient: because it adopts the Cell Broadcast Service (CBS) method, which broadcasts directly to mobile phones, recipients can easily access disaster information without having to search for it. In this study, the characteristics of wireless emergency alerts sent to Seoul over 13 months (January 2020 to January 2021) were derived through various text mining methodologies, and the types of information contained in the alerts were analyzed. In addition, the influence of the alerts on people's movement behavior was examined using population mobility by age in the districts of Seoul. After classifying the key words and information included in each message, text analysis was performed with individual sent messages as the unit of analysis, applying a document cluster analysis technique based on the words they contain. The number of WEAs sent to Seoul has grown dramatically since the spread of COVID-19. In January 2020, only 10 WEAs were sent to Seoul, but the number of WEAs increased 5 times in March, and 7.7 times over the previous months. Because basic and regional local governments were authorized to send wireless emergency alerts independently, sending behavior differs for each local government.
Although most basic local governments increased their transmission of WEAs as the number of confirmed COVID-19 cases increased, the rate at which WEAs increased with confirmed cases differed by region. Using a structured econometric model, the effect of disaster information included in wireless emergency alerts on population mobility was measured by dividing it into a baseline effect and an accumulating effect. Six types of disaster information (date, order, online URL, symptom, location, and normative guidance) were identified in WEAs and analyzed through econometric modelling. The types of information that significantly change population mobility were found to differ by age. Population mobility of people in their 60s and 70s decreased when wireless emergency alerts included information related to date and order. Because date and order information appears in WEAs that report confirmed COVID-19 cases, these results show that the mobility of older people decreased as they reacted to messages reporting confirmed cases. Online information (URL) decreased the population mobility of people in their 20s, and information related to symptoms reduced the population mobility of people in their 30s. On the other hand, normative words encouraging compliance with quarantine policies did not cause significant changes in population mobility at any age. This means that only meaningful information that is useful for disaster response should be included in wireless emergency alerts. Repeated sending of wireless emergency alerts reduces the magnitude of the impact of disaster information on population mobility. This indirectly indicates that, under the prolonged pandemic, people became tired of receiving repetitive WEAs with similar content and began to react less.
To use WEAs effectively for quarantine and for overcoming disaster situations, it is necessary to reduce the fatigue of recipients by sending alerts only when necessary, and to raise awareness of WEAs.

A Study on Trend Analysis in Convergence Research Applying Word Cloud in Korea (워드 클라우드 기법을 이용한 국내 융복합 학술연구 트렌드 분석)

  • Kim, Joon-Hwan;Mun, Hyung-Jin;Lee, Hang
    • Journal of Digital Convergence / v.19 no.2 / pp.33-38 / 2021
  • The convergence trend is the core of the 4th industrial revolution, and because of the expectations and possibilities it raises, countermeasures are being sought in diverse fields. This study conducted a quantitative analysis to identify the trend of convergence research over the past 10 years. Specifically, major research keywords were extracted and visualized with word cloud techniques to identify trends in academic research on convergence. To this end, research papers published in the Journal of Digital Convergence from 2012 to 2020 were investigated. The analysis period was divided into two periods, the former 4 years (2012-2015) and the latter 4 years (2016-2019), to confirm the difference in research trends. In addition, the research papers of 2020 were analyzed to understand more clearly the changes in the research trend of the last year due to COVID-19. The results of this study are significant in that they can be used as basic data for future research and for understanding research trends in the field of convergence through keywords.
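
The keyword counting that typically feeds a word cloud, split by the periods named in this abstract, could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the abstract does not describe the actual tooling, the function names are hypothetical, lowercasing keywords before counting is an assumption, and the sample records are invented for the example.

```python
from collections import Counter


def period_of(year):
    """Map a publication year to the analysis period named in the abstract."""
    if 2012 <= year <= 2015:
        return "2012-2015"
    if 2016 <= year <= 2019:
        return "2016-2019"
    if year == 2020:
        return "2020"
    return None  # outside the studied range


def keyword_counts_by_period(papers):
    """Count keyword frequencies per period.

    papers is an iterable of (year, keywords) pairs; keywords are lowercased
    before counting (an assumption, to merge case variants).
    """
    counts = {}
    for year, keywords in papers:
        period = period_of(year)
        if period is None:
            continue
        counts.setdefault(period, Counter()).update(k.lower() for k in keywords)
    return counts


# Invented sample records, for illustration only.
sample_papers = [
    (2013, ["Convergence", "IoT"]),
    (2014, ["convergence"]),
    (2017, ["AI"]),
    (2020, ["COVID-19", "AI"]),
    (2011, ["out of range"]),
]
counts = keyword_counts_by_period(sample_papers)
```

The per-period counters can then be passed to whichever word cloud renderer is in use, with frequency controlling the size of each keyword.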

Trend Analysis of Dance Performance Research Using Keywords and Topic Modeling of LDA Techniques (LDA 토픽 모델링 기법을 활용한 무용공연의 연구 동향 분석)

  • SI YU
    • Journal of Industrial Convergence / v.22 no.3 / pp.13-25 / 2024
  • This study explores research topics related to dance performances published in Korea based on big data and examines how research trends change with the times. The results derived from topic modeling analysis are as follows. Six major topics were derived: (1) a study on marketing strategies and development plans for dance performances, (2) a study on the re-watching factors of dance performance space and performance satisfaction, (3) a study on the popularity and contribution of dance performances in the stage environment, (4) a study on the current status of dance performances and the convergence of dance group operations, (5) a study on the definition of dance performances using various social media, and (6) a study on the direction and development of technology-applied dance performance contents. Accordingly, research trends and topics related to dance, including dance performances, social changes, and key keywords reflecting researchers' changing interests, were extracted, and keywords were compared and analyzed to present academic changes and countermeasures. The need for research that applies new technologies was emphasized as the field diversified and converged.

Analysis of Research Trends on Mountain Streams in the Republic of Korea: Comparison to International Research Trends (산지하천을 대상으로 한 국내 연구동향 분석: 국제 연구동향과의 비교)

  • Lee, Sang In;Seo, Jung Il;Lee, Yohan;Kim, Suk Woo;Chun, Kun Woo
    • Korean Journal of Environment and Ecology / v.33 no.2 / pp.216-227 / 2019
  • The purpose of this study is to propose a rational mountain stream management strategy that considers the natural conditions and social needs of the Republic of Korea. We reviewed domestic and overseas studies related to mountain streams, identified the study areas by text mining and co-word analysis using the VOSviewer program, and then analyzed the spatial and temporal study trends and topics of each study area. The results showed that domestic studies on mountain streams are still at an initial stage compared to overseas studies. Overseas studies on mountain streams can be classified into four groups: (i) habitat and species composition of fish and invertebrates, (ii) hydrological phenomena and nutrient migration, (iii) transport of sediment and organic materials and the relevant morphological changes by runoff flows, and (iv) plant species composition in mountain streams. Of these study subjects, domestic studies in group (i) mainly focused on macroinvertebrates, while domestic studies in group (iii) regarded the transport of sediment and organic materials not as an ecological disturbance but as a source of sediment-related disasters. We then analyzed the share of each research group among all papers by period and country. The results showed that overseas studies in groups (iii) and (iv) have increased with time, and the increase was mostly due to studies in the United States, Brazil, Canada, and China. On the other hand, domestic studies in groups (i) and (iii) increased somewhat with time, but there was a slight lack of correlation between the two subjects. Therefore, hybrid studies that compensate for this shortage are necessary in the future.

Trend Analysis of Convergence Research based on Social Big Data (소셜 빅데이터 기반 융합연구 동향 분석)

  • Noh, Younghee;Kim, Taeyoun;Jeong, Dae-Keun;Lee, Kwang Hee
    • The Journal of the Korea Contents Association / v.19 no.2 / pp.135-146 / 2019
  • This study was designed to analyze trends in convergence research as a whole, beyond academic research, through big data analysis of social media, at a time when interdisciplinary convergence research is emphasized along with the fourth industrial revolution. For this purpose, about 150,000 texts and titles related to convergence research were collected from social media over about 10 years, from January 2009 to September 2018, and word cloud and network analysis were conducted. As a result, the research fields most actively pursued in each period were eco-tech in 2009 and 2010, smart technology in 2011 and 2012, information and communication in 2013 and 2014, robots in 2015 and 2016, and artificial intelligence in 2017 and 2018. Also, the research areas that have been consistently pursued over the roughly 10 years are culture, design, chemistry, nanotechnology, biotechnology, robot, IT, and information and communication. Because this study identifies trends in convergence research over time, it can help researchers who are planning the direction of convergence research.

Analysis of R&D Performance Management Plans of a Government-funded Research Institute in the Science and Technology Field: The Case of Korea Institute of Science and Technology Information (과학기술분야 정부출연연구기관 연구성과계획 분석: 한국과학기술정보연구원을 중심으로)

  • Jeong, Yong-il;Chung, Do-Bum;Yoon, Byung Sung
    • The Journal of the Korea Contents Association / v.22 no.3 / pp.488-499 / 2022
  • This study analyzes, through quantitative information analysis, the relationship between S&T policy and the R&D performance plans of government-funded research institutes (GRIs), a topic on which relevant research is lacking. KISTI, the case examined, is an ICT-based GRI that is sensitive to changes in the internal and external environment, and the impact of government S&T policy changes on KISTI's R&D performance plans was analyzed in depth.

A Study on the Design and the Construction of a Korean Speech DB for Common Use (공동이용을 위한 음성DB의 설계 및 구축에 관한 연구)

  • Kim, Bong-Wan;Kim, Jong-Jin;Kim, Sun-Tae;Lee, Yong-Ju
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.35-41 / 1997
  • A speech database is an indispensable part of speech research. It is needed in speech research and development processes and for evaluating the performance of various speech-processing systems. For a speech database to serve common purposes, its utterance list must contain all possible phonetic events in a minimal number of words and must be independent of tasks. To meet these requirements, this paper extracts a PBW set from a large text corpus. The speech database constructed using the PBW set as its utterance list, and its properties, are described in this paper.
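
Choosing a small word list that still covers every phonetic event is a set-cover style problem. The abstract does not state how the PBW set was extracted, so the greedy heuristic below is only an assumed illustration of the general idea; the function name is hypothetical, and single letters stand in for phonetic units in the sample data.

```python
def greedy_cover(word_units):
    """Greedily pick words until every unit seen in the input is covered.

    word_units maps each candidate word to the set of phonetic units it contains.
    Ties are broken alphabetically so the result is deterministic.
    """
    uncovered = set()
    for units in word_units.values():
        uncovered |= units

    chosen = []
    while uncovered:
        # Pick the word that covers the most still-uncovered units.
        best = max(sorted(word_units), key=lambda w: len(word_units[w] & uncovered))
        chosen.append(best)
        uncovered -= word_units[best]
    return chosen


# Letters stand in for phonetic units; the candidate words are invented.
candidates = {
    "abc": {"a", "b", "c"},
    "cd": {"c", "d"},
    "ab": {"a", "b"},
    "d": {"d"},
}
selected = greedy_cover(candidates)
```

Greedy selection does not guarantee the true minimum number of words, but it is a common, simple approximation when the candidate vocabulary is large.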

Design and Implementation of the Java Applet-based Courseware (Java Applet 기반 코스웨어의 설계 및 구현)

  • Kim, Kyu-Soo;Kim, Hyun-Bae
    • Journal of The Korean Association of Information Education / v.4 no.2 / pp.179-186 / 2001
  • The purpose of this study is to design and implement courseware that enables interaction between people and computers over the internet. To this end, we select the learning contents and design the courseware with text, graphic data, HTML, JavaScript, and Java applets. The courseware has the following advantages. Interaction between people and computers is made possible on the web by giving diverse feedback in response to input. The courseware can be accessed regardless of time and place, provided that the network environment of the user's computer is suitably equipped. Finally, revising the courseware becomes easier for the operator, and fewer system resources are required on the client side.

Design of LSTM-based Model for Extracting Relative Temporal Relations for Korean Texts (한국어 상대시간관계 추출을 위한 LSTM 기반 모델 설계)

  • Lim, Chae-Gyun;Jeong, Young-Seob;Lee, Young Jun;Oh, Kyo-Joong;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology / 2017.10a / pp.301-304 / 2017
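  • Temporal information extraction plays an important role in understanding the context and situation of a dialogue from natural language sentences and in providing services suited to the user's intent, but because of the linguistic characteristics unique to Korean, it tends to be difficult to accurately recognize temporal relations between entities in Korean text. In particular, the relative temporal relation of a time expression or event is an important concept for systematically understanding temporal context. This paper proposes an LSTM (long short-term memory)-based relative temporal relation extraction model for extracting the relations between relative time expressions and events in Korean natural language sentences. Temporal information extraction involves three processes, namely TIMEX3, EVENT, and TLINK extraction, but this paper aims only to extract relative temporal relation TLINKs for a given sentence, with the already extracted TIMEX3 and EVENT entities provided. In addition, the relative temporal relation extraction performance of the proposed LSTM-based models is compared on a manually tagged Korean temporal information annotated corpus.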
