• Title/Summary/Keyword: document topic

Search Result 190, Processing Time 0.033 seconds

An Automatically Extracting Formal Information from Unstructured Security Intelligence Report (비정형 Security Intelligence Report의 정형 정보 자동 추출)

  • Hur, Yuna;Lee, Chanhee;Kim, Gyeongmin;Jo, Jaechoon;Lim, Heuiseok
    • Journal of Digital Convergence
    • /
    • v.17 no.11
    • /
    • pp.233-240
    • /
    • 2019
  • In order to predict and respond to cyber attacks, a number of security companies quickly identify the methods, types and characteristics of attack techniques and are publishing Security Intelligence Reports(SIRs) on them. However, the SIRs distributed by each company are huge and unstructured. In this paper, we propose a framework that uses five analytic techniques to formulate a report and extract key information in order to reduce the time required to extract information on large unstructured SIRs efficiently. Since the SIRs data do not have the correct answer label, we propose four analysis techniques, Keyword Extraction, Topic Modeling, Summarization, and Document Similarity, through Unsupervised Learning. Finally, has built the data to extract threat information from SIRs, analysis applies to the Named Entity Recognition (NER) technology to recognize the words belonging to the IP, Domain/URL, Hash, Malware and determine if the word belongs to which type We propose a framework that applies a total of five analysis techniques, including technology.

Study on Text Analysis of the Liquefied Natural Gas Carriers Dock Specification for Development of the Ship Predictive Maintenance Model (선박예지정비모델 개발을 위한 LNG 선박 도크 수리 항목의 텍스트 분석 연구)

  • Hwang, Taemin;Youn, Ik-Hyun;Oh, Jungmo
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.27 no.1
    • /
    • pp.60-66
    • /
    • 2021
  • The importance of maintenance is leading the application of the maintenance strategy in various industries. The maritime industry is also a part of them, with changes in selecting and applying the maintenance strategy, but rather slowly, by retaining the old strategy. In particular, the ship is maintaining a previously used strategy. In the circumstance of the sea, ship requires a new suggestion for maintenance strategy. A ship predictive maintenance model predicts the breakdown or malfunction of machineries to secure maintenance time with preventive actions and treatments, thereby avoiding maintenance-related dangerous factors. This study focused on applying text analysis to an Liquefied Natural Gas Carriers dock indent document, and the analysis results were interpreted from the original document. The inter-relational patterns observed from the frequency of common maintenance combinations among different parts and equipment in ships will be applied to the development of ship predictive maintenance.

Analysis of Potential Construction Risk Types in Formal Documents Using Text Mining (텍스트 마이닝을 통한 건설공사 공문 잠재적 리스크 유형 분석)

  • Eom, Sae Ho;Cha, Gichun;Park, Sun Kyu;Park, Seunghee;Park, Jongho
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.43 no.1
    • /
    • pp.91-98
    • /
    • 2023
  • Since risks occurring in construction projects can have a significant impact on schedules and costs, there have been many studies on this topic. However, risk analysis is often limited to only certain construction situations,and experience-dependent decision-making is therefore mainly performed. Data-based analyses have only been partially applied to safety and contract documents. Therefore, in this study, cluster analysis and a Word2Vec algorithm were applied to formal documents that contain important elements for contractors or clients. An initial classification of document content into six types was performed through cluster analysis, and 157 occurrence types were subdivided through application of the Word2Vec algorithm. The derived terms were re-classified into five categories and reviewed as to whether the terms could develop into potential construction risk factors. Identifying potential construction risk factors will be helpful as basic data for process management in the construction industry.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.1-23
    • /
    • 2013
  • To discover significant social issues such as unemployment, economy crisis, social welfare etc. that are urgent issues to be solved in a modern society, in the existing approach, researchers usually collect opinions from professional experts and scholars through either online or offline surveys. However, such a method does not seem to be effective from time to time. As usual, due to the problem of expense, a large number of survey replies are seldom gathered. In some cases, it is also hard to find out professional persons dealing with specific social issues. Thus, the sample set is often small and may have some bias. Furthermore, regarding a social issue, several experts may make totally different conclusions because each expert has his subjective point of view and different background. In this case, it is considerably hard to figure out what current social issues are and which social issues are really important. To surmount the shortcomings of the current approach, in this paper, we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA shows a set of topic clusters, and then each topic cluster is labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and then a human annotator labels "Unemployment Problem" on Topic1. In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, taking a look at only social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop the matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document to paragraphs. In the meantime, using LDA, we can extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to a topic, indicating that the paragraph best matches the topic. Finally, each topic has several best matched paragraphs. Furthermore, assuming there are a topic (e.g., Unemployment Problem) and the best matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information of the social keyword such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. Through this prototype system, we have detected various social issues appearing in our society and also showed effectiveness of our proposed methods according to our experimental results. Note that you can also use our proof-of-concept system in http://dslab.snu.ac.kr/demo.html.

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.

Trend Analysis in Maker Movement Using Text Mining (텍스트 마이닝을 이용한 메이커 운동의 트렌드 분석)

  • Park, Chanhyuk;Kim, Ja-Hee
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.12
    • /
    • pp.468-488
    • /
    • 2018
  • The maker movement is a phenomenon of society and culture where people who make necessary things come together and share knowledge and experience through creativity. However, as the maker movement has grown rapidly over the past decade, there is still a lack of consensus for how far they will be viewed as a maker movement. We need to look at how the maker movement has changed so far in order to find the direction of development of the maker movement. This study analyzes the media articles using text-based big data analysis methodology to understand how the issue of the maker movement has changed in general media. In particular, we apply Keyword Network Analysis and DTM(Dynamic Topic Model) to analyze changes of interest according to time. The Keyword Network Analysis derives major keywords at the word level in order to analyze the evolution of the maker movement, and DTM helps to identify changes in interest in different areas of the maker movement at three levels: word, topic, and document. As a result, we identified major topics such as start-ups, makerspaces, and maker education, and the major keywords have changed from 3D printer and enterprise to education.

Analysis of User Reviews of Running Applications Using Text Mining: Focusing on Nike Run Club and Runkeeper (텍스트마이닝을 활용한 러닝 어플리케이션 사용자 리뷰 분석: Nike Run Club과 Runkeeper를 중심으로)

  • Gimun Ryu;Ilgwang Kim
    • Journal of Industrial Convergence
    • /
    • v.22 no.4
    • /
    • pp.11-19
    • /
    • 2024
  • The purpose of this study was to analyze user reviews of running applications using text mining. This study used user reviews of Nike Run Club and Runkeeper in the Google Play Store using the selenium package of python3 as the analysis data, and separated the morphemes by leaving only Korean nouns through the OKT analyzer. After morpheme separation, we created a rankNL dictionary to remove stopwords. To analyze the data, we used TF, TF-IDF and LDA topic modeling in text mining. The results of this study are as follows. First, the keywords 'record', 'app', and 'workout' were identified as the top keywords in the user reviews of Nike Run Club and Runkeeper applications, and there were differences in the rankings of TF and TF-IDF. Second, the LDA topic modeling of Nike Run Club identified the topics of 'basic items', 'additional features', 'errors', and 'location-based data', and the topics of Runkeeper identified the topics of 'errors', 'voice function', 'running data', 'benefits', and 'motivation'. Based on the results, it is recommended that errors and improvements should be made to contribute to the competitiveness of the application.

A Natural Language Question Answering System-an Application for e-learning

  • Gupta, Akash;Rajaraman, Prof. V.
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 2001.01a
    • /
    • pp.285-291
    • /
    • 2001
  • This paper describes a natural language question answering system that can be used by students in getting as solution to their queries. Unlike AI question answering system that focus on the generation of new answers, the present system retrieves existing ones from question-answer files. Unlike information retrieval approaches that rely on a purely lexical metric of similarity between query and document, it uses a semantic knowledge base (WordNet) to improve its ability to match question. Paper describes the design and the current implementation of the system as an intelligent tutoring system. Main drawback of the existing tutoring systems is that the computer poses a question to the students and guides them in reaching the solution to the problem. In the present approach, a student asks any question related to the topic and gets a suitable reply. Based on his query, he can either get a direct answer to his question or a set of questions (to a maximum of 3 or 4) which bear the greatest resemblance to the user input. We further analyze-application fields for such kind of a system and discuss the scope for future research in this area.

  • PDF

Cognitive Based Context Aware Reference History Management Tool

  • Punithan, Dharani;McKay, Bob
    • 한국HCI학회:학술대회논문집
    • /
    • 2009.02a
    • /
    • pp.227-231
    • /
    • 2009
  • The aim of the research is to focus on the cognitive principles and to achieve human-level intelligence in referring context based browser history and the Windows history. One of the major problems faced by today's computer users is insufficient and single exclusive context based reference of the browser history and the Windows history. Today we search for the browser history and Windows history in different places even though the context is the same. For e.g., When working on a research paper or preparing a business presentation, a user may require to refer many web sites on the internet and various documents on the local computer. The browser can provide only time based history. The windows document history is also time based and limited to list only few documents. Hence, we propose a tool "Cognitive Based Context Aware Reference History Management Tool" which helps to access the exclusive reference of context and time based history in one place. The tool also proposes to store image history with urls and classifies images of a specific topic accessed in different time, bookmarks management and cross browser history management. These features are very useful as we can access all related documents (doc, docx, ppt, pptx, pdf, txt, and html), web pages, images and bookmarks in one place. The tool uses the cognitive principles like classification and association to achieve the purpose.

  • PDF

The Value and Limitations of Guidelines, Expert Consensus, and Registries on the Management of Patients with Thoracic Aortic Disease

  • Pacini, Davide;Murana, Giacomo;Leone, Alessandro;Marco, Luca Di;Pantaleo, Antonio
    • Journal of Chest Surgery
    • /
    • v.49 no.6
    • /
    • pp.413-420
    • /
    • 2016
  • Doctors are often faced with difficult decisions and uncertainty when patients need a certain treatment. They routinely rely on the scientific literature, in addition to their knowledge, experience, and patient preferences. Clinical practice guidelines are created with the intention of facilitating decision-making. They may offer concise instructions for the diagnosis, management (medical or surgical treatments), and prevention of specific diseases or conditions. All information included in the final version are the result of a systematic review of scientific articles and an assessment of the benefits and costs of alternative care options. The final document attempts to meet the needs of most patients in most circumstances and clinicians, aware of these recommendations, should always make individualized treatment decisions. In this review, we attempted to define the intent and applicability of clinical practice guidelines, expert consensus documents, and registry studies, focusing on the management of patients with thoracic aortic disease.