• Title/Abstract/Keywords: Text mining analysis

Search results: 1,187 items; processing time: 0.025 s

A Multilevel Project-Oriented Risk-Mining Framework for Overseas Construction Projects

  • Son, JeongWook;Lee, JeeHee;Yi, June-Seong
    • International Conference Proceedings
    • /
    • The 6th International Conference on Construction Engineering and Project Management
    • /
    • pp.39-40
    • /
    • 2015
  • As the international construction market grows, risk management in international construction projects is becoming more important. Unfortunately, current risk management practice does not sufficiently deal with project risks. Although many risk analysis techniques have been introduced, most of them focus on unexpected risks external to the project, such as country conditions and the owner's financial standing. However, because those external risks are difficult to manage and to act on preemptively, we need to concentrate on risks inherent in the project. Based on this premise, this paper proposes a project-oriented risk mining approach that can detect and extract project risk factors automatically before they materialize. This study presents a methodology for extracting potential risks in the owner's project requirements and project tender documents using state-of-the-art data analysis methods such as text mining. The project-oriented risk mining approach is expected to reflect project characteristics effectively in project risk management and could provide construction firms with valuable business intelligence.


A Pilot Study on Applying Text Mining Tools to Analyzing Steel Industry Trends: A Case Study of the Steel Industry for the Company "P"

  • 민기영;김훈태;지용구
    • The Journal of Society for e-Business Studies (한국전자거래학회지)
    • /
    • Vol. 19, No. 3
    • /
    • pp.51-64
    • /
    • 2014
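  • To survive, companies must quickly recognize their situation amid vast amounts of information and predict the future, so interest is growing in the analysis of unstructured data as well as quantitative data; in the steel industry, however, such analysis is not yet actively used. This study therefore attempts an industry trend analysis using text mining, centered on the case of Company P, a global steel company, and finds that it can help identify competitors' strategies, market changes in countries of interest, and public opinion around overseas business sites. The case analysis classified the steel industry into 10 categories, selected 10 topics for each, and carried out in-depth analysis whenever a meaningful change was found. The Company P case suggests that, for text-mining-based industry trend analysis to be more meaningful, the purpose should be made clear and the related keywords systematized; it can then contribute in many areas, such as identifying competitors' strategies, risk management, and correcting forecasts based on quantitative data.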

Rating and Comments Mining Using TF-IDF and SO-PMI for Improved Priority Ratings

  • Kim, Jinah;Moon, Nammee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 11
    • /
    • pp.5321-5334
    • /
    • 2019
  • Data mining technology is frequently used in identifying the intention of users over a variety of information contexts. Since relevant terms are mainly hidden in text data, it is necessary to extract them. Quantification is required in order to interpret user preference in association with other structured data. This paper proposes rating and comments mining to identify user priority and obtain improved ratings. Structured data (location and rating) and unstructured data (comments) are collected and priority is derived by analyzing statistics and employing TF-IDF. In addition, the improved ratings are generated by applying priority categories based on materialized ratings through Sentiment-Oriented Point-wise Mutual Information (SO-PMI)-based emotion analysis. In this paper, an experiment was carried out by collecting ratings and comments on "place" and by applying them. We confirmed that the proposed mining method is 1.2 times better than the conventional methods that do not reflect priorities and that the performance is improved to almost 2 times when the number to be predicted is small.
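
The two scores named in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: the comments, the seed words, and the smoothing constant are all assumptions, and SO-PMI is computed in its common form, as the PMI of a term with positive seed words minus its PMI with negative seed words.

```python
import math
from collections import Counter

# Hypothetical toy comments; not data from the paper.
comments = [
    "great food friendly staff",
    "great view clean room",
    "terrible food rude staff",
    "terrible noise dirty room",
]
docs = [c.split() for c in comments]
n_docs = len(docs)

# Document frequency: number of comments containing each term.
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(term, doc):
    """Term frequency times log inverse document frequency."""
    tf = doc.count(term) / len(doc)
    idf = math.log(n_docs / df[term])
    return tf * idf

def pmi(a, b):
    """Pointwise mutual information from comment-level co-occurrence."""
    p_a = df[a] / n_docs
    p_b = df[b] / n_docs
    p_ab = sum(1 for doc in docs if a in doc and b in doc) / n_docs
    # Small smoothing constant (an assumption) avoids log(0).
    return math.log2((p_ab + 1e-9) / (p_a * p_b))

def so_pmi(term, pos_seeds=("great",), neg_seeds=("terrible",)):
    """PMI with positive seeds minus PMI with negative seeds."""
    return sum(pmi(term, s) for s in pos_seeds) - sum(pmi(term, s) for s in neg_seeds)
```

In this toy corpus, `tf_idf("friendly", docs[0])` is larger than `tf_idf("great", docs[0])` because "friendly" appears in fewer comments, and `so_pmi` is positive for "friendly" and negative for "rude".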

Business Model Mining: Analyzing a Firm's Business Model with Text Mining of Annual Report

  • Lee, Jihwan;Hong, Yoo S.
    • Industrial Engineering and Management Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.432-441
    • /
    • 2014
  • As the business model is receiving considerable attention these days, the ability to collect business-model-related information has become an essential requirement for a company. The annual report is one of the most important external documents and contains crucial information about the company's business model. By investigating business descriptions and future strategies within the annual report, we can readily analyze a company's business model. However, given the sheer volume of the data, which is usually over a hundred pages, it is not practical to depend only on manual extraction. The purpose of this study is to complement the manual extraction process by using text mining techniques. In this study, text mining is applied to business model concept extraction and business model evolution analysis. By concept, we mean the overview of a company's business model within a specific year, and, by evolution, we mean temporal changes in the business model concept over time. The efficiency and effectiveness of our methodology are illustrated by a case example of three companies in the US video rental industry.

Detecting Spam Mails Using Text Mining Techniques

  • 이종호
    • Korean Society for Cognitive Science: Conference Proceedings
    • /
    • 2002 Spring Conference of the Korean Society for Cognitive Science
    • /
    • pp.35-39
    • /
    • 2002
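  • Each person receives around 10 advertising mails a day on average, and it is difficult to filter them out efficiently from the subject line alone. The difficulty comes mainly from senders cleverly disguising advertising subject lines as greetings or replies, and such efforts to conceal ads so that they cannot be deleted by subject are likely to continue. A method is therefore needed that adapts to changes in subject lines and blocks spam by automatically grasping the meaning of the body as well as the subject. This study approaches the problem as a classification of normal mail versus spam mail. Learning the criteria for such a classification automatically would require understanding meaning by interpreting sentences as a person does, but the cost of machine sentence interpretation is enormous; we therefore applied and compared text mining and web contents mining techniques, which efficiently substitute for semantic understanding at the word level and similar levels. About 500 advertising mails were used as the sample, and the same techniques were applied to a set of normal mails (500 mails) to measure false alarms as well. According to the comparison, mail patterns were too variable for the wrapper generation method to handle, whereas association rule analysis and link analysis were judged to perform better.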
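
As a rough illustration of how association rules and a false-alarm measurement fit together, the sketch below mines single-word rules of the form {word} -> spam and then applies them to normal mail. Everything here is assumed for illustration: the toy mails, the thresholds, and the single-word rule form are not taken from the study, and scoring on the same mails used for mining is done only to keep the example short.

```python
# Hypothetical toy mails as sets of words; not the study's data.
spam = [
    {"free", "offer", "click"},
    {"free", "prize", "click"},
    {"offer", "discount", "free"},
]
normal = [
    {"meeting", "schedule", "report"},
    {"free", "lunch", "meeting"},
    {"report", "deadline", "schedule"},
]

labeled = [(m, "spam") for m in spam] + [(m, "normal") for m in normal]

def rule_stats(word, label="spam"):
    """Support and confidence of the rule {word} -> label."""
    with_word = [lab for mail, lab in labeled if word in mail]
    hits = sum(1 for lab in with_word if lab == label)
    support = hits / len(labeled)
    confidence = hits / len(with_word) if with_word else 0.0
    return support, confidence

def mine_rules(min_support=0.3, min_confidence=0.9):
    """Keep words whose rule {word} -> spam passes both thresholds."""
    vocab = set().union(*(m for m, _ in labeled))
    kept = set()
    for word in vocab:
        support, confidence = rule_stats(word)
        if support >= min_support and confidence >= min_confidence:
            kept.add(word)
    return kept

def is_spam(mail, rules):
    """Flag a mail when it contains any rule word."""
    return bool(mail & rules)

rules = mine_rules()
# False alarms: normal mails that the rules wrongly flag as spam.
false_alarm_rate = sum(is_spam(m, rules) for m in normal) / len(normal)
```

With these toy mails, "free" is rejected as a rule because it also appears in normal mail (confidence 0.75), which is why the normal mail containing "free" does not trigger a false alarm.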


Big Data Analytics of Construction Safety Incidents Using Text Mining

  • 서정욱;송지훈
    • Journal of the Korean Society of Industry Convergence (한국산업융합학회 논문집)
    • /
    • Vol. 27, No. 3
    • /
    • pp.581-590
    • /
    • 2024
  • This study aims to extract key topics through text mining of incident records (incident history, post-incident measures, preventive measures) from construction safety accident case data available on the public data portal. It also seeks to provide fundamental insights contributing to the establishment of manuals for disaster prevention by identifying correlations between these topics. After pre-processing the input data, we used the LDA-based topic modeling technique to derive the main topics. Consequently, we obtained five topics related to incident history, and four topics each related to post-incident measures and preventive measures. Although no dominant patterns emerged from the topic pattern analysis, the study holds significance as it provides quantitative information on the follow-up actions related to the incident history, thereby suggesting practical implications for the establishment of a preventive decision-making system through the linkage between accident history and subsequent measures for recurrence prevention.

Research Trends on Literature Reviews in Scopus Journals by Authors from Indonesia, Japan, South Korea, Vietnam, Singapore, and Malaysia: A Bibliometric Analysis from 2003 to 2022

  • Prakoso Bhairawa Putera;Amelya Gustina
    • Asian Journal of Innovation and Policy
    • /
    • Vol. 12, No. 3
    • /
    • pp.304-322
    • /
    • 2023
  • Text data mining ('big data methods') has been one of the most widely used approaches during the COVID-19 pandemic, in particular text data mining on the Scopus or Web of Science (WoS) databases. Text data mining is widely used to collect literature for later bibliometric analysis, which in the end becomes a literature review article. Therefore, in this article, we reveal the trend of publication of literature reviews in Scopus journals from Indonesia, Japan, South Korea, Vietnam, Singapore, and Malaysia. This article describes two essential parts, namely 1) a comparison of international publication trends and subject areas of literature review publications, and 2) a comparison of the Top 5 Authors, Affiliations, Source Titles, and Collaboration Countries.

PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung;Oh, Jeong-Su;Ko, Gun-Hwan;Cho, Wan-Sup;Hou, Bo-Kyeng;Lee, Sang-Hyuk
    • Interdisciplinary Bio Central
    • /
    • Vol. 3, No. 2
    • /
    • pp.7.1-7.6
    • /
    • 2011
  • Background: Published manuscripts are the main source of biological knowledge. Since manual examination is almost impossible due to the huge volume of literature data (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most current text mining tools have limited applicability because of i) providing abstract-based search rather than sentence-based search, ii) improper use or lack of ontology terms, iii) a design intended for specific subjects, or iv) slow response time that hampers web services and real-time applications. Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities: fuzzy search, wildcard search, proximity search, range search, and Boolean combinations. Furthermore, PubMine allows users to extract multi-dimensional relationships between genes, diseases, and chemical compounds by using OLAP (On-Line Analytical Processing) techniques. The HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy have been included in the current version of PubMine, which is freely available at http://pubmine.kobic.re.kr. Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of biological entity relationships. We believe that PubMine would serve as a key bioinformatics utility due to its rapid response, which enables web services for the community, and its flexibility to accommodate general ontologies.

Study on Effective Extraction of New Coined Vocabulary from Political Domain Article and News Comment

  • 이지현;김재홍;조예성;이민구;최혜봉
    • The Journal of the Convergence on Culture Technology (문화기술의 융합)
    • /
    • Vol. 7, No. 2
    • /
    • pp.149-156
    • /
    • 2021
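  • Big data analysis through text mining can be used to understand public opinion and perception of political issues objectively. Text mining algorithms based on existing lexicons are limited in analyzing vocabulary that is not in the dictionary, such as newly coined words. Opinions that users express on social media often contain neologisms and slang, and if these words cannot be analyzed effectively, it becomes difficult to grasp public perception and opinion accurately. This paper proposes a method for effectively extracting politically meaningful neologisms and slang from comments on political news articles, and presents several methods for understanding the meaning and context of the extracted words.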

Reorganizing Social Issues from R&D Perspective Using Social Network Analysis

  • Shun Wong, William Xiu;Kim, Namgyu
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 22, No. 3
    • /
    • pp.83-103
    • /
    • 2015
  • The rapid development of internet technologies and social media over the last few years has generated a huge amount of unstructured text data, which contains a great deal of valuable information and issues. As a result, text mining, the extraction of meaningful information from unstructured text data, has gained attention from many researchers in various fields. Topic analysis is a text mining application that is used to determine the main issues in a large volume of text documents. However, it is difficult to identify related issues or meaningful insights because the number of issues derived through topic analysis is too large. Furthermore, traditional issue-clustering methods can only be performed based on the co-occurrence frequency of issue keywords in many documents, so an association between issues that have a low co-occurrence frequency cannot be recognized, even if those issues are strongly related from other perspectives. This research therefore proposes a methodology to reorganize social issues from a research and development (R&D) perspective using social network analysis. Using an R&D perspective lexicon, issues that consistently share the same R&D keywords can be further identified through social network analysis. In this study, the R&D keywords that are associated with a particular issue imply the key technology elements that are needed to solve that issue. Issue clustering can then be performed based on the analysis results. Furthermore, the relationship between issues that share the same R&D keywords can be reorganized more systematically, by grouping them into clusters according to the R&D perspective lexicon. We expect that our methodology will contribute to establishing efficient R&D investment policies at the national level by enhancing the reusability of R&D knowledge, based on issue clustering using the R&D perspective lexicon. In addition, companies could also utilize the results by aligning R&D with their business strategy plans, to help them develop innovative products and new technologies that sustain innovative business models.
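
The idea of linking issues through shared R&D keywords can be sketched as a small network. The example below is a minimal illustration under stated assumptions: the issues, keywords, and lexicon are hypothetical, and connected components stand in for the clustering step, which the abstract does not specify.

```python
from itertools import combinations

# Hypothetical issues and R&D lexicon; not from the paper.
rd_lexicon = {"sensor", "battery", "vaccine", "encryption"}
issue_keywords = {
    "air pollution": {"sensor", "dust", "health"},
    "smart factory": {"sensor", "robot"},
    "electric cars": {"battery", "charging"},
    "energy storage": {"battery", "grid"},
    "data leaks": {"encryption", "privacy"},
}

# Keep only the R&D keywords for each issue.
rd_view = {issue: kws & rd_lexicon for issue, kws in issue_keywords.items()}

# Link two issues when they share at least one R&D keyword.
edges = [(a, b) for a, b in combinations(rd_view, 2) if rd_view[a] & rd_view[b]]

def clusters(nodes, edges):
    """Group nodes into connected components of the issue network."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups

issue_clusters = clusters(list(rd_view), edges)
```

Here "air pollution" and "smart factory" share no ordinary keyword other than the R&D term "sensor", yet they land in the same cluster, which mirrors the abstract's point that issues with low keyword co-occurrence can still be related from an R&D perspective.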