• Title/Summary/Keyword: Text data

A Data-Driven Causal Analysis on Fatal Accidents in Construction Industry (건설 사고사례 데이터 기반 건설업 사망사고 요인분석)

  • Jiyoon Choi;Sihyeon Kim;Songe Lee;Kyunghun Kim;Sudong Lee
    • Journal of the Korea Safety Management & Science / v.25 no.3 / pp.63-71 / 2023
  • The construction industry has a higher incidence of accidents than other sectors, and a causal analysis of these accidents is necessary for effective prevention. In this study, we propose a data-driven causal analysis to find significant factors of fatal construction accidents. We collected 14,318 cases of structured and text data on construction accidents from the Construction Safety Management Integrated Information (CSI). For the variables in the collected dataset, we first analyze their patterns and correlations with fatal construction accidents through statistical analysis. In addition, machine learning algorithms are employed to develop a classification model for fatal accidents. Integrating SHAP (SHapley Additive exPlanations) allows for the identification of root causes driving fatal incidents. The results reveal the significant factors and keywords that influence fatal accidents in construction.
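
The SHAP step in the abstract above rests on Shapley values, which can be computed exactly for a small model. The following is a minimal sketch, assuming a hypothetical three-feature risk score; the feature names and weights are invented for illustration and do not come from the paper. In practice the attribution would be computed on the trained classifier, typically with the shap library.

```python
from itertools import combinations
from math import factorial

# Hypothetical accident attributes (not taken from the paper).
FEATURES = ["height_work", "no_harness", "night_shift"]

def risk_score(present):
    """Toy fatality-risk model: additive weights plus one interaction (all assumed)."""
    score = 0.0
    if "height_work" in present:
        score += 0.2
    if "no_harness" in present:
        score += 0.1
    if "night_shift" in present:
        score += 0.05
    if "height_work" in present and "no_harness" in present:
        score += 0.3  # interaction between the two factors
    return score

def shapley_values(features, value_fn):
    """Exact Shapley values: weighted average of each feature's marginal contribution."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = set(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

phi = shapley_values(FEATURES, risk_score)
for name, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the score with all features present and the score with none, and the interaction term is split equally between the two features involved.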

A Fully Distributed Secure Approach using Nondeterministic Encryption for Database Security in Cloud

  • Srinu Banothu;A. Govardhan;Karnam Madhavi
    • International Journal of Computer Science & Network Security / v.24 no.1 / pp.140-150 / 2024
  • Database-as-a-Service is one of the prime services provided by cloud computing. It provides data storage and management services to individuals, enterprises, and organizations on a pay-per-use basis: any enterprise or organization can outsource its databases to a Cloud Service Provider (CSP) and query the data whenever and wherever required through any device connected to the internet. The advantage of this service is that enterprises or organizations can reduce the cost of establishing and maintaining infrastructure locally. However, accessing the data raises database security, privacy, and query performance issues. To overcome these issues, in our recent research we developed a database security model using a deterministic encryption scheme, which improved query execution performance and the database security level. Because that model relies on deterministic encryption, it may be vulnerable to chosen-plaintext attacks. To overcome this issue, in this paper we propose a new model for cloud database security using nondeterministic encryption, order-preserving encryption, homomorphic encryption, and database distribution schemes. The proposed model supports the execution of queries with equality checks, range conditions, and aggregate operations on the encrypted cloud database without decryption. This model is more secure and offers optimal query execution performance.
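
To make the idea of aggregating over encrypted values concrete, here is a minimal sketch of an additively homomorphic, nondeterministic scheme. The abstract does not name the scheme the authors use; Paillier encryption is an assumption chosen for illustration, and the tiny primes make this toy insecure by design.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy key generation; the tiny primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, valid for the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m, r=None):
    """Encrypt 0 <= m < n. A fresh random r makes equal plaintexts encrypt differently."""
    n2 = n * n
    while r is None or math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
values = [120, 45, 300]                      # hypothetical column values
ciphertexts = [encrypt(n, v) for v in values]
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(n, total, c)
print(decrypt(n, priv, total))               # 465, computed without decrypting any row
```

A server holding only the ciphertexts could evaluate a SUM aggregate this way. Range conditions would need a different primitive such as order-preserving encryption, which this sketch does not cover.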

CORRECT? CORECT!: Classification of ESG Ratings with Earnings Call Transcript

  • Haein Lee;Hae Sun Jung;Heungju Park;Jang Hyun Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.4 / pp.1090-1100 / 2024
  • While incorporating ESG indicators is recognized as crucial for sustainability and increased firm value, inconsistent disclosure of ESG data and vague assessment standards have been key challenges. To address these issues, this study proposes an automated ESG rating strategy based on ambiguous text. Earnings Call Transcript data were classified as E, S, or G using the more than 450 metrics of the Refinitiv-Sustainable Leadership Monitor. The study employed advanced natural language processing models such as BERT, RoBERTa, ALBERT, FinBERT, and ELECTRA to precisely classify ESG documents. In addition, the authors computed the average predicted probabilities for each label, providing a means to identify the relative significance of different ESG factors. The experimental results demonstrated the capability of the proposed methodology to enhance the ESG assessment criteria established by various rating agencies and highlighted that companies primarily focus on governance factors. In other words, companies were making efforts to strengthen their governance framework. In conclusion, this framework enables sustainable and responsible business by providing insight into the ESG information contained in Earnings Call Transcript data.
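
The averaging step mentioned in the abstract above can be sketched as follows. This is an illustrative example only: the per-sentence probabilities are invented, and the function name is hypothetical rather than taken from the paper.

```python
from collections import defaultdict

def mean_label_probabilities(predictions):
    """Average the predicted probability of each label over all classified texts."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    totals = defaultdict(float)
    for probs in predictions:
        for label, p in probs.items():
            totals[label] += p
    return {label: total / len(predictions) for label, total in totals.items()}

# Hypothetical classifier outputs for three transcript sentences.
predictions = [
    {"E": 0.2, "S": 0.3, "G": 0.5},
    {"E": 0.1, "S": 0.2, "G": 0.7},
    {"E": 0.6, "S": 0.1, "G": 0.3},
]
means = mean_label_probabilities(predictions)
dominant = max(means, key=means.get)
print(means, dominant)
```

With these made-up inputs the governance label has the highest mean probability, which is the kind of signal the abstract describes as the relative significance of ESG factors.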

A Study on the Effective Command Delivery of Commanders Using Speech Recognition Technology (국방 분야에서 전장 소음 환경 하에 음성 인식 기술 연구)

  • Yeong-hoon Kim;Hyun Kwon
    • Convergence Security Journal / v.24 no.2 / pp.161-165 / 2024
  • Recently, speech recognition models have been advancing, accompanied by the development of various speech processing technologies to obtain high-quality data. In the defense sector, efforts are being made to integrate technologies that effectively remove noise from speech data in noisy battlefield situations and enable efficient speech recognition. This paper proposes a method for effective speech recognition amid the diverse noise of a battlefield scenario, allowing commanders to convey orders. The proposed method removes noise from noisy speech and then converts the speech to text using OpenAI's Whisper model. Experimental results show that the proposed method reduces the Character Error Rate (CER) by 6.17% compared to the existing method that does not remove noise. Additionally, potential applications of the proposed method in the defense sector are discussed.
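
For readers unfamiliar with the metric, the Character Error Rate is commonly computed as the character-level edit distance between the reference and the recognized text, divided by the reference length. The sketch below uses that common definition; the abstract does not state the exact formula the authors used or whether the 6.17% figure is absolute or relative, and the sample strings are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# Hypothetical command and two hypothetical transcriptions.
reference = "move to grid nine"
print(cer(reference, "move to grid mine"))   # one substituted character
print(cer(reference, "move grid nine"))      # three deleted characters
```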

Applying NIST AI Risk Management Framework: Case Study on NTIS Database Analysis Using MAP, MEASURE, MANAGE Approaches (NIST AI 위험 관리 프레임워크 적용: NTIS 데이터베이스 분석의 MAP, MEASURE, MANAGE 접근 사례 연구)

  • Jung Sun Lim;Seoung Hun, Bae;Taehoon Kwon
    • Journal of Korean Society of Industrial and Systems Engineering / v.47 no.2 / pp.21-29 / 2024
  • Fueled by international efforts towards AI standardization, including those by the European Commission, the United States, and international organizations, this study introduces an AI-driven framework for analyzing advancements in drone technology. Using project data retrieved from the NTIS DB via the "drone" keyword, the framework employs a diverse toolkit of supervised learning methods (Keras MLP, XGBoost, LightGBM, and CatBoost) enhanced by BERTopic, a natural language analysis tool. This multifaceted approach ensures both comprehensive data quality evaluation and in-depth structural analysis of documents. Furthermore, a 6T-based classification method filters out non-applicable data for year-on-year AI analysis, demonstrably improving performance as measured by the accuracy metric. Using AI tools, including GPT-4, this research unveils year-on-year trends in emerging keywords and employs them to generate detailed summaries, enabling efficient processing of large text datasets and offering an AI analysis system applicable to policy domains. Notably, this study not only advances methodologies aligned with AI Act standards but also lays the groundwork for responsible AI implementation through analysis of government research and development investments.

A Study on the Development of LDA Algorithm-Based Financial Technology Roadmap Using Patent Data

  • Koopo KWON;Kyounghak LEE
    • Korean Journal of Artificial Intelligence / v.12 no.3 / pp.17-24 / 2024
  • This study aims to derive a technology development roadmap in related fields by utilizing patent documents on financial technology. To this end, patent documents are extracted using technical keywords drawn from prior research and related reports on financial technology. The TF-IDF (Term Frequency-Inverse Document Frequency) technique, a text mining technique, was applied to the extracted patent documents, and the Latent Dirichlet Allocation (LDA) algorithm was then applied to identify the keywords and topics of the core technologies of financial technology. Based on the proportion of topics by year, which is the result of LDA, promising technology fields and convergence fields were identified through trend analysis and similarity analysis between topics. A first-stage technology development roadmap for technology field development and a second-stage technology development roadmap for convergence were derived through network analysis of the identified financial technology fields: the technology data-based integrated management system of the high-dimensional payment system using RF and intelligent cards, and the security processing methodology for data information and network payment. The proposed method can serve as a sound basis for developing financial technology R&D strategies and technology roadmaps.
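
The TF-IDF weighting mentioned in the abstract above can be sketched in a few lines. This example uses one common variant (term frequency normalized by document length, and idf = log(N / df)); the abstract does not say which variant the authors used, and the tokenized "patent" snippets are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized patent abstracts.
docs = [
    ["payment", "card", "security"],
    ["payment", "network", "security"],
    ["payment", "blockchain", "ledger"],
]
weights = tf_idf(docs)
for w in weights:
    print(max(w, key=w.get), w)
```

A term that occurs in every document ("payment" here) receives zero weight, so the terms that distinguish a document rise to the top before topic modeling is applied.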

Applied Practices on Digital Historical Data Transformation based on Intangible Cultural Heritage with Metaverse Approach

  • Hyeon-Uk Jeong;Janghwan Kim;Jihoon Kong;R. Young Chul Kim
    • International journal of advanced smart convergence / v.13 no.3 / pp.279-286 / 2024
  • The preservation and transmission of intangible cultural heritage, such as traditional martial arts, have historically relied on manual processes that are both resource-intensive and costly. Due to budget limitations, many of these cultural assets are at risk of deterioration or remain hidden in museum storage, inaccessible to the public. To address these challenges, we propose a Digital Historical Data Transformation mechanism utilizing metaverse development techniques. This innovative approach converts 2D images into 3D representations, allowing for the extraction and visualization of associated actions in a three-dimensional space. By applying this methodology to the "Muyedobotongji," a classic text on traditional martial arts, we aim to digitally preserve these practices in a way that is both immersive and interactive. The transformation of static 2D images into dynamic 3D visualizations will not only enhance the restoration process but also make these cultural assets more accessible and engaging for future generations. This digital approach promises a more efficient and sustainable means of preserving intangible cultural heritage, ensuring that these traditions continue to thrive in the modern world.

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD / v.9D no.5 / pp.779-788 / 2002
  • Searching multimedia data such as images, video, and audio raises many technically difficult issues because such data are massive and more complex than simple text-based data. Similarity retrieval has been studied as a method of searching multimedia data: it automatically extracts basic features of the multimedia data and searches among the data using the extracted features, because exact matching is not suited to a matrix of multimedia features. In this paper, data clustering and cluster indexing are proposed as a fast similarity-retrieval method for multimedia data. This approach clusters similar images on adjacent disk cylinders and then builds indexes to access the clusters. To minimize the search cost, hashing is adopted to index the clusters. In addition, to reduce I/O time, the proposed search takes just one I/O to look up the location of the cluster containing similar objects and one sequential file I/O to read in this cluster. The proposed scheme handles the multi-dimensionality problem through clustering and cluster indexing and achieves higher search efficiency than content-based image retrieval that uses only a clustering structure or only an indexing structure.
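
The lookup pattern described in the abstract above (a hash lookup for the cluster location, then one sequential read of the cluster) can be sketched as follows. This is only an illustration under stated assumptions: grid quantization stands in for the paper's clustering, an in-memory buffer stands in for the disk file, and the record layout, cell width, and all names are hypothetical.

```python
import io
import struct

CELL = 0.5                        # assumed grid cell width for quantizing features
RECORD = struct.Struct("<Iff")    # assumed record layout: image id, two features

def cell_key(vec):
    """Hash key of a feature vector: the grid cell it falls into."""
    return tuple(int(x // CELL) for x in vec)

def build_store(images):
    """Write each cluster contiguously and index it as key -> (offset, count)."""
    clusters = {}
    for image_id, vec in images:
        clusters.setdefault(cell_key(vec), []).append((image_id, vec))
    buf, index = io.BytesIO(), {}
    for key, members in clusters.items():
        index[key] = (buf.tell(), len(members))
        for image_id, (x, y) in members:
            buf.write(RECORD.pack(image_id, x, y))
    return buf, index

def search(buf, index, query):
    """One hash lookup, one sequential read, then rank the cluster by distance."""
    location = index.get(cell_key(query))
    if location is None:
        return []
    offset, count = location
    buf.seek(offset)
    block = buf.read(count * RECORD.size)
    records = [RECORD.unpack_from(block, i * RECORD.size) for i in range(count)]
    return sorted(records,
                  key=lambda r: (r[1] - query[0]) ** 2 + (r[2] - query[1]) ** 2)

images = [(1, (0.1, 0.1)), (2, (0.3, 0.2)), (3, (1.7, 1.9)), (4, (0.4, 0.45))]
buf, index = build_store(images)
print([r[0] for r in search(buf, index, (0.12, 0.1))])
```

Because each cluster is stored contiguously, the read inside `search` touches a single byte range, which is the property the abstract relies on to keep I/O cost low.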

Identify the Failure Mode of Weapon System (or equipment) using Machine Learning (Machine Learning을 이용한 무기 체계(or 구성품) 고장 유형 식별)

  • Park, Yun-Kyung;Lee, Hye-Won;Kim, Sang-Moon
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.8 / pp.64-70 / 2018
  • During the development of weapon systems (or components), the number of tests is constrained by the limited development period and cost, which reduces the amount of accumulated failure data. Nevertheless, because a large amount of failure data and maintenance details from the operational period is managed in computerized form, the causes of failure of weapon systems (or components) can be analyzed using these data. On the other hand, analyzing the failure and maintenance details of various weapon systems is difficult because of the variation among groups and companies, and because the details of the causes of failure are described as unstructured text data. Fortunately, recent developments in big data processing technology, machine learning algorithms, and hardware computing power have supported major research into various methods for processing such unstructured data. In this paper, unstructured data related to the failure and maintenance of defense weapon systems (or components) are represented by applying doc2vec, a machine learning technique, in order to analyze failure cases.

A Study on the Trends of Construction Safety Accident in Unstructured Text Using Topic Modeling (비정형 텍스트 기반의 토픽 모델링을 이용한 건설 안전사고 동향 분석)

  • Lee, Sang-Gyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.10 / pp.176-182 / 2018
  • To understand and track trends in construction safety accidents, this study identifies topic trends using an LDA (Latent Dirichlet Allocation)-based topic modeling method. In particular, it identifies the main issues in construction safety accidents through unstructured data analysis based on topic modeling, rather than through the various structured data analyses used to prevent safety accidents in the construction industry. To apply this methodology, I randomly collected 540 news articles about construction accidents published from January 2017 to February 2018. Applying LDA-based topic modeling to these unstructured data, I found 10 topics and identified key issues through 10 keywords for each of the 10 topics. I forecasted the topic issues related to construction safety accidents based on an analysis of time-series trends in the news data from January 2017 to February 2018. This research thus suggests ways of using unstructured news article data to anticipate safety policy and research fields and to respond to future construction accident safety issues.