• Title/Summary/Keyword: 비정형데이터분석 (unstructured data analysis)


Security tendency analysis techniques through machine learning algorithms applications in big data environments (빅데이터 환경에서 기계학습 알고리즘 응용을 통한 보안 성향 분석 기법)

  • Choi, Do-Hyeon;Park, Jung-Oh
    • Journal of Digital Convergence / v.13 no.9 / pp.269-276 / 2015
  • With the recent growth of the big data industry, global security companies have expanded their scope from structured to unstructured data for intelligent security threat monitoring and prevention, and they increasingly use user tendency analysis for security prevention. This is because the information that can be derived from analyzing existing structured data (quantified, already available data) is limited. This study applies machine learning algorithms (Naïve Bayes, Decision Tree, K-nearest neighbor, Apriori) in a big data environment to analyze security tendency (classification of items by purpose, positive/negative judgment, and analysis of keyword relevance). The capability analysis confirmed that security items and specific indexes for determining security tendency can be extracted from both structured and unstructured data.
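
The positive/negative judgment mentioned in the abstract can be illustrated with a small multinomial Naïve Bayes classifier. This is only a minimal sketch under stated assumptions: the training tokens, the labels, and the function names `train_naive_bayes` and `predict` are hypothetical and do not come from the paper, whose features and data are not described here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
training = [
    (["password", "leak", "breach"], "negative"),
    (["malware", "breach", "attack"], "negative"),
    (["patch", "secure", "update"], "positive"),
    (["encryption", "secure", "backup"], "positive"),
]
model = train_naive_bayes(training)
```

Calling `predict(model, ["breach", "attack"])` on this toy model selects the label whose training documents contain those words most often.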

Multidimensional Analysis of Unstructured Data and Trends in Architectural Review Opinions of Small and Medium-Sized Apartment Projects (다차원 분석방법을 활용한 중소규모 공동주택 건축심의 의견의 경향과 비정형 데이터로서의 특성분석)

  • Kim, Jinhee;Hwang, Taeeon;Kim, Jae-Sik;Huh, Youngki
    • Korean Journal of Construction Engineering and Management / v.24 no.6 / pp.74-80 / 2023
  • This study examines the characteristics of architectural review opinions as unstructured data, focusing on the most challenging risk for developers of small and medium-sized apartment projects in response to the increasing number of single-person households in Korea. Using multidimensional analysis methods, the study analyzes the review opinions of 25 projects in B City. Correspondence analysis and MDS (multidimensional scaling) analysis show that, consistent with prior research, keywords related to 'structure' and 'planning' dominate architectural review opinions in B City. Although the MDS model's stress is very poor at 34.4%, correspondence analysis reveals that this is due to the characteristics of unstructured data in architectural reviews. In addition, the unstructured data analyzed in this study exhibited a probability distribution with low kurtosis and high skewness, because the data occurred in varied combinations depending on the discretion of the review committee members and the specific formats of different local governments. This often led to the emergence of keywords that differed significantly from commonly mentioned terms. Although the study has some limitations, it provides a foundation for future detailed analysis by identifying the characteristics of architectural review opinions as unstructured data.
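
The skewness and kurtosis cited in the abstract are standard moment-based statistics. The sketch below computes the population versions for a list of keyword counts; the function name and the sample counts are hypothetical, and the paper may have used a different estimator (for example a sample-corrected one).

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Population skewness and excess kurtosis from central moments."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; moments are undefined")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


# Hypothetical keyword counts: many rare keywords and one frequent one.
keyword_counts = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]
skew, kurt = skewness_and_excess_kurtosis(keyword_counts)
```

A positive skewness here reflects a long right tail, that is, a few keywords mentioned far more often than the rest.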

Unstructured Data based a Study of Effectiveness about Prediction of Corporate Bankruptcy with a Real Case (실제 사례 기반 비정형 데이터를 활용한 기업의 부실징후 예측에 관한 효용성 연구)

  • JIN, Hoon;Hong, Jeoung-Pyo;Lee, Kang-Ho;Joo, Dong-Won
    • Annual Conference on Human and Language Technology / 2018.10a / pp.487-492 / 2018
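  • In the wake of the Fourth Industrial Revolution, artificial intelligence and big data technologies are being applied across many fields in Korea in an attempt to integrate them into and complement a variety of existing services. In particular, to secure credit stability for companies that have borrowed from financial institutions and to respond preemptively, major domestic banks are actively trying to predict the likelihood of insolvency from online news articles, SNS data, and similar sources, and to introduce such prediction into actual operations. We describe the various analysis methods attempted, their results, and the problems encountered while developing an unstructured-data-based system for predicting signs of corporate insolvency at a Korean state-run bank, and we discuss related issues. This paper reports the development of an automatic tagger for labeling a large volume of unlabeled articles and of a model that predicts the likelihood of insolvency from news article prediction results. In terms of performance, the work achieved an article prediction accuracy of 92% (AUC 0.96), and its prediction of companies at risk of insolvency was comparable to results from structured data analysis.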


Design of Streaming based Unstructured-Data Collecting Framework in IoT Environment (IoT 환경에서 스트리밍 기반의 비정형 데이터 수집 프레임워크 설계)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.57-58 / 2017
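  • Devices in Internet of Things environments continuously generate data every second, such as system logs, temperature, humidity, illuminance, and location information. Most of this data is discarded within the device, and even when it is collected, it is used only for limited purposes such as system improvement. This paper proposes the design of a framework that transmits the unstructured data generated by each IoT device to a collection server by streaming and loads it into a NoSQL database with a flexible schema. Log and sensing data collected from numerous devices in this way can be used, through big data analysis, to improve productivity at industrial sites and, for public purposes, to relieve urban traffic problems and to predict disasters and accidents.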


A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so a refining process is required before analysis. The data becomes structured, analyzable data through a heuristic pre-processing refining step and a machine post-processing refining step. In this study, the post-processing step uses a Korean dictionary and a stopword dictionary to extract vocabulary for the frequency analysis behind word cloud analysis. In this process, "user-defined stopwords" are used to efficiently remove stopwords that were not removed. We propose a methodology for applying a "thesaurus" and examine the pros and cons of the proposed refining method through a case analysis that applies the "user-defined stopword thesaurus" technique, proposed to complement the problems of the existing "stopword dictionary" method, with R's word cloud technique. We present comparative verification and suggest the practical effectiveness of the proposed methodology.
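
The post-processing step described above (dictionary stopwords plus user-defined stopwords, followed by frequency counting for a word cloud) might be sketched as follows. The paper works in R; this Python sketch is an illustration only, and it assumes that a "stopword thesaurus" maps a headword to variant forms that should all be removed. The word lists and function names are hypothetical.

```python
from collections import Counter

# Hypothetical base stopword dictionary.
base_stopwords = {"the", "a", "of", "and"}

# Hypothetical user-defined stopword thesaurus: headword -> variant forms.
stopword_thesaurus = {
    "article": {"articles", "news", "report"},
    "said": {"says", "say"},
}


def build_stopword_set(base, thesaurus):
    """Expand the base stopwords with every headword and variant."""
    expanded = set(base)
    for headword, variants in thesaurus.items():
        expanded.add(headword)
        expanded.update(variants)
    return expanded


def word_frequencies(tokens, stopwords):
    """Count lowercased tokens that are not stopwords."""
    lowered = (token.lower() for token in tokens)
    return Counter(token for token in lowered if token not in stopwords)


stopwords = build_stopword_set(base_stopwords, stopword_thesaurus)
tokens = "The report says big data and AI news of big data".split()
frequencies = word_frequencies(tokens, stopwords)
```

The resulting counter is the kind of frequency table a word cloud renderer would take as input.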

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.521-526 / 2016
  • Because online informal text data is massive in volume and unstructured in nature, traditional relational data model technologies are limited for storing and analyzing it. Moreover, real-time analysis of social users' reactions is hard to accomplish on massive, dynamically generated social data. In this paper, to easily capture the semantics of massive, informal online documents with an unsupervised learning mechanism, we design and implement an automatic topic extraction system based on the mass of the words that constitute a document. The input data set for the proposed system is first generated using an N-gram algorithm, which builds multi-word terms to capture the meaning of sentences precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiments, TB-level input data is preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system performs well in extracting meaningful topics in time, because intermediate results come directly from main memory instead of being read from an HDD.
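
The N-gram step that builds multi-word terms can be shown in a few lines. This is a single-machine sketch with a hypothetical function name and token list; the paper runs its pipeline on Hadoop and Spark, which is not reproduced here.

```python
def ngrams(tokens, n):
    """Return all contiguous n-word terms from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical tokenized sentence.
tokens = ["big", "data", "topic", "model"]
bigrams = ngrams(tokens, 2)
```

Multi-word terms such as "topic model" keep meaning that the separate words "topic" and "model" would lose when fed to a topic model.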

Development of Machine Learning-based Construction Accident Prediction Model Using Structured and Unstructured Data of Construction Sites (건설현장 정형·비정형데이터를 활용한 기계학습 기반의 건설재해 예측 모델 개발)

  • Cho, Mingeon;Lee, Donghwan;Park, Jooyoung;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research / v.42 no.1 / pp.127-134 / 2022
  • Recently, policies and research to prevent the growing number of construction accidents have been actively pursued in the domestic construction industry. In previous studies, prediction models developed to prevent construction accidents mainly used only structured data, so the varied characteristics of construction sites were not sufficiently considered. Therefore, in this study, we developed a machine learning-based construction accident prediction model that takes the characteristics of construction sites into account by using both structured data and text-type unstructured data. For machine learning, 6,826 cases of construction accident data were collected from the Construction Safety Management Integrated Information (CSI). The Decision forest algorithm and the BERT language model were used to train on the structured and unstructured data, respectively. The analysis using both types of data confirmed a prediction accuracy of 95.41%, an improvement of about 20% over using only structured data. In conclusion, using the unstructured data together with the structured data effectively improved the performance of the predictive model, and more accurate prediction can be expected to reduce construction accidents.

A Study on Health Care Service Design for the Improvement of Cognitive Abilities of the Senior Citizens: Focusing on Unstructured Data Analysis (노인 인지능력 개선을 위한 헬스케어 서비스디자인 연구: 비정형 데이터 분석을 중심으로)

  • Seongho Kim;Hyeob Kim
    • Knowledge Management Research / v.23 no.4 / pp.69-89 / 2022
  • As we enter a super-aged society, senior citizens' health issues are affecting a variety of fields, including medicine, economics, society, and culture. In this study, we draw implications from unstructured data analysis, such as text mining and social network analysis, in order to apply digital health care service design to improving the cognitive ability of senior citizens. The research procedure adapted the service design methodology into a process suited to the analysis of unstructured data, and six steps were applied. Related keywords on social media, focusing on cognitive improvement and healthcare for senior citizens, were collected and analyzed, and based on these results, the direction of healthcare service design for improving the cognitive abilities of senior citizens was derived. The results of this study are expected to have academic and practical implications for expanding the use of big data analysis methods and improving existing healthcare service development methodologies.

Construction Bid Data Analysis for Overseas Projects Based on Text Mining - Focusing on Overseas Construction Project's Bidder Inquiry (텍스트 마이닝을 통한 해외건설공사 입찰정보 분석 - 해외건설공사의 입찰자 질의(Bidder Inquiry) 정보를 대상으로 -)

  • Lee, JeeHee;Yi, June-Seong;Son, JeongWook
    • Korean Journal of Construction Engineering and Management / v.17 no.5 / pp.89-96 / 2016
  • Most data generated in construction projects is unstructured text data. Unstructured data analysis is needed to analyze large amounts of text-based documents effectively, such as contracts, specifications, and RFIs. This study analyzed bid-related documents (bidder inquiries) from previously performed overseas construction projects. As a result of the analysis, frequent words in the documents, association rules among the words, and various document topics were derived. This study suggests an effective text mining approach for analyzing massive documents in a short time, and this approach is expected to extend unstructured text data analysis in the construction industry.
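
Association rules among words are usually scored by support and confidence. The sketch below derives pairwise rules from tokenized documents under those standard definitions; the sample inquiries, thresholds, and function name are hypothetical, and the paper's actual mining procedure and parameters are not stated here.

```python
from collections import Counter
from itertools import combinations


def pair_rules(documents, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for word pairs."""
    n_docs = len(documents)
    word_counts = Counter()  # documents containing each word
    pair_counts = Counter()  # documents containing both words of a pair
    for doc in documents:
        words = sorted(set(doc))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n_docs
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / word_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical tokenized bidder inquiries.
inquiries = [
    ["payment", "schedule", "delay"],
    ["payment", "schedule"],
    ["payment", "bond"],
    ["schedule", "delay"],
]
rules = pair_rules(inquiries, min_support=0.5, min_confidence=0.9)
```

On this toy input the only surviving rule is "delay implies schedule", because every inquiry that mentions a delay also mentions the schedule.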

Multi-Dimensional Keyword Search and Analysis of Hotel Review Data Using Multi-Dimensional Text Cubes (다차원 텍스트 큐브를 이용한 호텔 리뷰 데이터의 다차원 키워드 검색 및 분석)

  • Kim, Namsoo;Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture / v.11 no.1 / pp.63-73 / 2014
  • With the advance of the WWW, unstructured data, including text, is attracting more and more user interest. Because this unstructured data created by WWW users represents their subjective opinions, it can yield very useful information, such as users' personal tastes or perspectives, if analyzed appropriately. In this paper, we provide various efficient analyses of unstructured text documents by taking advantage of OLAP (On-Line Analytical Processing) multidimensional cube technology. OLAP cubes have been widely used for multidimensional analysis of structured data, such as simple alphabetic and numeric data, but they have not been used for unstructured data consisting of long texts. To provide multidimensional analysis of unstructured text data, the Text Cube model has recently been proposed. It incorporates term frequency and inverted index, which play key roles in information retrieval, as measures for searching and analyzing text databases. The primary goal of this paper is to apply this text cube model to a real data set from an Internet site that shares hotel information and to provide multidimensional analysis of users' written hotel reviews. To achieve this goal, we first build text cubes for the hotel review data. Using the text cubes, we design and implement a system that provides multidimensional keyword search features for searching and analyzing review texts along various dimensions. This system can help users easily obtain valuable summary information reflecting guests' subjective views. Furthermore, this paper evaluates the proposed system through various experiments, which reveal its effectiveness.
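
The idea of a text cube, where each cell of a multidimensional cube stores term frequency and an inverted index as its measures, can be sketched in memory as follows. The class name, the `city` and `rating` dimensions, and the sample reviews are hypothetical; the paper's actual schema and implementation are not given in the abstract.

```python
from collections import Counter, defaultdict


class TextCube:
    """Toy text cube: each cell keeps term frequencies and an inverted index."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.term_freq = defaultdict(Counter)                  # cell -> term -> count
        self.inverted = defaultdict(lambda: defaultdict(set))  # cell -> term -> doc ids

    def add(self, doc_id, dims, text):
        """Place one document in the cell given by its dimension values."""
        cell = tuple(dims[d] for d in self.dimensions)
        tokens = text.lower().split()
        self.term_freq[cell].update(tokens)
        for token in tokens:
            self.inverted[cell][token].add(doc_id)

    def _matching_cells(self, filters):
        """Yield cells that agree with every given dimension filter."""
        for cell in self.term_freq:
            if all(cell[self.dimensions.index(d)] == v for d, v in filters.items()):
                yield cell

    def search(self, term, **filters):
        """Document ids containing the term within the selected cells."""
        hits = set()
        for cell in self._matching_cells(filters):
            hits |= self.inverted[cell].get(term, set())
        return hits

    def frequency(self, term, **filters):
        """Total occurrences of the term, rolled up over the selected cells."""
        return sum(self.term_freq[cell][term] for cell in self._matching_cells(filters))


# Hypothetical hotel reviews.
cube = TextCube(["city", "rating"])
cube.add(1, {"city": "Seoul", "rating": 5}, "clean room friendly staff")
cube.add(2, {"city": "Seoul", "rating": 2}, "noisy room")
cube.add(3, {"city": "Busan", "rating": 5}, "clean beach view")
```

Omitting a dimension from the filters rolls the query up over that dimension, so `cube.search("clean", rating=5)` looks across all cities while `cube.search("room", city="Seoul")` looks across all ratings.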