• Title/Summary/Keyword: 텍스트 데이터 분석

Search Result 1,112, Processing Time 0.034 seconds

Development of SVM-based Construction Project Document Classification Model to Derive Construction Risk (건설 리스크 도출을 위한 SVM 기반의 건설프로젝트 문서 분류 모델 개발)

  • Kang, Donguk;Cho, Mingeon;Cha, Gichun;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.43 no.6
    • /
    • pp.841-849
    • /
    • 2023
  • Construction projects have risks due to various factors such as construction delays and construction accidents. Based on these construction risks, the method of calculating the construction period of the construction project is mainly made by subjective judgment that relies on supervisor experience. In addition, unreasonable shortening construction to meet construction project schedules delayed by construction delays and construction disasters causes negative consequences such as poor construction, and economic losses are caused by the absence of infrastructure due to delayed schedules. Data-based scientific approaches and statistical analysis are needed to solve the risks of such construction projects. Data collected in actual construction projects is stored in unstructured text, so to apply data-based risks, data pre-processing involves a lot of manpower and cost, so basic data through a data classification model using text mining is required. Therefore, in this study, a document-based data generation classification model for risk management was developed through a data classification model based on SVM (Support Vector Machine) by collecting construction project documents and utilizing text mining. Through quantitative analysis through future research results, it is expected that risk management will be possible by being used as efficient and objective basic data for construction project process management.

Research on the change of perception of abandoned dogs through big data analysis

  • Jang, Ji-Yun;Lee, Seok-Won
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.9
    • /
    • pp.115-123
    • /
    • 2021
  • This study aims to analyze the changes in public perception of abandoned dogs through big data analysis. Data from January 2017 to July 2020 were collected to analyze how the quantitative change in social issues with abandoned dogs as a keyword had an effect on public perception of abandoned dogs, and factors that influence positive/negative perceptions. As a result of the study, it was confirmed that the number of stray dogs and the number of documents related to stray dogs had a positive correlation, and specific time series changes were found through various analysis techniques such as text mining, network analysis, and sentiment analysis. This study will have significance as basic data that can be used for policy establishment or other research on abandoned dogs. we hope it will help to solve problems so as to improve awareness of abandoned dogs and develop a sense of responsibility.

Bio-Sensing Convergence Big Data Computing Architecture (바이오센싱 융합 빅데이터 컴퓨팅 아키텍처)

  • Ko, Myung-Sook;Lee, Tae-Gyu
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.2
    • /
    • pp.43-50
    • /
    • 2018
  • Biometric information computing is greatly influencing both a computing system and Big-data system based on the bio-information system that combines bio-signal sensors and bio-information processing. Unlike conventional data formats such as text, images, and videos, biometric information is represented by text-based values that give meaning to a bio-signal, important event moments are stored in an image format, a complex data format such as a video format is constructed for data prediction and analysis through time series analysis. Such a complex data structure may be separately requested by text, image, video format depending on characteristics of data required by individual biometric information application services, or may request complex data formats simultaneously depending on the situation. Since previous bio-information processing computing systems depend on conventional computing component, computing structure, and data processing method, they have many inefficiencies in terms of data processing performance, transmission capability, storage efficiency, and system safety. In this study, we propose an improved biosensing converged big data computing architecture to build a platform that supports biometric information processing computing effectively. The proposed architecture effectively supports data storage and transmission efficiency, computing performance, and system stability. And, it can lay the foundation for system implementation and biometric information service optimization optimized for future biometric information computing.

Audio-Visual Scene Aware Dialogue System Utilizing Action From Vision and Language Features (이미지-텍스트 자질을 이용한 행동 포착 비디오 기반 대화시스템)

  • Jungwoo Lim;Yoonna Jang;Junyoung Son;Seungyoon Lee;Kinam Park;Heuiseok Lim
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.253-257
    • /
    • 2023
  • 최근 다양한 대화 시스템이 스마트폰 어시스턴트, 자동 차 내비게이션, 음성 제어 스피커, 인간 중심 로봇 등의 실세계 인간-기계 인터페이스에 적용되고 있다. 하지만 대부분의 대화 시스템은 텍스트 기반으로 작동해 다중 모달리티 입력을 처리할 수 없다. 이 문제를 해결하기 위해서는 비디오와 같은 다중 모달리티 장면 인식을 통합한 대화 시스템이 필요하다. 기존의 비디오 기반 대화 시스템은 주로 시각, 이미지, 오디오 등의 다양한 자질을 합성하거나 사전 학습을 통해 이미지와 텍스트를 잘 정렬하는 데에만 집중하여 중요한 행동 단서와 소리 단서를 놓치고 있다는 한계가 존재한다. 본 논문은 이미지-텍스트 정렬의 사전학습 임베딩과 행동 단서, 소리 단서를 활용해 비디오 기반 대화 시스템을 개선한다. 제안한 모델은 텍스트와 이미지, 그리고 오디오 임베딩을 인코딩하고, 이를 바탕으로 관련 프레임과 행동 단서를 추출하여 발화를 생성하는 과정을 거친다. AVSD 데이터셋에서의 실험 결과, 제안한 모델이 기존의 모델보다 높은 성능을 보였으며, 대표적인 이미지-텍스트 자질들을 비디오 기반 대화시스템에서 비교 분석하였다.

  • PDF

Sentiment Analysis of Foot-and-Mouth Disease Using Tweet Text-Mining Technique (트윗 텍스트 마이닝 기법을 이용한 구제역의 감성분석)

  • Chae, Heechan;Lee, Jonguk;Choi, Yoona;Park, Daihee;Chung, Yongwha
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.11
    • /
    • pp.419-426
    • /
    • 2018
  • Due to the FMD(foot-and-mouth disease), the domestic animal husbandry and related industries suffer enormous damage every year. Although various academic researches related to FMD are ongoing, engineering studies on the social effects of FMD are very limited. In this study, we propose a systematic methodology to analyze emotional responses of regular citizens on FMD using text mining techniques. The proposed system first collects data related to FMD from the tweets posted on Twitter, and then performs a polarity classification process using a deep-learning technique. Second, keywords are extracted from the tweet using LDA, which is one of the typical techniques of topic modeling, and a keyword network is constructed from the extracted keywords. Finally, we analyze the various social effects of regular citizens on FMD through keyword network. As a case study, we performed the emotional analysis experiment of regular citizens about FMD from July 2010 to December 2011 in Korea.

Reliability Analysis of Privacy Policies Based on Android Static Analysis (안드로이드 정적분석 기반 개인정보 처리방침의 신뢰성 분석)

  • Jung, Yoonkyo
    • Annual Conference of KIPS
    • /
    • 2022.05a
    • /
    • pp.221-224
    • /
    • 2022
  • 모바일 사용자가 증가함에 따라 모바일 앱에서 사용자가 허용하지 않은 개인정보가 유출되는 프라이버시 문제가 많아졌다. 이를 해결하기 위해 구글은 앱스토어에 등록된 앱이 사용자의 개인정보를 어떻게 활용하는지 개인정보 처리방침에 명시하도록 했다. 하지만 개인정보 처리방침이 실제로 앱의 개인정보 수집 및 처리 과정을 정확히 공개하는지 확인할 수 있는 해결책이 없으며, 사용자는 앱이 개인정보를 어떻게 활용하는지 알기 위해 개인정보 처리방침에 의존해야만 한다. 본 연구에서는 안드로이드 정적 분석으로 앱이 접근할 수 있는 데이터를 확인하고, 개인정보 처리방침의 텍스트를 추출 및 분석한 뒤 결과를 비교하여 개인정보 처리방침의 신뢰성을 분석한다. 실험을 위해 구글 앱스토어에 등록된 13,223개 앱의 패키지 파일과 부가정보를 수집했고 전처리 과정을 거쳐 분석 가능한 앱을 선정했다. 선정한 앱의 모바일 앱 분석 결과와 텍스트 분석 결과를 비교하여 모바일 앱이 개인정보 처리방침에 명시된 것보다 더 많은 개인정보에 접근할 수 있음을 입증한다.

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.4
    • /
    • pp.105-112
    • /
    • 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the realization of the era of big data and artificial intelligence and opening a new chapter in convergence technology. Also, in the past, there are many demands for analysis of data that could not be handled by programs. In this paper, an analysis model was designed and verified for classification of unstructured data, which is often required in the era of big data. Data crawled DBPia's thesis summary, main words, and sub-keyword, and created a database using KoNLP's data dictionary, and tokenized words through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, The adequacy of classification was measured by applying three analysis algorithms(random forest, SVM, decision tree) to the generated analysis dataset. The classification model technique proposed in this paper can be usefully used in various fields such as civil complaint classification analysis and text-related analysis in addition to thesis classification.

A Study on the Data Analysis of the Written Comments in Lecture Evaluation (데이터분석을 이용한 서술형 강의평가 연구)

  • Choi, Jung-Woong;An, Dong-Kyu
    • Journal of Digital Convergence
    • /
    • v.14 no.11
    • /
    • pp.101-106
    • /
    • 2016
  • A number of non-structured data associated with lectures in the field of university education have been generated and it is an important consideration of the students's written comments lecture evaluation. The purpose of this study is to find student interaction factors associated with the student evaluation of teaching at universities, and to provide some insights into improving the student evaluation program based on the results. So, this study consists of three steps that create interaction score, collect student's written comments satisfaction, and analyze an individual professor score. There are a number of limitations to this study. The limitation is that the study was conducted on a narrow sample of the overall student population.

Study on Effective Extraction of New Coined Vocabulary from Political Domain Article and News Comment (정치 도메인에서 신조어휘의 효과적인 추출 및 의미 분석에 대한 연구)

  • Lee, Jihyun;Kim, Jaehong;Cho, Yesung;Lee, Mingu;Choi, Hyebong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.2
    • /
    • pp.149-156
    • /
    • 2021
  • Text mining is one of the useful tools to discover public opinion and perception regarding political issues from big data. It is very common that users of social media express their opinion with newly-coined words such as slang and emoji. However, those new words are not effectively captured by traditional text mining methods that process text data using a language dictionary. In this study, we propose effective methods to extract newly-coined words that connote the political stance and opinion of users. With various text mining techniques, I attempt to discover the context and the political meaning of the new words.

Trend Analysis of Apartments Demand based on Big Data (빅데이터 기반의 아파트 수요 트렌드 분석에 관한 연구)

  • Kim, Tae-Kyeong;Kim, Han Soo
    • Korean Journal of Construction Engineering and Management
    • /
    • v.18 no.6
    • /
    • pp.13-25
    • /
    • 2017
  • Apartments are a major type of residence and their number has continuously increased. Apartments have multiple meanings in that for public they are not only for residence purpose but for investment, a major commodity for construction firms and a critical policy measure of public well-fare for the government. Therefore, it is critical to understand and analyze trends in apartments demand for pro-active actions. The objective of the study is to analyze and identify key trends in apartments demand based on big data drawn from articles of major daily newspapers. The study identifies 17 major trends from seven themes including development, trade, sale in lots, location requirements, policy, residential environment, and investment and profit. The research methods in the study can be usefully applied to further studies for various issues in relation to the construction industry.