• Title/Summary/Keyword: Text data

Adoption of Virtual Technology to the Development of a BIM based PMIS

  • Suh, Bong-Gyo;Lee, Ghang;Yun, Seok-Heon
    • Journal of the Korea Institute of Building Construction / v.13 no.4 / pp.333-340 / 2013
  • As construction projects grow larger, PMIS is increasingly used as a collaboration tool among project participants: owners, designers, inspectors, and contractors. Because the data in PMIS are usually text and most PMIS lack a standard information classification system, data usability suffers, particularly for search and analysis. BIM uses objects and properties, and this information might be used to relate it to other construction information. BIM technologies can therefore be combined with PMIS to enhance data usability. The web environment is convenient for multiple users, but data transfer is slow for large files such as BIM model files. In this study, we propose applying Virtual Technology (VT) to improve the performance of BIM data exchange in PMIS, and we test and analyze its efficiency when integrating BIM and PMIS in the web environment. The results show that VT can enhance the efficiency of BIM data exchange in the web environment.

For airline preferences of consumers Big Data Convergence Based Marketing Strategy (소비자의 항공사 선호도에 대한 빅데이터 융합 기반 마케팅 전략)

  • Chun, Yong-Ho;Lee, Seung-Joon;Park, Su-Hyeon
    • Journal of Industrial Convergence / v.17 no.3 / pp.17-22 / 2019
  • As the value of big data gains recognition, governments, public institutions, and private businesses can improve decision making by effectively adopting and refining Java and R programs that analyze vast amounts of existing and unstructured data. In this study, news data were collected and analyzed using text mining techniques in order to establish marketing strategies based on consumers' airline preferences. The research is meaningful in that it derives marketing strategies from an analysis of consumers' airline preferences, applying advanced big data techniques to data that were difficult to obtain in the past.

Design and Implementation of Web Crawler utilizing Unstructured data

  • Tanvir, Ahmed Md.;Chung, Mokdong
    • Journal of Korea Multimedia Society / v.22 no.3 / pp.374-385 / 2019
  • A web crawler is a program commonly used by search engines to discover new content on the internet. Crawlers have made the web easier for users to navigate. In this paper, we collect unstructured data from web pages and give it structure. Our system can select words that appear near a keyword across multiple documents in an unstructured manner. Words neighboring the keyword are collected through word2vec. The system aims to filter at the data acquisition level and to handle a large taxonomy. The main problem in text taxonomy is how to improve classification accuracy. To improve accuracy, we propose a new TF-IDF weighting method, modifying the TF algorithm to calculate the accuracy for unstructured data. Finally, our system proposes an efficient crawling algorithm for web page search, derived from TF-IDF and the RL Web search algorithm, to enhance the efficiency of searching for relevant information. The paper also examines how crawlers and crawling algorithms work in search engines for efficient information retrieval.
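
The abstract does not spell out the modified weighting, so the following is only a sketch of standard TF-IDF, the baseline the authors say they start from, and not their proposed method. The function name `tf_idf` and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (standard TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, tokenized on whitespace.
docs = [
    "web crawler collects web pages".split(),
    "crawler ranks pages by keyword".split(),
    "keyword search over text data".split(),
]
weights = tf_idf(docs)
```

In this toy corpus, "web" occurs twice in the first document and nowhere else, so it receives a higher weight there than "crawler", which also appears in the second document.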

Text mining-based Data Preprocessing and Accident Type Analysis for Construction Accident Analysis (건설사고 분석을 위한 텍스트 마이닝 기반 데이터 전처리 및 사고유형 분석)

  • Yoon, Young Geun;Lee, Jae Yun;Oh, Tae Keun
    • Journal of the Korean Society of Safety / v.37 no.2 / pp.18-27 / 2022
  • Construction accidents are difficult to prevent because several different types of activities occur simultaneously. The current method of accident analysis only reports the number of occurrences for one or two variables, and accidents have not decreased under safety measures that focus solely on individual variables. Even when accident data are analyzed to establish appropriate safety measures, it is difficult to derive significant results because of the large number of data variables, elements, and qualitative records. In this study, to simplify the analysis and approach this complex problem logically, data preprocessing techniques such as latent class cluster analysis (LCCA) and predictor importance were used to discover the most influential variables. Finally, the correlation was analyzed using an alluvial flow diagram consisting of seven variables and fourteen elements based on accident data. The alluvial diagram analysis using the reduced variables and elements made it possible to classify accident trends into four categories. The findings demonstrate that complex and diverse construction accident data can yield relevant analysis results, assisting in the prevention of accidents.

An Analysis of the Hocance Phenomenon using Social Media Big Data (소셜 미디어 빅데이터를 활용한 호캉스(hocance) 현상 분석)

  • Choi, Hong-Yeol;Park, Eun-Kyung;Nam, Jang-Hyeon
    • Asia-Pacific Journal of Business / v.12 no.2 / pp.161-174 / 2021
  • Purpose - The purpose of this study was to examine a recent popular consumption trend, the hocance phenomenon, using social media big data. The study intended to present practical directions and marketing measures for the recovery and growth of the hotel industry after the COVID-19 pandemic. Design/methodology/approach - Big data analysis has been used in various fields, and in this study it was used to understand the hocance phenomenon. For the three years from January 1, 2018 to December 31, 2020, we collected text data including the keyword 'hocance' from the blogs and cafes of NAVER and Daum. TEXTOM and UCINET 6 were used to collect and analyze the data. Findings - According to the analysis, words such as 'hocance', 'hotel', 'Seoul', 'travel', 'swimming pool', 'Incheon', 'breakfast', 'child' and 'friend' appeared with high frequency. The CONCOR analysis showed similar results in all three years. It was confirmed that 'swimming pool', 'breakfast', 'child' and 'friend' are important when deciding on a hocance package. Research implications or Originality - The study is differentiated in that it used social media big data instead of traditional research methods. Furthermore, it treats a social phenomenon as a consumption trend, so it has practical value for establishing marketing strategies in the tourism and hotel industry.

Consolidation of Subtasks for Target Task in Pipelined NLP Model

  • Son, Jeong-Woo;Yoon, Heegeun;Park, Seong-Bae;Cho, Keeseong;Ryu, Won
    • ETRI Journal / v.36 no.5 / pp.704-713 / 2014
  • Most natural language processing tasks depend on the outputs of some other tasks. Thus, they involve other tasks as subtasks. The main problem of this type of pipelined model is that the optimality of the subtasks that are trained with their own data is not guaranteed in the final target task, since the subtasks are not optimized with respect to the target task. As a solution to this problem, this paper proposes a consolidation of subtasks for a target task ($CST^2$). In $CST^2$, all parameters of a target task and its subtasks are optimized to fulfill the objective of the target task. $CST^2$ finds such optimized parameters through a backpropagation algorithm. In experiments in which text chunking is a target task and part-of-speech tagging is its subtask, $CST^2$ outperforms a traditional pipelined text chunker. The experimental results prove the effectiveness of optimizing subtasks with respect to the target task.

The Analysis of Inquisitive Tendency of 'Information Science and Information Ethics' Section in 2009 Revised Middle School Informatics Textbooks (2009 개정 중학교 '정보' 교과서에서 '정보과학과 정보윤리' 영역의 탐구적 경향 분석)

  • Kang, Oh-Han
    • The Journal of Korean Association of Computer Education / v.17 no.2 / pp.1-9 / 2014
  • In this paper, we study the inquiry tendency of informatics textbooks for middle school students. These textbooks were written in accordance with the 2009 revised national curriculum and were adopted in schools starting in 2013. Romey analysis was used to analyze the inquiry tendency of six textbooks against four criteria: text, data, activity, and evaluation. The results showed that some textbooks displayed a low level of inquiry tendency, lacking student participation and employing an authoritative tone. Among the four factors, 'activity' displayed the highest degree of inquiry tendency, while 'text' showed the lowest. The Romey measures in some sections were zero, indicating room for improvement.

Table Structure Recognition in Images for Newspaper Reader Application for the Blind (시각 장애인용 신문 구독 프로그램을 위한 이미지에서 표 구조 인식)

  • Kim, Jee Woong;Yi, Kang;Kim, Kyung-Mi
    • Journal of Korea Multimedia Society / v.19 no.11 / pp.1837-1851 / 2016
  • Newspaper reader mobile applications that use a text-to-speech (TTS) function enable blind people to read newspaper contents. However, tables cannot be easily read by the reader program because most tables are stored as images in the contents. Even if OCR (optical character recognition) programs are used to recognize letters in the table images, the output cannot simply be fed to the table reading function because the table structure is unknown to the reader. The exact location of each table cell that contains text must therefore be identified beforehand. In this paper, we propose an efficient image processing algorithm that recognizes all the cells in a table by identifying the columns and rows in the table image. From the cell location data provided by the column and row identification algorithm, we can generate table structure information and table reading scenarios. Our experimental results with table images commonly found in newspapers show that our cell identification approach has 100% accuracy for simple black and white table images and about 99.7% accuracy for colored and complicated tables.
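
The abstract gives no algorithmic detail, so the following is a minimal sketch of one common way to locate cells from rows and columns, a projection profile over a binarized image, and not the authors' algorithm. It assumes ruling lines span nearly the full table; `find_cells`, `make_grid`, the 0.9 threshold, and the toy image are hypothetical.

```python
def line_positions(profile, length, threshold=0.9):
    """Indices whose dark-pixel count covers at least `threshold` of the length."""
    return [i for i, count in enumerate(profile) if count >= threshold * length]


def group_runs(indices):
    """Merge consecutive indices into (start, end) runs so a thick line counts once."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs


def find_cells(image):
    """Return (top, left, bottom, right) boxes for the cells between ruling lines.

    `image` is a list of rows; each pixel is 1 (dark) or 0 (light).
    """
    height, width = len(image), len(image[0])
    row_profile = [sum(row) for row in image]
    col_profile = [sum(image[y][x] for y in range(height)) for x in range(width)]
    h_lines = group_runs(line_positions(row_profile, width))
    v_lines = group_runs(line_positions(col_profile, height))
    cells = []
    # Each pair of adjacent horizontal and vertical lines bounds one cell.
    for (_, top_end), (bottom_start, _) in zip(h_lines, h_lines[1:]):
        for (_, left_end), (right_start, _) in zip(v_lines, v_lines[1:]):
            cells.append((top_end + 1, left_end + 1, bottom_start - 1, right_start - 1))
    return cells


def make_grid(height, width, h_lines, v_lines):
    """Build a toy binary image with ruling lines at the given rows and columns."""
    return [[1 if (y in h_lines or x in v_lines) else 0 for x in range(width)]
            for y in range(height)]


# Hypothetical 2x2 table: lines at rows 0, 2, 4 and columns 0, 3, 6.
image = make_grid(5, 7, {0, 2, 4}, {0, 3, 6})
cells = find_cells(image)
```

Real newspaper scans would need binarization, skew correction, and handling of borderless or colored tables, none of which this sketch attempts.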

Question and Answering System through Search Result Summarization of Q&A Documents (Q&A 문서의 검색 결과 요약을 활용한 질의응답 시스템)

  • Yoo, Dong Hyun;Lee, Hyun Ah
    • KIPS Transactions on Software and Data Engineering / v.3 no.4 / pp.149-154 / 2014
  • Users of community question answering services such as Knowledge-iN must pick out relevant answers themselves from many search results. If refined answers were provided automatically, the usability of such communities would improve. This paper divides questions in Q&A documents into four types (word, list, graph, and text) and proposes a summarization method for each type using document statistics. Summarized answers for the word, list, and text types are obtained by clustering questions and scoring words by frequency, proximity, and answer confidence. Answers for the graph type are presented by extracting user opinions from the answers.

A Pattern Matching Method of Large-Size Text Log Data using In-Memory Relational Database System (인메모리 관계형 데이터베이스 시스템을 이용한 대용량 텍스트 로그 데이터의 패턴 매칭 방법)

  • Han, Hyeok;Choi, Jae-Yong;Jin, Sung-Il
    • Annual Conference of KIPS / 2017.04a / pp.837-840 / 2017
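  • As cyber crimes of all kinds increase, not only proactive detection through real-time monitoring but also technology for preventing recurrence through post-incident root cause analysis is growing in importance. In post-incident analysis, an analyst traces the course of a security threat based on the various types of large-volume logs produced by a system. Various commercial and open-source software packages exist to support this, but most are file-based tools run on a single analyst's PC and fail to solve problems such as degraded analysis performance on large data, the inability to share data among multiple analysts, limits on statistical and correlation analysis, and the inability to perform interactive, incremental content analysis. To solve these problems, we developed large-volume log analysis software that uses a high-performance in-memory relational database system as log storage. In particular, the text pattern matching operation that checks for the presence of an attack using previously acquired attacker profiles can use the FTS (Full-Text Search) feature of traditional relational database systems, but this has drawbacks such as an unrealistic DB build time caused by creating dedicated indexes for large data, and additional system resource demands from a DB size increase of at least threefold. In this paper, for efficient text pattern matching on an in-memory relational database system, we propose a high-performance method for loading large-volume logs into the DB and a new type of pattern matching method.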
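
The abstract names neither the database product nor the matching method, so the sketch below only illustrates the general idea of matching attacker patterns against logs held in an in-memory relational database without building a full-text index. SQLite's `:memory:` mode stands in for the unspecified system, and the naive `instr` substring join is an assumption, not the paper's proposed method. Table names, pattern names, and log lines are hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, line TEXT)")
conn.execute("CREATE TABLE patterns (name TEXT, needle TEXT)")

# Hypothetical log lines and attacker-profile patterns.
conn.executemany(
    "INSERT INTO logs (line) VALUES (?)",
    [
        ("GET /index.html 200",),
        ("GET /admin/../etc/passwd 403",),
        ("POST /login.php?user=' OR 1=1-- 200",),
        ("GET /images/logo.png 200",),
    ],
)
conn.executemany(
    "INSERT INTO patterns VALUES (?, ?)",
    [("path_traversal", "../"), ("sql_injection", "' OR 1=1")],
)

# Join every log line against every pattern; instr() returns a
# 1-based position, or 0 when the substring is absent.
matches = conn.execute(
    "SELECT l.id, p.name FROM logs AS l "
    "JOIN patterns AS p ON instr(l.line, p.needle) > 0 "
    "ORDER BY l.id, p.name"
).fetchall()
```

Because the join scans every log line for every pattern, it avoids the index build time and storage growth the abstract attributes to FTS, at the cost of a full scan per query.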