• Title/Summary/Keyword: Text data


A View from the Bottom: Project-Oriented Risk Mining Approach for Overseas Construction Projects

  • Lee, JeeHee;Son, JeongWook;Yi, June-Seong
    • International conference on construction engineering and project management
    • /
    • 2015.10a
    • /
    • pp.97-100
    • /
    • 2015
  • Analysis of construction tender documents in overseas projects is a very important issue from a risk management point of view. Unfortunately, the majority of construction firms are preoccupied with winning contracts and do not analyze tender documents in depth. As a result, many contractors have incurred losses in overseas projects. Although many risk analysis techniques have been introduced, most of them focus on a project's external, unexpected risks such as country conditions and the owner's financial standing. However, because those external risks are difficult to control or to address preemptively, we need to concentrate on risks inherent to the project. Based on this premise, this paper proposes a project-oriented risk mining approach that can automatically detect, extract, and assess project risk factors before they materialize. This study presents a methodology for extracting potential risks in the owner's project requirements and project tender documents using state-of-the-art data analysis methods such as text mining, data mining, and information visualization. The project-oriented risk mining approach is expected to reflect project characteristics effectively in project risk management and could provide construction firms with valuable business intelligence.


Text Mining and Visualization of Unstructured Data Using Big Data Analytical Tool R (빅데이터 분석 도구 R을 이용한 비정형 데이터 텍스트 마이닝과 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.9
    • /
    • pp.1199-1205
    • /
    • 2021
  • In the era of big data, it is very important to effectively analyze not only structured data well organized in databases but also unstructured big data such as web documents, e-mails, and social data generated in real time on the Internet, on social network services, and in mobile environments. Big data analysis is the process of creating new value by discovering meaningful new correlations, patterns, and trends in big data stored in data storage. We summarize and visualize the analysis results through frequency analysis of unstructured article data using the R language, a big data analysis tool. The data analyzed in this study were a total of 104 papers published in Mon-May 2021 in the journals of the Korea Institute of Information and Communication Engineering. In the final analysis results, the most frequently mentioned keyword was "Data", which ranked first with 1,538 occurrences. Based on the results of the analysis, the limitations of the study and theoretical implications are suggested.
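
    The keyword frequency analysis described in this abstract can be illustrated with a minimal sketch. The study itself used R; the Python below is only an illustration, and the stop-word list, the function name `keyword_frequencies`, and the sample sentences are all assumptions rather than anything taken from the paper.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "on"}

def keyword_frequencies(documents, top_n=5):
    """Count how often each non-stop-word token appears across all documents."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

# Illustrative input, not data from the study.
sample = [
    "Big data analysis of unstructured data",
    "Text mining and visualization of data in R",
]
top_keywords = keyword_frequencies(sample, top_n=3)
print(top_keywords)
```

    The ranked list returned by `most_common` is the kind of table that a word cloud or bar chart would then visualize.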

A Study on Educational Data Mining for Public Data Portal through Topic Modeling Method with Latent Dirichlet Allocation (LDA기반 토픽모델링을 활용한 공공데이터 기반의 교육용 데이터마이닝 연구)

  • Seungki Shin
    • Journal of The Korean Association of Information Education
    • /
    • v.26 no.5
    • /
    • pp.439-448
    • /
    • 2022
  • This study aims to search for education-related datasets provided by public data portals and to examine, through classification with topic modeling methods, what types of data they contain. From the public data portal, 3,072 file datasets in the education field were collected based on the portal's classification system. Text mining analysis was performed using LDA-based topic modeling, with stopword processing and data pre-processing applied to each dataset. The datasets pre-classified as education by the data portal mostly provided program information and student-support notifications. In contrast, the datasets collected by searching for "education" generally described educational programs and support information for the disabled, parents, the elderly, and children from the perspective of lifelong education. The results of this analysis show that providing sufficient educational information through the public data portal would better support students' data science-based decision-making and problem-solving skills.

Development of a Fashion Design DB (패션디자인 DB 개발)

  • 김정회
    • Proceedings of the Korea Database Society Conference
    • /
    • 1997.10a
    • /
    • pp.358-375
    • /
    • 1997
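  • A. Collection and analysis of basic fashion design information - Acquire basic materials on fashion design information scattered at home and abroad - Classify by designer / collection / theme - Process B. Development of a multimedia database of fashion design information - Combined database system of images, commentary (text), and sound - Data development for PC communication network services C. Building a DB of fashion design materials - Fashion design theory books - Fashion design contest / event information - Information on fashion design educational institutions - Fashion brand information (national / designer / imported) D. Database supply service - Service over the PC communication network (downloadable) - Images of design works with concept / details / caption - Use of PC communication for design job offers and job-seeking information - Information on studying fashion design abroad E. Internet service - Introduce the works of domestic designers overseas via the Internet (remainder omitted)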


Youth Social Networking Service (SNS) Behavior in Indonesian Culinary Activity

  • SAVILLE, Ramadhona;SATRIA, Hardika Widi;HAHIDUMARDJO, Harsono;ANSORI, Mukhlas
    • Journal of Distribution Science
    • /
    • v.18 no.4
    • /
    • pp.87-96
    • /
    • 2020
  • Purpose: In this paper, we illustrate the Social Networking Service (SNS) behavior of Indonesian youth and its relation to their culinary activity, specifically their culinary activity preferences and the factors affecting their spending. Data and methodology: We gathered primary data from a stratified random questionnaire survey (406 youth). The gathered data were analyzed using text data mining and statistics in the R statistical computing language. Results: 1) We found why our respondents are interested in following the accounts of SNS food influencers: they are visually attracted to the posts, use them as a reference to find places to dine out, use them as a reference to try new food menus, and get a nostalgic feeling about the food. 2) The respondents decide to actually go to the recommended culinary places because of several factors, specifically the description (visual and text), location, word of mouth (WoM), previous experience of the place, and price. 3) Important factors affecting culinary spending are income, the number of food influencer accounts followed, SNS usage time, and interest when looking at WoM. Conclusions: SNS behavior influences the culinary activity preferences and spending of Indonesian youth.

Learning Probabilistic Kernel from Latent Dirichlet Allocation

  • Lv, Qi;Pang, Lin;Li, Xiong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.6
    • /
    • pp.2527-2545
    • /
    • 2016
  • Measuring the similarity of given samples is a key problem in recognition, clustering, retrieval, and related applications. A number of approaches, e.g. kernel methods and metric learning, have been proposed for this problem. The challenge of similarity learning is to find a similarity that is robust to intra-class variance and simultaneously selective to inter-class characteristics. We observe that the similarity measure can be improved if the data distribution and hidden semantic information are exploited in a more sophisticated way. In this paper, we propose a similarity learning approach for retrieval and recognition. The approach, termed LDA-FEK, derives a free energy kernel (FEK) from Latent Dirichlet Allocation (LDA). First, it trains LDA and constructs the kernel using the parameters and variables of the trained model. Then, the unknown kernel parameters are learned by a discriminative learning approach. The main contributions of the proposed method are twofold: (1) the method is computationally efficient and scalable since the kernel parameters are determined in a staged way; (2) the method exploits the data distribution and semantic-level hidden information by means of LDA. To evaluate the performance of LDA-FEK, we apply it to image retrieval on two data sets and to text categorization on four popular data sets. The results show the competitive performance of our method.

Self-Evolving Expert Systems based on Fuzzy Neural Network and RDB Inference Engine

  • Kim, Jin-Sung
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.2
    • /
    • pp.19-38
    • /
    • 2003
  • In this research, we propose a mechanism for developing self-evolving expert systems (SEES) based on data mining (DM), fuzzy neural networks (FNN), and a relational database (RDB)-driven forward/backward inference engine. Most researchers have tried to develop a text-oriented knowledge base (KB) and inference engine (IE). However, this approach has limitations in 1) automatic rule extraction, 2) handling of ambiguity in knowledge, 3) expandability of the knowledge base, and 4) speed of inference. To overcome these limitations, knowledge engineers have tried to develop automatic knowledge extraction mechanisms. As a result, the adaptability of expert systems was improved. Nonetheless, they did not suggest a hybrid and generalized solution for developing self-evolving expert systems. For this purpose, we propose an automatic knowledge acquisition and composite inference mechanism based on DM, FNN, and an RDB-driven inference engine. Our proposed mechanism has five advantages. First, it can extract and reduce specific domain knowledge from an incomplete database by using data mining technology. Second, it can handle ambiguity in knowledge by using fuzzy membership functions. Third, it can construct a relational knowledge base and expand the knowledge base without limit using an RDBMS (relational database management system) module. Fourth, the proposed hybrid data mining mechanism can reflect both association rule-based logical inference and complicated fuzzy relationships. Fifth, RDB-driven forward and backward inference time is shorter than traditional text-oriented inference time.


Using a Cellular Automaton to Extract Medical Information from Clinical Reports

  • Barigou, Fatiha;Atmani, Baghdad;Beldjilali, Bouziane
    • Journal of Information Processing Systems
    • /
    • v.8 no.1
    • /
    • pp.67-84
    • /
    • 2012
  • A significant amount of clinical data concerning the medical history of a patient is in the form of clinical reports written by doctors. These reports describe patients, their pathologies, their personal and medical histories, findings made during interviews or procedures, and so forth. They represent a source of precious information that can be used in several applications such as research information to diagnose new patients, epidemiological studies, decision support, statistical analysis, and data mining. But this information is difficult to access, as it is often in unstructured text form. To make access to patient data easy, our research aims to develop a system for extracting information from unstructured text. In a previous work, a rule-based approach was applied to a corpus of clinical reports on infectious diseases to extract structured data in the form of named entities and properties. In this paper, we propose the use of a Boolean inference engine, which is based on a cellular automaton, to perform the extraction. Our motivation for adopting this Boolean modeling approach is twofold: first, to optimize storage, and second, to reduce the response time of entity extraction.

Data Coding Scheme to Reduce Power Consumption and EMI in LCD Driving Systems (LCD 구동 시스템에서 전력 소비 및 전자기 장애를 줄이기 위한 데이타 코딩 방법)

  • Choi, Chul-Ho;Choi, Myung-Ryul
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.6 no.6
    • /
    • pp.628-634
    • /
    • 2000
  • We propose a data coding scheme for reducing power consumption and EMI in transmitting a sequence of data from an LCD controller to an LCD driver. The proposed coding scheme works by reducing data transitions in typical PC text images. It can be implemented with little hardware and applied to real-time applications of LCD driving systems. We have run computer simulations of the proposed coding scheme and compared its results with those produced by existing coding schemes. Compared to the existing schemes, the proposed coding scheme significantly reduces the switching activity in both text and picture images.
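
    The abstract evaluates coding schemes by their switching activity, i.e. how many bit lines toggle between consecutive data words. The paper's own coding scheme is not described here, so the sketch below does not attempt to reproduce it; it only shows one plausible way to measure that metric. The function name `count_transitions`, the 8-bit bus width, and the sample sequences are assumptions for illustration.

```python
def count_transitions(words, width=8):
    """Total number of bit lines that toggle between consecutive words."""
    mask = (1 << width) - 1
    total = 0
    for prev, curr in zip(words, words[1:]):
        # XOR marks the bits that differ; count them (Hamming distance).
        total += bin((prev ^ curr) & mask).count("1")
    return total

# Illustrative sequences, not data from the paper.
alternating = [0x00, 0xFF, 0x00, 0xFF]   # every line toggles on every transfer
gradual = [0x00, 0x01, 0x03, 0x07]       # one line toggles per transfer

print(count_transitions(alternating))
print(count_transitions(gradual))
```

    A coding scheme of the kind the abstract describes would be judged by how far it lowers this count for the same underlying image data.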


A Study on the Development of the Use Index of Closed School Facilities Using Big Data -Focused on Text-Mining Techniques- (빅데이터를 활용한 폐교시설의 지표 개발에 관한 연구 -텍스트마이닝 기법을 중심으로-)

  • Kim, Jae-Young;Lee, Jong-Kuk
    • The Journal of Sustainable Design and Educational Environment Research
    • /
    • v.18 no.2
    • /
    • pp.1-11
    • /
    • 2019
  • The purpose of this study is to support objective decision-making on the use of closed schools, whose number is expected to increase continuously, by developing utilization indicators for their efficient use. The research proceeded by deriving preliminary indicators for the use of closed schools, deriving final indicators using big data, and then quantifying the indicators to make them objective. In future work, the indicators are to be applied to facilities and verified. This study has implications for the application of big data analysis methods, which have not previously been attempted in planning and research on the use of closed school facilities.