• Title/Summary/Keyword: analyzing unstructured data

Unstructured Data Analysis and Multi-pattern Storage Technique for Traffic Information Inference (교통정보 추론을 위한 비정형데이터 분석과 다중패턴저장 기법)

  • Kim, Yonghoon;Kim, Booil;Chung, Mokdong
    • Journal of Korea Multimedia Society / v.21 no.2 / pp.211-223 / 2018
  • Understanding the meaning of data is a common goal of research on unstructured data. Analyzing the meaning of unstructured data in the form of corpora and sentences is particularly difficult. Existing studies have used latent semantic analysis (LSA) to select the sentences whose meaning is most similar to specific words. However, examining many sentences continuously in this way is problematic. To address the unstructured data classification problem, several search sites classify documents by word frequency and serve the results to users. In this paper, we propose a method that classifies documents using the frequency of similar words, with the frequency of non-relevant words applied as weights, and stores the results in a multi-pattern storage. We apply TensorFlow's softmax to nearby sentences for machine learning, and use it for unstructured data analysis and the inference of traffic information.
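
The classification idea in this abstract (matching-word frequency, a weighted penalty for non-relevant words, then softmax) can be illustrated with a minimal sketch. It is not the authors' model: it uses plain Python instead of TensorFlow, and the keyword sets, the `penalty` weight, and the sample sentence are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(tokens, class_keywords, penalty=0.5):
    """Score each class: matching-word count minus a weighted non-matching count."""
    scores = {}
    for label, keywords in class_keywords.items():
        hits = sum(1 for token in tokens if token in keywords)
        misses = len(tokens) - hits
        scores[label] = hits - penalty * misses
    return scores


# Hypothetical keyword sets and sentence, for illustration only.
class_keywords = {
    "traffic": {"road", "congestion", "accident", "lane"},
    "weather": {"rain", "snow", "wind", "fog"},
}
tokens = "accident on the road causes congestion".split()

scores = class_scores(tokens, class_keywords)
labels = list(scores)
probs = dict(zip(labels, softmax([scores[label] for label in labels])))
predicted = max(probs, key=probs.get)
```

Here `predicted` is the class with the highest softmax probability. A trained model would learn the weights rather than use a fixed `penalty`.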

An Efficient Design and Implementation of an MdbULPS in a Cloud-Computing Environment

  • Kim, Myoungjin;Cui, Yun;Lee, Hanku
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.8 / pp.3182-3202 / 2015
  • Flexibly expanding the storage capacity required to process a large amount of rapidly increasing unstructured log data is difficult in a conventional computing environment. In addition, implementing a log processing system that can categorize and analyze unstructured log data is extremely difficult. To overcome these limitations, we propose and design a MongoDB-based unstructured log processing system (MdbULPS) for collecting, categorizing, and analyzing log data generated by banks. The proposed system includes a Hadoop-based analysis module for reliable parallel-distributed processing of massive log data. Furthermore, because the Hadoop distributed file system (HDFS) stores data by generating replicas of collected log data in block units, the proposed system offers automatic recovery from system failures and data loss. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. To evaluate the proposed system, we conducted three performance tests on a local test bed of twelve nodes: comparing our system with a MySQL-based approach, comparing it with an HBase-based approach, and changing the chunk size option. The experiments showed that our system performed better in processing unstructured log data.

Product Recommendation System based on User Purchase Priority

  • Bang, Jinsuk;Hwang, Doyeun;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.18 no.1 / pp.55-60 / 2020
  • As personalized customer services create a society that emphasizes the personality of an individual, the number of product reviews and quantity of user data generated by users on the internet in mobile shopping apps and sites are increasing. Such product review data are classified as unstructured data. Unstructured data have the potential to be transformed into information that companies and users can employ, using appropriate processing and analyses. However, existing systems do not reflect the detailed information they collect, such as user characteristics, purchase preference, or purchase priority while analyzing review data. Thus, it is challenging to provide customized recommendations for various users. Therefore, in this study, we have developed a product recommendation system that takes into account the user's priority, which they select, when searching for and purchasing a product. The recommendation system then displays the results to the user by processing and analyzing their preferences. Since the user's preference is considered, the user can obtain results that are more relevant.

Cost Performance Evaluation Framework through Analysis of Unstructured Construction Supervision Documents using Binomial Logistic Regression (비정형 공사감리문서 정보와 이항 로지스틱 회귀분석을 이용한 건축 현장 비용성과 평가 프레임워크 개발)

  • Kim, Chang-Won;Song, Taegeun;Lee, Kiseok;Yoo, Wi Sung
    • Journal of the Korea Institute of Building Construction / v.24 no.1 / pp.121-131 / 2024
  • This research explores the potential of leveraging unstructured data from construction supervision documents, which contain detailed inspection insights from independent third-party monitors of building construction processes. With the evolution of analytical methodologies, such unstructured data has been recognized as a valuable source of information, offering diverse insights. The study introduces a framework designed to assess cost performance by applying advanced analytical methods to the unstructured data found in final construction supervision reports. Specifically, key phrases were identified using text mining and social network analysis techniques, and these phrases were then analyzed through binomial logistic regression to assess cost performance. The study found that predictions of cost performance based on unstructured data from supervision documents achieved an accuracy rate of approximately 73%. The findings of this research are anticipated to serve as a foundational resource for analyzing various forms of unstructured data generated within the construction sector in future projects.
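
Binomial logistic regression on key-phrase features, as described in this abstract, can be sketched in a few lines. This is a hedged illustration only: the phrase features ("delay", "rework"), the labels, and the training settings are hypothetical and do not come from the paper.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 when the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical data: columns mark whether a report contains the phrases
# "delay" and "rework"; label 1 means poor cost performance.
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

w, b = fit_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

In this toy data the label follows the first phrase, so the fitted model reproduces the training labels. Real supervision reports would need held-out data to estimate accuracy.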

Unstructured Data Analysis using Equipment Check Ledger: A Case Study in Telecom Domain (장비점검 일지의 비정형 데이터분석을 통한 고장 대응 효율화 사례 연구)

  • Ju, Yeonjin;Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services / v.21 no.1 / pp.127-135 / 2020
  • As the use and analysis of big data grow in importance, interest in natural language processing techniques for unstructured data such as news articles and comments is increasing. In particular, as collecting big data becomes feasible, data mining techniques capable of pre-processing and analyzing such data are emerging. In this case study with a telecom company, we propose a methodology for structuring unstructured data using text mining. The domain is equipment failure, and the data set consists of about 2.2 million equipment check ledger records. About 800,000 equipment failure records accumulate in the ledger each year. The equipment check ledger contains both structured (formal) and unstructured data. Structured data can readily be used for analysis, whereas unstructured data is difficult to use immediately. However, unstructured data is likely to contain important information that is not recorded in the structured fields. Therefore, this study develops a digital transformation method for the unstructured data in the equipment check ledger.

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

  • Jung, Hoon;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4706-4724 / 2020
  • In this study, the voice of customer (VOC), which is text data containing contact history and counseling details, was analyzed together with various structured data such as company size, loan balance, and savings accounts. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN provided the best predictive power, followed by the model using TF-IDF, and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN performed better in an analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, which indicate that individual customers leave when their loyalty and switching cost are low, also apply to corporate customers, and it suggests that VOC data indicating customers' needs are very effective for predicting their behavior.
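
TF-IDF, one of the techniques named in this abstract, can be sketched as follows. This is the plain variant (term frequency times `log(N / df)`), not necessarily the weighting the authors used, and the VOC snippets are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical voice-of-customer snippets, already tokenized.
docs = [
    "loan rate too high".split(),
    "loan approved quickly".split(),
    "rate change notice".split(),
]
weights = tf_idf(docs)
```

Terms that appear in fewer documents, such as "too" in the first snippet, receive a higher weight than terms shared across documents, such as "loan". Libraries often apply smoothing, so their values may differ from this sketch.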

A MVC Framework for Visualizing Text Data (텍스트 데이터 시각화를 위한 MVC 프레임워크)

  • Choi, Kwang Sun;Jeong, Kyo Sung;Kim, Soo Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.39-58 / 2014
  • As big data and related technologies grow in importance in industry, visualizing the results of processing and analyzing big data has come to the fore. Visualization helps people understand analysis results effectively and clearly. It also serves as the GUI (Graphical User Interface) that supports communication between people and analysis systems. To make development and maintenance easier, these GUI parts should usually be loosely coupled with the parts that process and analyze data. To implement such a loosely coupled architecture, it is necessary to adopt design patterns such as MVC (Model-View-Controller), which is designed to minimize coupling between the UI part and the data processing part. Big data can be classified into structured and unstructured data. Visualizing structured data is relatively easy compared with unstructured data. Nevertheless, as the use and analysis of unstructured data have spread, visualization systems are usually developed separately for each project to overcome the limitations of traditional visualization systems for structured data. Furthermore, for text data, which makes up a large part of unstructured data, visualization is even more difficult. This results from the complexity of the technologies for analyzing text data, such as linguistic analysis, text mining, and social network analysis, which are also not standardized. This situation makes it more difficult to reuse the visualization system of one project in other projects. We assume that the reason is the lack of a common design for visualization systems that considers extension to other systems. In our research, we suggest a common information model for visualizing text data and propose a comprehensive and reusable framework, TexVizu, for visualizing text data. First, we survey representative studies in the text visualization area. We then identify common elements for text visualization and common patterns among its various cases. Next, we review and analyze these elements and patterns from three viewpoints: structural, interactive, and semantic. We then design an integrated model of text data that represents the elements for visualization. The structural viewpoint identifies structural elements of various text documents, such as title, author, and body. The interactive viewpoint identifies the types of relations and interactions between text documents, such as post, comment, and reply. The semantic viewpoint identifies semantic elements that are extracted by analyzing text data linguistically and are represented as tags classifying entity types, such as people, place or location, time, and event. After that, we extract and select common requirements for visualizing text data. The requirements fall into four types: structure information, content information, relation information, and trend information. Each type of requirement comprises the required visualization techniques, data, and goal (what to know). These are the common and key requirements for designing a framework that keeps a visualization system loosely coupled with data processing or analysis systems. Finally, we design a common text visualization framework, TexVizu, which is reusable and extensible for various visualization projects by collaborating with various Text Data Loaders and Analytical Text Data Visualizers via common interfaces such as ITextDataLoader and IATDProvider. TexVizu also comprises an Analytical Text Data Model, Analytical Text Data Storage, and an Analytical Text Data Controller. In this framework, external components are the specifications of the interfaces required for collaborating with the framework. As an experiment, we also apply this framework to two text visualization systems: a social opinion mining system and an online news analysis system.
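
The loose coupling described in this abstract can be sketched with small interfaces. Only the names `ITextDataLoader` and `IATDProvider` come from the abstract. Their methods (`load`, `get_data`), the model fields, and the helper classes below are assumptions made for illustration, not the framework's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class AnalyticalTextData:
    """Model: one document seen from the three viewpoints in the abstract."""
    title: str                      # structural element
    body: str                       # structural element
    reply_to: Optional[str] = None  # interactive element (hypothetical field)
    tags: Dict[str, str] = field(default_factory=dict)  # semantic tags


class ITextDataLoader(Protocol):
    """Interface name from the abstract; the method is an assumption."""
    def load(self) -> List[AnalyticalTextData]: ...


class IATDProvider(Protocol):
    """Interface name from the abstract; the method is an assumption."""
    def get_data(self) -> List[AnalyticalTextData]: ...


class InMemoryLoader:
    """Hypothetical loader that wraps rows already held in memory."""
    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def load(self) -> List[AnalyticalTextData]:
        return [AnalyticalTextData(**row) for row in self._rows]


class AnalyticalTextDataController:
    """Controller: copies documents from a loader into storage and serves them."""
    def __init__(self, loader: ITextDataLoader) -> None:
        self._loader = loader
        self._storage: List[AnalyticalTextData] = []

    def refresh(self) -> None:
        self._storage = self._loader.load()

    def get_data(self) -> List[AnalyticalTextData]:
        return list(self._storage)


def render_titles(provider: IATDProvider) -> str:
    """View: depends only on the provider interface, not on any loader."""
    return "\n".join(f"- {doc.title}" for doc in provider.get_data())


rows = [
    {"title": "First post", "body": "Traffic is heavy in Seoul.", "tags": {"Seoul": "place"}},
    {"title": "Reply", "body": "Agreed.", "reply_to": "First post"},
]
controller = AnalyticalTextDataController(InMemoryLoader(rows))
controller.refresh()
listing = render_titles(controller)
```

Because `render_titles` sees only the provider interface, a different loader (for example one reading news articles) could be swapped in without changing the view.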

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • Recently, with the development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is moving from structured data analysis to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns and to predict repurchase based on the discovered preferences. There are two important modules in this study. The first module predicts the repurchase probability of each category by analyzing users' purchase history. We include the first module in our research scope to compare the accuracy of the traditional purchase-based prediction model with that of our new model presented in the second module. This procedure extracts the purchase history of users. The core part of our methodology is the second module. This module extracts users' interests by analyzing the news articles they have read. It constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. The module then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging the results of these processes, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category. In this paper, we also provide experimental results of our performance evaluation. The data used in our experiments are outlined as follows. We acquired web transaction data of 5,000 panel members from a company that specializes in analyzing the rankings of internet sites. First, we extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and crawled the main contents of the news articles. We then selected 2,615 users who had read at least one of the extracted news articles. Among these 2,615 users, 359 target users had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. The performance evaluation showed that our prediction model using both users' interests and purchase history outperforms a prediction model using only users' purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used. Similarly, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used, although the improvement was very small.
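
The merging step in the second module can be illustrated with a small sketch. It assumes that "merging" means multiplying the user-by-article matrix by the article-by-topic matrix, which the abstract does not state explicitly. The toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [
        [sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


# Hypothetical toy data: 2 users, 3 articles, 2 topics.
# user_article[u][a] is 1 if user u read article a, else 0.
user_article = [
    [1, 1, 0],
    [0, 0, 1],
]
# article_topic[a][t] is the weight of topic t in article a,
# for example as produced by topic modeling.
article_topic = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
]

# user_topic[u][t] accumulates topic weights over the articles user u read.
user_topic = matmul(user_article, article_topic)
```

Each row of `user_topic` is a structured description of one user's interests: the first user leans toward topic 0 and the second toward topic 1. Such rows could then serve as input features for a repurchase prediction model.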

A study on the analysis of unstructured data for customized education of learners in small learning groups (소규모학습그룹의 학습자 맞춤형 교육을 위한 비정형데이터분석 연구)

  • Min, Youn-A;Lim, Dong-Kyun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.5 / pp.89-95 / 2020
  • As the e-learning market expands, interest in AI-based customized education for learners is increasing. Customized education requires essential components such as a large amount of data and learning content for learner analysis, and collecting such data takes time and money. In this paper, to enable efficient learner-tailored learning even in small learning groups, unstructured learner data was analyzed using Python modules, and a learning algorithm based on this analysis was presented. The analysis presented in this paper makes it possible to quantify and measure unstructured learning-related data, and an accuracy of more than 80% was confirmed when analyzing keywords for providing customized education.

Survey-based unstructured data analysis to predict flipped learning performance (플립드러닝 성과를 예측하기 위한 설문조사 기반의 비정형 데이터 분석)

  • Chayoung Kim;Yoon Kim
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.519-524 / 2023
  • This study summarizes experience in applying flipped learning to various IT-related liberal arts subjects and proposes a specific application method. So far, most studies have analyzed various strategies for flipped learning and learner responses to them. Teachers, who are the main operators of flipped learning classes, now need to study how to provide immediate feedback and apply it while the courses are running. Studies on this topic are gradually appearing. In general, most of these studies share reference materials based on results obtained after applying various strategies, such as instructors developing the class structure themselves, combining it with discussion classes, or developing various content. This study proposes a method for analyzing how various strategies can be applied in a subject and for obtaining results while the class is running, by analyzing unstructured data from a survey that can collect immediate feedback.