• Title/Summary/Keyword: Text data

Search Result 2,959, Processing Time 0.027 seconds

Design and Implementation of XML Web Agent for Data Exchange and Replication between Heterogeneous DBMSs (이기종 DBMS간 데이터 교환과 복제를 위한 XML 웹 에이전트 설계 및 구현)

  • Yu, Sun-Young;Lee, Chun-Keun;Yim, Jae-Hong
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.7
    • /
    • pp.967-975
    • /
    • 2004
  • HTML is unstructured document because of using restricted tag. HTML is difficult to extract data from HTML document. But XML is able to use user definition tag, that is easy to store information. Also XML is easy to extract data from XML document. This is the reason why XML is a standard for data exchange format on the Internet, so XML is fitted to exchange data between heterogeneous DBMSs(DataBase Management System). In this paper, we designed and implemented of XML web agent for data replication between heterogeneous DBMSs. A XML web agent system controls data of DBMS, and generates a XML document from data of DBMS. Also XML web agent is data exchange or replication between heterogeneous DBMS by the medium of XML.

  • PDF

Subnet Selection Scheme based on probability to enhance process speed of Big Data (빅 데이터의 처리속도 향상을 위한 확률기반 서브넷 선택 기법)

  • Jeong, Yoon-Su;Kim, Yong-Tae;Park, Gil-Cheol
    • Journal of Digital Convergence
    • /
    • v.13 no.9
    • /
    • pp.201-208
    • /
    • 2015
  • With services such as SNS and facebook, Big Data popularize the use of small size such as micro blogs are increasing. However, the problem of accuracy and computational cost of the search result of big data of a small size is unresolved. In this paper, we propose a subnet selection techniques based probability to improve the browsing speed of the small size of the text information from big data environments, such as micro-blogs. The proposed method is to configure the subnets to give to the attribute information of the data increased the probability data search speed. In addition, the proposed method improves the accessibility of the data by processing a pair of the connection information between the probability of the data constituting the subnet to easily access the distributed data. Experimental results showed the proposed method is 6.8% higher detection rates than CELF algorithm, the average processing time was reduced by 8.2%.

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases
    • /
    • v.33 no.2
    • /
    • pp.163-174
    • /
    • 2006
  • In distributed wireless sensor network, it is difficult to transmit and analyze the entire stream data depending on limited networks, power and processor. Therefore it is suitable to use alternative stream data processing after classifying the continuous stream data. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes input as a sliding window of multivariate stream data and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised, we tested Bayesian classifier and SVM, and for unsupervised, we tested Jaccard, TFIDF Jaro and Jaro Winkler. In our experiments, SVM and TFIDF outperformed other classification methods. In particular, we observed that classification accuracy is improved when the correlation of attributes is also considered along with the n-gram tokens of symbols.

Analysis of the Core Concepts of Middle School Informatics Textbook Using Big Data Analysis Techniques (빅데이터 분석 방법을 이용한 중학교 정보 교과서 핵심 개념 분석)

  • Woon, Daewoong;Choe, Hyunjong
    • Journal of Creative Information Culture
    • /
    • v.5 no.2
    • /
    • pp.157-164
    • /
    • 2019
  • Big data is a field that has been utilized and developed in various fields in our society recently. Big data analysis techniques are frequently used to analyze various big data in various fields of politics, economy, and society to grasp various meanings hidden in the data. However, big data analysis is used some case studies of in fields of analysis of educational data, but analysis of the curriculum and direction is still inadequate. Therefore, this study aims to identify and analyze the core concepts of middle school informatics textbooks using big data analysis techniques. Text mining was used for big data analysis for informatics textbook analysis. Through the core concepts of middle school informatics textbooks identified using this techniques, we could confirm the concepts to be emphasized in the textbooks and the possibility of using big data in the field of education.

Visualizing Unstructured Data using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 비정형 데이터 시각화)

  • Nam, Soo-Tai;Chen, Jinhui;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.151-154
    • /
    • 2021
  • Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data stored in data stores and creating new value. Thus, most big data analysis technology methods include data mining, machine learning, natural language processing, and pattern recognition used in existing statistical computer science. Also, using the R language, a big data tool, we can express analysis results through various visualization functions using pre-processing text data. The data used in this study was analyzed for 21 papers in the March 2021 among the journals of the Korea Institute of Information and Communication Engineering. In the final analysis results, the most frequently mentioned keyword was "Data", which ranked first 305 times. Therefore, based on the results of the analysis, the limitations of the study and theoretical implications are suggested.

  • PDF

Big Data using Artificial Intelligence CNN on Unstructured Financial Data (비정형 금융 데이터에 관한 인공지능 CNN 활용 빅데이터 연구)

  • Ko, Young-Bong;Park, Dea-Woo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.232-234
    • /
    • 2022
  • Big data is widely used in customer relationship management, relationship marketing, financial business improvement, credit information and risk management. Moreover, as non-face-to-face financial transactions have become more active recently due to the COVID-19 virus, the use of financial big data is more demanded in terms of relationships with customers. In terms of customer relationship, financial big data has arrived at a time that requires an emotional rather than a technical approach. In relational marketing, it was necessary to emphasize the emotional aspect rather than the cognitive, rational, and rational aspects. Existing traditional financial data was collected and utilized through text-type customer transaction data, corporate financial information, and questionnaires. In this study, the customer's emotional image data, that is, atypical data based on the customer's cultural and leisure activities, is acquired through SNS and the customer's activity image is analyzed with an artificial intelligence CNN algorithm. Activity analysis is again applied to the annotated AI, and the AI big data model is designed to analyze the behavior model shown in the annotation.

  • PDF

The Effect of Expert Reviews on Consumer Product Evaluations: A Text Mining Approach (전문가 제품 후기가 소비자 제품 평가에 미치는 영향: 텍스트마이닝 분석을 중심으로)

  • Kang, Taeyoung;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.63-82
    • /
    • 2016
  • Individuals gather information online to resolve problems in their daily lives and make various decisions about the purchase of products or services. With the revolutionary development of information technology, Web 2.0 has allowed more people to easily generate and use online reviews such that the volume of information is rapidly increasing, and the usefulness and significance of analyzing the unstructured data have also increased. This paper presents an analysis on the lexical features of expert product reviews to determine their influence on consumers' purchasing decisions. The focus was on how unstructured data can be organized and used in diverse contexts through text mining. In addition, diverse lexical features of expert reviews of contents provided by a third-party review site were extracted and defined. Expert reviews are defined as evaluations by people who have expert knowledge about specific products or services in newspapers or magazines; this type of review is also called a critic review. Consumers who purchased products before the widespread use of the Internet were able to access expert reviews through newspapers or magazines; thus, they were not able to access many of them. Recently, however, major media also now provide online services so that people can more easily and affordably access expert reviews compared to the past. The reason why diverse reviews from experts in several fields are important is that there is an information asymmetry where some information is not shared among consumers and sellers. The information asymmetry can be resolved with information provided by third parties with expertise to consumers. Then, consumers can read expert reviews and make purchasing decisions by considering the abundant information on products or services. Therefore, expert reviews play an important role in consumers' purchasing decisions and the performance of companies across diverse industries. If the influence of qualitative data such as reviews or assessment after the purchase of products can be separately identified from the quantitative data resources, such as the actual quality of products or price, it is possible to identify which aspects of product reviews hamper or promote product sales. Previous studies have focused on the characteristics of the experts themselves, such as the expertise and credibility of sources regarding expert reviews; however, these studies did not suggest the influence of the linguistic features of experts' product reviews on consumers' overall evaluation. However, this study focused on experts' recommendations and evaluations to reveal the lexical features of expert reviews and whether such features influence consumers' overall evaluations and purchasing decisions. Real expert product reviews were analyzed based on the suggested methodology, and five lexical features of expert reviews were ultimately determined. Specifically, the "review depth" (i.e., degree of detail of the expert's product analysis), and "lack of assurance" (i.e., degree of confidence that the expert has in the evaluation) have statistically significant effects on consumers' product evaluations. In contrast, the "positive polarity" (i.e., the degree of positivity of an expert's evaluations) has an insignificant effect, while the "negative polarity" (i.e., the degree of negativity of an expert's evaluations) has a significant negative effect on consumers' product evaluations. Finally, the "social orientation" (i.e., the degree of how many social expressions experts include in their reviews) does not have a significant effect on consumers' product evaluations. In summary, the lexical properties of the product reviews were defined according to each relevant factor. Then, the influence of each linguistic factor of expert reviews on the consumers' final evaluations was tested. In addition, a test was performed on whether each linguistic factor influencing consumers' product evaluations differs depending on the lexical features. The results of these analyses should provide guidelines on how individuals process massive volumes of unstructured data depending on lexical features in various contexts and how companies can use this mechanism from their perspective. This paper provides several theoretical and practical contributions, such as the proposal of a new methodology and its application to real data.

A Time Series Analysis of Urban Park Behavior Using Big Data (빅데이터를 활용한 도시공원 이용행태 특성의 시계열 분석)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.48 no.1
    • /
    • pp.35-45
    • /
    • 2020
  • This study focused on the park as a space to support the behavior of urban citizens in modern society. Modern city parks are not spaces that play a specific role but are used by many people, so their function and meaning may change depending on the user's behavior. In addition, current online data may determine the selection of parks to visit or the usage of parks. Therefore, this study analyzed the change of behavior in Yeouido Park, Yeouido Hangang Park, and Yangjae Citizen's Forest from 2000 to 2018 by utilizing a time series analysis. The analysis method used Big Data techniques such as text mining and social network analysis. The summary of the study is as follows. The usage behavior of Yeouido Park has changed over time to "Ride" (Dynamic Behavior) for the first period (I), "Take" (Information Communication Service Behavior) for the second period (II), "See" (Communicative Behavior) for the third period (III), and "Eat" (Energy Source Behavior) for the fourth period (IV). In the case of Yangjae Citizens' Forest, the usage behavior has changed over time to "Walk" (Dynamic Behavior) for the first, second, and third periods (I), (II), (III) and "Play" (Dynamic Behavior) for the fourth period (IV). Looking at the factors affecting behavior, Yeouido Park was had various factors related to sports, leisure, culture, art, and spare time compared to Yangjae Citizens' Forest. The differences in Yangjae Citizens' Forest that affected its main usage behavior were various elements of natural resources. Second, the behavior of the target areas was found to be focused on certain main behaviors over time and played a role in selecting or limiting future behaviors. These results indicate that the space and facilities of the target areas had not been utilized evenly, as various behaviors have not occurred, however, a certain main behavior has appeared in the target areas. This study has great significance in that it analyzes the usage of urban parks using Big Data techniques, and determined that urban parks are transformed into play spaces where consumption progressed beyond the role of rest and walking. The behavior occurring in modern urban parks is changing in quantity and content. Therefore, through various types of discussions based on the results of the behavior collected through Big Data, we can better understand how citizens are using city parks. This study found that the behavior associated with static behavior in both parks had a great impact on other behaviors.

Urban Landscape Image Study by Text Mining and Factor Analysis - Focused on Lotte World Tower - (텍스트 마이닝과 인자분석에 의한 도시경관이미지 연구 - 롯데월드타워를 대상으로 -)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.45 no.4
    • /
    • pp.104-117
    • /
    • 2017
  • This study compares the results of landscape image analysis using text mining techniques and factor analysis for Lotte World Tower, which is the first atypical skyscraper building in Korea, and identifies landscape images of the site to determine possibilities of use. Lotte World Tower's landscape image has been extracted from text mining analysis focusing on adjectives such as 'new', 'transformational', 'unusual', 'novelty', 'impressive', and 'unique', and phrases such as in the process of change, people's active elements(caliber, outing, project, night view), media(newspaper, blog), and climate(weather, season). As a result of the factor analysis, factors affecting the landscape image of Lotte World Tower were symbolic, aesthetic, and formative. Identification, which is a morphological feature, has characteristics of scale and visibility but it is not statistically significant in preference. Rather, the psychological factors such as the symbolism with characteristics such as poison and specialty, harmony with the characteristics of the surrounding environment, and beautiful aesthetic characteristics were an influence on the landscape image. The common results of the two research methods show that psychological characteristics such as factors that can represent and represent the city affect the landscape image more greatly than the morphological and physical characteristics such as location and location of the building. In addition, the text mining technique can identify nouns and adjectives corresponding to the images that people see and feel, and confirms the relationship between the derived keywords, so that it can focus the process of forming the landscape image and further the image of the city. It would appear to be a suitable method to complement the limitation of landscape research. This study is meaningful in that it confirms the possibility that big data can be utilized in landscape analysis, which is one research field of landscape architecture, and is significant for understanding the information of a big data base and contribute to enlarging the landscape research area.

A Study on the Effect of the Document Summarization Technique on the Fake News Detection Model (문서 요약 기법이 가짜 뉴스 탐지 모형에 미치는 영향에 관한 연구)

  • Shim, Jae-Seung;Won, Ha-Ram;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.201-220
    • /
    • 2019
  • Fake news has emerged as a significant issue over the last few years, igniting discussions and research on how to solve this problem. In particular, studies on automated fact-checking and fake news detection using artificial intelligence and text analysis techniques have drawn attention. Fake news detection research entails a form of document classification; thus, document classification techniques have been widely used in this type of research. However, document summarization techniques have been inconspicuous in this field. At the same time, automatic news summarization services have become popular, and a recent study found that the use of news summarized through abstractive summarization has strengthened the predictive performance of fake news detection models. Therefore, the need to study the integration of document summarization technology in the domestic news data environment has become evident. In order to examine the effect of extractive summarization on the fake news detection model, we first summarized news articles through extractive summarization. Second, we created a summarized news-based detection model. Finally, we compared our model with the full-text-based detection model. The study found that BPN(Back Propagation Neural Network) and SVM(Support Vector Machine) did not exhibit a large difference in performance; however, for DT(Decision Tree), the full-text-based model demonstrated a somewhat better performance. In the case of LR(Logistic Regression), our model exhibited the superior performance. Nonetheless, the results did not show a statistically significant difference between our model and the full-text-based model. Therefore, when the summary is applied, at least the core information of the fake news is preserved, and the LR-based model can confirm the possibility of performance improvement. This study features an experimental application of extractive summarization in fake news detection research by employing various machine-learning algorithms. The study's limitations are, essentially, the relatively small amount of data and the lack of comparison between various summarization technologies. Therefore, an in-depth analysis that applies various analytical techniques to a larger data volume would be helpful in the future.