• Title/Summary/Keyword: Text data


Big Data Analysis Using Social Network Service-Based Data (소셜네트워크서비스 기반 데이터를 이용한 빅데이터 분석)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.165-166 / 2019
  • Big data analysis refers to the ability to collect, store, manage, and analyze data at a scale beyond the capacity of existing database management tools. Big data means large-scale data generated in digital environments: it is large in volume, has a short generation cycle, and includes not only numeric data but also text and image data. Because of its huge size, varied types, and high generation velocity, big data is difficult to manage and analyze in conventional ways. Therefore, companies in most industries are making efforts to create value through the application of big data. In this study, we analyzed the meaning of keywords using Social Matrix, a big data analysis tool from Daum Communications. Theoretical implications are presented based on the analysis results.
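The Social Matrix tool itself is proprietary and not described in the abstract; as a rough, generic illustration of keyword analysis over SNS posts, a co-occurrence count could be sketched as follows (the post data and target word are made up):

```python
from collections import Counter

def keyword_cooccurrence(posts, target):
    """Count words that appear alongside `target` across SNS posts."""
    counts = Counter()
    for post in posts:
        words = set(post.lower().split())
        if target in words:
            counts.update(words - {target})
    return counts

posts = [
    "big data analysis creates business value",
    "social network data drives big data analysis",
    "image recognition with deep learning",
]
co = keyword_cooccurrence(posts, "data")
```

Ranking `co.most_common()` then gives the terms most associated with the seed keyword, which is the basic shape of the keyword-meaning analysis described above.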


Test Dataset for validating the meaning of Table Machine Reading Language Model (표 기계독해 언어 모형의 의미 검증을 위한 테스트 데이터셋)

  • YU, Jae-Min;Cho, Sanghyun;Kwon, Hyuk-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.164-167 / 2022
  • In table machine comprehension, the knowledge required by language models and the structural form of tables change with the domain, causing greater performance degradation than on plain text data. In this paper, we propose a pre-training data construction method and an adversarial learning method, based on selecting meaningful tabular data, for building a table language model that is robust to such domain changes. To detect tables used merely for decorating web documents and carrying no structural information, heuristic rules were defined and applied to identify header data and select genuine data tables. An adversarial learning method between tabular data and infobox data, which carries knowledge about entities, was then applied. Compared with training on the existing unrefined data, training on the refined data increased F1 by 3.45 and EM by 4.14 on the KorQuAD table data, and F1 by 19.38 and EM by 4.22 on the Spec table QA data.
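The paper's actual heuristic rules are not given in the abstract; a minimal sketch of the kind of rule that separates data tables from decorative layout tables, and that treats the first row as a header when it holds no numbers, might look like this (thresholds are assumptions):

```python
def is_data_table(table):
    """Heuristic: a real data table has at least 2 rows and 2 columns
    and a consistent column count; layout tables often do not."""
    if len(table) < 2 or len(table[0]) < 2:
        return False
    return all(len(row) == len(table[0]) for row in table)

def split_header(table):
    """Treat the first row as the header if it contains no numeric cells."""
    def numeric(cell):
        try:
            float(cell.replace(",", ""))
            return True
        except ValueError:
            return False
    if not any(numeric(c) for c in table[0]):
        return table[0], table[1:]
    return None, table
```

Tables rejected by `is_data_table` would be dropped before pre-training; `split_header` supplies the head-data identification step.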


A CF-based Health Functional Recommender System using Extended User Similarity Measure (확장된 사용자 유사도를 이용한 CF-기반 건강기능식품 추천 시스템)

  • Sein Hong;Euiju Jeong;Jaekyeong Kim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.1-17 / 2023
  • With the recent rapid development of ICT (Information and Communication Technology) and the popularization of digital devices, the size of the online market continues to grow. As a result, we live in a flood of information, and customers face information overload problems that require much time and money to select products. A personalized recommender system has therefore become an essential methodology for addressing such issues, and Collaborative Filtering (CF) is the most widely used one. Traditional recommender systems mainly utilize quantitative data such as rating values, resulting in poor recommendation accuracy, because quantitative data cannot fully reflect the user's preference. To solve this problem, studies that reflect qualitative data, such as review contents, are being actively conducted. In this study, text mining was used to quantify user review contents. General CF consists of three steps: user-item matrix generation, Top-N neighborhood group search, and Top-K recommendation list generation. We propose a recommendation algorithm that applies an extended similarity measure, which utilizes quantified review contents in addition to user rating values. After calculating review similarity by applying TF-IDF, Word2Vec, and Doc2Vec techniques to review contents, the extended similarity is created by combining user rating similarity with the review similarity. To verify this, we used user ratings and review data from the "Health and Personal Care" category of the e-commerce site Amazon. The proposed recommendation model using the extended similarity measure showed superior performance to the traditional model using only rating-based similarity. In addition, among the various text mining techniques, the similarity obtained using TF-IDF showed the best performance when used in the neighborhood group search and recommendation list generation steps.
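The combination step above can be sketched in a few lines: TF-IDF vectors over tokenized reviews, cosine similarity between them, and a weighted blend with rating similarity (the blending weight `alpha` and the smoothed IDF form are assumptions, not the paper's exact formulation):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized docs,
    using a smoothed IDF so no term gets exactly zero weight."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def extended_similarity(rating_sim, review_sim, alpha=0.5):
    """Blend rating-based and review-based similarity; alpha is a free weight."""
    return alpha * rating_sim + (1 - alpha) * review_sim
```

The extended similarity then replaces the plain rating similarity in the Top-N neighborhood search step.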

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines a number of natural language processing methods, such as named entity recognition (NER), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source comprises texts obtained from the web using a domain-specific data extraction agent. A framework for extracting information from unstructured data was developed using the aforementioned natural language processing methods. We simulated the performance of our work in the extraction and analysis of texts for the detection of organizational structures. Simulation shows that our approach outperformed other NER classifiers, such as those from MUC and CoNLL, on information extraction.
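As a toy illustration of the pipeline above (not the paper's classifier), a gazetteer-based NER pass plus sentence-level co-mention linking can already recover crude person-organization structure; the entity lists and sentences below are made up:

```python
import re
from collections import defaultdict

PERSON = {"Alice", "Bob", "Carol"}   # toy gazetteer, an assumption
ORG = {"Acme", "Globex"}

def tag_entities(sentence):
    """Tiny gazetteer-based NER: label known tokens PER or ORG."""
    tags = []
    for tok in re.findall(r"[A-Za-z]+", sentence):
        if tok in PERSON:
            tags.append((tok, "PER"))
        elif tok in ORG:
            tags.append((tok, "ORG"))
    return tags

def org_links(sentences):
    """Link each person to organizations co-mentioned in the same sentence."""
    links = defaultdict(set)
    for s in sentences:
        tags = tag_entities(s)
        orgs = [t for t, label in tags if label == "ORG"]
        for person in (t for t, label in tags if label == "PER"):
            links[person].update(orgs)
    return links
```

Aggregating such links across a corpus yields the person-to-organization graph that the framework then analyzes for structure.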

BEPAT: A platform for building energy assessment in energy smart homes and design optimization

  • Kamel, Ehsan;Memari, Ali M.
    • Advances in Energy Research / v.5 no.4 / pp.321-339 / 2017
  • Energy simulation tools can provide information on the amount of heat transfer through building envelope components, which are considered the main sources of heat loss in buildings. It is therefore important to improve both the quality of the outputs from energy simulation tools and the process of obtaining them. In this paper, a new Building Energy Performance Assessment Tool (BEPAT) is introduced, which provides users with granular data on the heat transfer through every single wall, window, door, roof, and floor in a building and automatically saves all the related data in text files. This information can be used to identify the envelope components needing thermal improvement, whether during an energy retrofit or during the design phase. The generated data can also be adopted as a supplementary dataset in the design of energy smart homes, building design tools, and energy retrofit tools. BEPAT is developed by modifying the EnergyPlus source code, used as the energy simulation engine, in C++; it requires only an Input Data File (IDF) and a weather file to perform the energy simulation and automatically produce detailed output. To validate the BEPAT results, a computer model was developed in Revit for use in BEPAT. Validating BEPAT's output against the EnergyPlus "advanced output" shows a difference of less than 2%, establishing the capability of this tool to provide detailed output on the quantity of heat transfer through walls, fenestrations, roofs, and floors.
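The per-component figures BEPAT reports come from the EnergyPlus engine; the basic steady-state conduction relation behind such per-component breakdowns (Q = U·A·ΔT) can be illustrated as follows, with entirely made-up U-values and areas:

```python
def component_heat_loss(components, t_in, t_out):
    """Steady-state conduction per envelope component, Q = U * A * dT,
    in watts. U-values and areas are illustrative, not BEPAT output."""
    dt = t_in - t_out
    return {name: u * a * dt for name, (u, a) in components.items()}

envelope = {
    "wall_north": (0.35, 20.0),    # U [W/m2K], area [m2]
    "window_south": (1.80, 4.0),
    "roof": (0.20, 50.0),
}
losses = component_heat_loss(envelope, t_in=20.0, t_out=0.0)
```

Sorting `losses` identifies the weakest envelope component, which is the retrofit-targeting use case described above.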

Development of Chatbot Using Q&A Data of SME(Small and Medium Enterprise) (소상공인들의 고객 문의 데이터를 활용한 문의응대 챗봇의 개발 및 도입)

  • Shin, Minchul;Kim, Sungguen;Rhee, Cheul
    • Journal of Information Technology Services / v.17 no.3 / pp.17-36 / 2018
  • In this study, we developed a chatbot (dialogue agent) using a small set of Q&A data and evaluated its performance. The chatbot was developed as an FAQ chatbot that responds promptly to customer inquiries. Development proceeded in three stages: 1. analysis and planning, 2. content creation, 3. API and messenger interworking. During the analysis and planning phase, we gathered and analyzed the customers' question data and extracted the topics and details of their questions. In the content creation stage, we created scenarios for each topic and its sub-items, and then filled in specific answers in consultation with the business owners. For API and messenger interworking, KakaoTalk was used. The chatbot's performance was measured by quantitative indicators such as the accuracy with which it grasped a customer's inquiry and answered correctly, and a questionnaire survey was then conducted among the chatbot's users. The survey found that the chatbot not only provided useful information to users but also positively influenced the image of the pension. This study shows that chatbots can be developed using easily obtainable data and commercial APIs regardless of business size. It also validates the development process, by presenting an explicit process for building FAQ chatbots and verifying the performance of the chatbot developed with it.
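The study builds on a commercial API and KakaoTalk; as a self-contained illustration of the FAQ-matching core only, a keyword-overlap matcher with a fallback reply could look like this (the FAQ entries are invented):

```python
def best_answer(question, faq):
    """Pick the FAQ entry sharing the most words with the question;
    fall back to a default reply when nothing overlaps."""
    q = set(question.lower().split())
    scored = [(len(q & set(keys.lower().split())), answer)
              for keys, answer in faq]
    score, answer = max(scored)
    return answer if score > 0 else "Let me connect you to a staff member."

faq = [
    ("check in time", "Check-in starts at 3 PM."),
    ("parking available", "Free parking is available on site."),
    ("pet allowed", "Pets are not allowed."),
]
```

Production chatbot platforms replace the overlap score with intent classification, but the scenario-per-topic structure described above maps directly onto the `faq` list.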

Domain-Adaptation Technique for Semantic Role Labeling with Structural Learning

  • Lim, Soojong;Lee, Changki;Ryu, Pum-Mo;Kim, Hyunki;Park, Sang Kyu;Ra, Dongyul
    • ETRI Journal / v.36 no.3 / pp.429-438 / 2014
  • Semantic role labeling (SRL) is a natural-language processing task that aims to detect predicates in text, choose their correct senses, identify their associated arguments, and predict the semantic roles of those arguments. Developing a high-performance SRL system for a domain requires manually annotated training data of large size in that domain, but such data is available only for a few domains, and constructing SRL training data for a new domain is very expensive. Domain adaptation in SRL is therefore an important problem. In this paper, we show that domain adaptation for SRL systems can achieve state-of-the-art performance when based on structural learning and a prior-model approach. We provide experimental results on three different target domains showing that our method is effective even when only a small amount of training data is available for the target domain. According to the experiments, our proposed method outperforms other work by about 2% to 5% in F-score.
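The prior-model idea, in its simplest generic form, regularizes the target-domain weights toward the source-domain model rather than toward zero. The sketch below shows that idea on a toy logistic regression (the learning rate, regularization strength, and data are all assumptions, and this is far simpler than the paper's structural-learning setup):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adapt(prior_w, data, lam=1.0, lr=0.05, epochs=300):
    """Gradient descent on logistic loss + lam * ||w - prior_w||^2:
    target-domain weights are pulled toward the source model prior_w."""
    w = list(prior_w)
    for _ in range(epochs):
        grad = [2 * lam * (w[j] - prior_w[j]) for j in range(len(w))]
        for x, y in data:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(len(w)):
                grad[j] += (p - y) * x[j]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w
```

A larger `lam` keeps the adapted model closer to the source prior, which is exactly the behavior wanted when target-domain data is scarce.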

Application of Domain Knowledge in Transaction-based Recommender Systems through Word Embedding (트랜잭션 기반 추천 시스템에서 워드 임베딩을 통한 도메인 지식 반영)

  • Choi, Yeoungje;Moon, Hyun Sil;Cho, Yoonho
    • Knowledge Management Research / v.21 no.1 / pp.117-136 / 2020
  • In studies of recommender systems, which solve users' information overload problem, the use of transactional data has been continuously attempted. In particular, because firms can easily obtain transactional data with the development of IoT technologies, transaction-based recommender systems have recently been used in various areas. However, transactional data has limitations: it is hard to reflect domain knowledge, and it does not directly show user preferences for individual items. Therefore, in this study, we propose a method that applies word embedding in a transaction-based recommender system to reflect both preference differences among users and domain knowledge. Our approach is based on SAR, which shows high performance among recommender systems, and we improved its components using FastText, one of the word embedding techniques. Experimental results show that reflecting domain knowledge and preference differences has a significant effect on recommender system performance. We therefore expect this study to contribute to the improvement of transaction-based recommender systems and to suggest an expansion of the data used in recommender systems.
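The key move above is treating each transaction as a "sentence" of items so that item relations can be learned from co-purchases. The paper trains FastText on those sequences; the deliberately simplified stand-in below scores item pairs with a lift-like co-occurrence statistic instead, just to show the data shape (the baskets are invented):

```python
from collections import Counter
from itertools import combinations

def item_affinity(transactions):
    """Treat each transaction as a 'sentence' of items and score item
    pairs by lift: P(a,b) / (P(a) * P(b)). A co-occurrence stand-in for
    the FastText embedding similarity the paper actually uses."""
    item_n = Counter()
    pair_n = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_n.update(items)
        pair_n.update(combinations(items, 2))
    n = len(transactions)
    return {p: (c / n) / ((item_n[p[0]] / n) * (item_n[p[1]] / n))
            for p, c in pair_n.items()}

transactions = [["milk", "bread"], ["milk", "bread", "eggs"],
                ["eggs", "beer"], ["milk", "bread"]]
aff = item_affinity(transactions)
```

In the embedding version, `aff` would be replaced by cosine similarity between FastText item vectors, which additionally generalizes to rarely co-purchased items.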

A Study on the Interoperability between the HL7 and the IEEE 1451 based Sensor Network (HL7과 IEEE 1451 기반 센서 네트워크와의 연동에 관한 연구)

  • Kim, Woo-Shik;Lim, Su-Young;Ahn, Jin-Soo;Nah, Ji-Young;Kim, Nam-Hyun
    • Journal of Biomedical Engineering Research / v.29 no.6 / pp.457-465 / 2008
  • HL7 (Health Level 7) is a standard for exchanging medical and healthcare data among different medical information systems. As the ubiquitous era approaches, a new type of data appears alongside text and imaging information: streaming sensor data. Since HL7 does not cover the interfaces among the devices that produce sensor data, it is expected that sooner or later HL7 will need to include biomedical sensors and sensor networks. IEEE 1451 is a family of standards that deals with transducers, including sensors and actuators, and various wired or wireless sensor networks. In this paper, we consider the possibility of interoperability between IEEE 1451 and HL7. After proposing an HL7 message format that includes the IEEE 1451 TEDS, we present preliminary results that show the possibility of integrating the two standards.
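The paper's proposed TEDS-carrying message format is not reproduced in the abstract; for orientation only, a generic HL7 v2-style observation message (ORU^R01, with MSH/PID/OBR/OBX segments) carrying one numeric sensor reading can be composed as below. Segment grammar follows standard HL7 v2 conventions, but all field values are illustrative:

```python
def hl7_sensor_message(sensor_id, value, unit, timestamp):
    """Compose a minimal HL7 v2-style ORU^R01 message carrying one
    numeric sensor observation in an OBX segment (value type NM,
    result status F). Field contents are illustrative only."""
    msh = f"MSH|^~\\&|SENSOR_GW|WARD1|HIS|HOSP|{timestamp}||ORU^R01|0001|P|2.5"
    pid = "PID|1||123456^^^HOSP||DOE^JOHN"
    obr = f"OBR|1|||{sensor_id}^BodyTemp"
    obx = f"OBX|1|NM|{sensor_id}^BodyTemp||{value}|{unit}|||||F"
    return "\r".join([msh, pid, obr, obx])
```

The integration question the paper studies is where in such a message the IEEE 1451 TEDS metadata (calibration, units, sensor identity) should travel; this sketch leaves that open.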

Algorithm for Spatial XML Generator (Spatial XML 생성기를 위한 알고리즘)

  • Seo, Hyun-Hho;Choi, Young Un
    • Proceedings of the Korea Contents Association Conference / 2004.11a / pp.466-469 / 2004
  • XML is a text-based format with which client applications can express, display, exchange, and process structured data. Spatial XML, which is XML containing spatial information, is stored in an RDBMS and used as geographic data, and there is much interest in its utilization. In this paper, we query Spatial XML data stored in an RDBMS using Spatial XQuery, and we implement a Spatial XML Generator algorithm that extracts information from the RDBMS and produces it in XML form.
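The generator's essential step, relational rows out and XML in, can be sketched with an in-memory database; the table layout and element names below are assumptions, not the paper's schema:

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_spatial_xml(db):
    """Query point features from an RDBMS table and emit them as XML."""
    root = ET.Element("SpatialData")
    for name, x, y in db.execute("SELECT name, x, y FROM features"):
        feature = ET.SubElement(root, "Feature", name=name)
        ET.SubElement(feature, "Point", x=str(x), y=str(y))
    return ET.tostring(root, encoding="unicode")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE features (name TEXT, x REAL, y REAL)")
db.executemany("INSERT INTO features VALUES (?, ?, ?)",
               [("library", 127.1, 37.5), ("park", 127.2, 37.6)])
xml_text = rows_to_spatial_xml(db)
```

A Spatial XQuery front end would translate spatial predicates into the SQL `WHERE` clause; the generator part shown here only handles the row-to-XML serialization.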
