• Title/Summary/Keyword: Text data

Search Result 2,953, Processing Time 0.027 seconds

Analysis of Text Mining of Consumer's Personality Implication Words in Review of Used Transaction Application (중고거래 어플리케이션 <당근마켓> 리뷰텍스트에 나타난 소비자의 인성 함축단어 텍스트마이닝 분석)

  • Jung, Yea-Rin;Ju, Young-Ae
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.11
    • /
    • pp.1-10
    • /
    • 2021
  • This study analyzes the use and meaning of consumer personality implication words in the review text of the Used Transaction Application . From of May 2021, the data were collected for the past six months by our Web crawler in Seoul and Gyeonggi Province, and a total of 1368 cases were collected first by random sampling, and finally 570 cases were preprocessed. The results are as follows. First, 48.2% of review texts were related to the personality of consumers even though it was a commercial platform of products. Second, the review text is mainly positive, which formed a text network structure based on the keyword 'gratitude'. Third, the review text, which implies consumer character, was divided into two groups: 'extrovert personality' and 'introvert personality' of consumers. And the individuality of the two groups worked together on the platform. In conclusion, we would like to suggest that consumer personality plays an important role in the platform transaction process, that consumer personality will play a role in the services of the platform in the future, and that consumer personality should be studied from various perspectives.

Design of a Mirror for Fragrance Recommendation based on Personal Emotion Analysis (개인의 감성 분석 기반 향 추천 미러 설계)

  • Hyeonji Kim;Yoosoo Oh
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.28 no.4
    • /
    • pp.11-19
    • /
    • 2023
  • The paper proposes a smart mirror system that recommends fragrances based on user emotion analysis. This paper combines natural language processing techniques such as embedding techniques (CounterVectorizer and TF-IDF) and machine learning classification models (DecisionTree, SVM, RandomForest, SGD Classifier) to build a model and compares the results. After the comparison, the paper constructs a personal emotion-based fragrance recommendation mirror model based on the SVM and word embedding pipeline-based emotion classifier model with the highest performance. The proposed system implements a personalized fragrance recommendation mirror based on emotion analysis, providing web services using the Flask web framework. This paper uses the Google Speech Cloud API to recognize users' voices and use speech-to-text (STT) to convert voice-transcribed text data. The proposed system provides users with information about weather, humidity, location, quotes, time, and schedule management.

A Study on the multiplex transmission method according to the data size using Zigbee and Wifi in Wireless Sensor Networks (무선 센서 네트워크에서의 Zigbee 및 Wifi를 이용한 데이터 크기에 따른 다중 전송 방법에 관한 연구)

  • Shin, Dong-ryoul;Kim, Myeon-sik;Oh, Young-jun;Lee, Kang-whan
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2015.05a
    • /
    • pp.97-100
    • /
    • 2015
  • In this paper, we propose an efficient method of transmitting data according to size using the Zigbee and Wifi in a wireless sensor network. In existing wireless sensor network using Zigbee and Wifi, Zigbee transmits text and Wifi transmits image and video. Existing research methods in this way is caused a problem that the transmission by selecting a transmission method according to data type. In the case of aggregated data, Zigbee is less efficient than Wifi in some case because of limitation of zigbee's transmission rate, even if data type is text. In this paper, we propose a method for selecting a transmission method by considering the size of the data and transfer efficiency not by data type. It was obtained results that appear to be convenient, according to the linkage algorithm structure of the image and sensing data by selecting a transmission method from data size, not the type of the given data.

  • PDF

Frequency and Social Network Analysis of the Bible Data using Big Data Analytics Tools R (빅데이터 분석도구 R을 이용한 성경 데이터의 빈도와 소셜 네트워크 분석)

  • Ban, ChaeHoon;Ha, JongSoo;Kim, Dong Hyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.2
    • /
    • pp.166-171
    • /
    • 2020
  • Big data processing technology that can store and analyze data and obtain new knowledge has been adjusted for importance in many fields of the society. Big data is emerging as an important problem in the field of information and communication technology, but the mind of continuous technology is rising. the R, a tool that can analyze big data, is a language and environment that enables information analysis of statistical bases. In this paper, we use this to analyze the Bible data. We analyze the four Gospels of the New Testament in the Bible. We collect the Bible data and perform filtering for analysis. The R is used to investigate the frequency of what text is distributed and analyze the Bible through social network analysis, in which words from a sentence are paired and analyzed between words for accurate data analysis.

Design and Implementation of Event-driven Real-time Web Crawler to Maintain Reliability (신뢰성 유지를 위한 이벤트 기반 실시간 웹크롤러의 설계 및 구현)

  • Ahn, Yong-Hak
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.4
    • /
    • pp.1-6
    • /
    • 2022
  • Real-time systems using web cralwing data must provide users with data from the same database as remote data. To do this, the web crawler repeatedly sends HTTP(HtypeText Transfer Protocol) requests to the remote server to see if the remote data has changed. This process causes network load on the crawling server and remote server, causing problems such as excessive traffic generation. To solve this problem, in this paper, based on user events, we propose a real-time web crawling technique that can reduce the overload of the network while securing the reliability of maintaining the sameness between the data of the crawling server and data from multiple remote locations. The proposed method performs a crawling process based on an event that requests unit data and list data. The results show that the proposed method can reduce the overhead of network traffic in existing web crawlers and secure data reliability. In the future, research on the convergence of event-based crawling and time-based crawling is required.

Analysis Method of User Review using Open Data (오픈 데이터를 이용한 사용자 리뷰 분석 방법)

  • Choi, Taeho;Hwang, Mansoo;Kim, Neunghoe
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.6
    • /
    • pp.185-190
    • /
    • 2022
  • Open data has a lot of economic value. Not only Korea, but many other countries are doing their best to make various policies and efforts to expand and utilize open data. However, although Korea has a large amount of data, the data is not utilized effectively. Thus, attempts to utilize those data should be made in various industries. In particular, in the fashion industry, exchange and refund problems are the most common due to unpredictable consumers. Better feedback is necessary for service providers to solve this problem. We want to solve it by showing improved images of dissatisfactions along with user reviews including consumer needs. In this paper, user reviews are analyzed on online shopping mall websites to identify consumer needs, and product attributes are defined by utilizing the attributes of K-fashion data. The users' request is defined as a dissatisfaction attribute, and labeling data with the corresponding attribute is searched. The users' request is provided to the service provider in forms of text data or attributes, as well as an image to help improve the product.

Web crawler Improvement and Dynamic process Design and Implementation for Effective Data Collection (효과적인 데이터 수집을 위한 웹 크롤러 개선 및 동적 프로세스 설계 및 구현)

  • Wang, Tae-su;Song, JaeBaek;Son, Dayeon;Kim, Minyoung;Choi, Donggyu;Jang, Jongwook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1729-1740
    • /
    • 2022
  • Recently, a lot of data has been generated according to the diversity and utilization of information, and the importance of big data analysis to collect, store, process and predict data has increased, and the ability to collect only necessary information is required. More than half of the web space consists of text, and a lot of data is generated through the organic interaction of users. There is a crawling technique as a representative method for collecting text data, but many crawlers are being developed that do not consider web servers or administrators because they focus on methods that can obtain data. In this paper, we design and implement an improved dynamic web crawler that can efficiently fetch data by examining problems that may occur during the crawling process and precautions to be considered. The crawler, which improved the problems of the existing crawler, was designed as a multi-process, and the work time was reduced by 4 times on average.

A study on the current status of DIY clothing products related to fabric using text mining (텍스트마이닝을 활용한 패브릭 관련 DIY 의류 상품 현황 연구)

  • Eun-Hye Lee;Ha-Eun Lee;Jeong-Wook Choi
    • Journal of the Korea Fashion and Costume Design Association
    • /
    • v.25 no.2
    • /
    • pp.111-122
    • /
    • 2023
  • This study aims to collect Big Data related to DIY clothing, analyze the results on a year-by-year basis, understand consumers' perceptions, the status, and reality of DIY clothing. The reference period for the evaluation of DIY clothing trends was set from 2012 to 2022. The data in this study was collected and analyzed using Textom, a Big Data solution program certified as a Good Software by the Telecommunications Technology Association (TTA). For the analysis of fabric-related DIY products, the keyword was set to "DIY clothing", and for data cleansing following collection, the "Espresso K" module was employed. Also, via data collection on a year-by-year basis, a total of 11 lists were generated and the collected data was analyzed by period. The following are the findings of this study's data collection on DIY clothing. The total number of keywords collected over a period of ten years on search engines "Naver" and "Google" between January 1, 2012 and December 31, 2022 was 16,315, and data trends by period indicate a continuous upward trend. In addition, a keyword analysis was conducted to analyze TF-IDF (Term Frequency-Inverse Document Frequency), a statistical measure that reflects the importance of a word within data, and the relationship with N-gram, an analysis of the correlation concerning the relationship between words. Using these results, it was possible to evaluate the popularity and growing tendency of DIY clothing products in conjunction with the evolving social environment, as well as the desire to explore DIY trends among consumers. Therefore, this study is valuable in that it provides preliminary data for DIY clothing research by analyzing the status and reality of DIY products, and furthermore, contributes to the development and production of DIY clothing.

A Study of the Consumer Major Perception of Packaging Using Big Data Analysis -Focusing on Text Mining and Semantic Network Analysis- (빅데이터 분석을 통한 패키징에 대한 소비자의 주요 인식 조사 -텍스트 마이닝과 의미연결망 분석을 중심으로-)

  • Kang, Wook-Geon;Ko, Eui-Suk;Lee, Hak-Rae;Kim, Jai-neung
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.4
    • /
    • pp.15-22
    • /
    • 2018
  • The purpose of this study is to investigate the consumer perception of packaging using big data analysis. This study use text mining to extract meaningful words from text and semantic network analysis to analyze connectivity and propagation trends. Data were collected by dividing the 'packaging(Korean)' and 'packaging(English)'. This study visualized the word network structure of the two key words and classified them into four groups with similar meaning through CONCOR analysis. The group name was specified based on the words constituting the classified group. These groups are a major category of consumers' perception of packaging. Especially cosmetics and design have high frequency of words and high centrality. Therefore it can be expected that the packaging design is perceived as important in the cosmetics industry. This study predicts consumers' perception of packaging so it can be a basis for future research and industry development.

Online Document Mining Approach to Predicting Crowdfunding Success (온라인 문서 마이닝 접근법을 활용한 크라우드펀딩의 성공여부 예측 방법)

  • Nam, Suhyeon;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.45-66
    • /
    • 2018
  • Crowdfunding has become more popular than angel funding for fundraising by venture companies. Identification of success factors may be useful for fundraisers and investors to make decisions related to crowdfunding projects and predict a priori whether they will be successful or not. Recent studies have suggested several numeric factors, such as project goals and the number of associated SNS, studying how these affect the success of crowdfunding campaigns. However, prediction of the success of crowdfunding campaigns via non-numeric and unstructured data is not yet possible, especially through analysis of structural characteristics of documents introducing projects in need of funding. Analysis of these documents is promising because they are open and inexpensive to obtain. We propose a novel method to predict the success of a crowdfunding project based on the introductory text. To test the performance of the proposed method, in our study, texts related to 1,980 actual crowdfunding projects were collected and empirically analyzed. From the text data set, the following details about the projects were collected: category, number of replies, funding goal, fundraising method, reward, number of SNS followers, number of images and videos, and miscellaneous numeric data. These factors were identified as significant input features to be used in classification algorithms. The results suggest that the proposed method outperforms other recently proposed, non-text-based methods in terms of accuracy, F-score, and elapsed time.