• Title/Abstract/Keywords: Text data


A Study on the Current Situation and Trend Analysis of The Elderly Healthcare Applications Using Big Data Analysis (텍스트마이닝을 활용한 노인 헬스케어 앱 사용 추이 및 동향 분석)

  • Byun, Hyun;Jeon, Sang-Wan;YI, Eun-Surk
    • Journal of the Korea Convergence Society / Vol. 13, No. 5 / pp. 313-325 / 2022
  • The purpose of this study is to examine changes in the elderly healthcare app market through text mining and to present basic data for promoting elderly healthcare apps. Data were collected from Naver, Daum, blogs, web pages, and cafes. Text mining, TF-IDF (term frequency-inverse document frequency), sentiment analysis, and semantic network analysis were conducted using Textom and Ucinet6, which are big data analysis programs. Six categories were finally derived: resolving the healthcare app information gap, convergence healthcare technology, diffusion media, the elderly healthcare app industry, social background, and content. In conclusion, for elderly healthcare apps to be accepted and used by the elderly, they must have a good diffusion infrastructure, and their effectiveness must be maximized through the active introduction of convergence technology and the development of content that the elderly can use easily.
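
The TF-IDF weighting named in this abstract can be sketched as follows. This is a minimal illustration that assumes the plain `tf × log(N / df)` variant; the abstract does not state which weighting Textom applies, and the `tf_idf` helper and the sample posts are hypothetical, not data from the study.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized posts (illustrative only).
posts = [
    ["elderly", "healthcare", "app", "app"],
    ["elderly", "exercise", "app"],
    ["elderly", "hospital", "visit"],
]
scores = tf_idf(posts)
print(scores[0])
```

A term that appears in every document, such as "elderly" here, gets a weight of zero, which is why TF-IDF tends to surface terms that distinguish one document from the rest.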

Study on the Analysis of National Paralympics by Utilizing Social Big Data Text Mining (소셜 빅데이터 텍스트 마이닝을 활용한 전국장애인체육대회 분석 연구)

  • Kim, Dae kyung;Lee, Hyun Su
    • 한국체육학회지 인문사회과학편 (Korean Journal of Physical Education, Humanities and Social Sciences edition) / Vol. 55, No. 6 / pp. 801-810 / 2016
  • The purpose of the study was to conduct text mining of keywords related to the National Paralympics and to provide fundamental information that could be used to change the perceptions of people without disabilities toward disability and to promote the social participation of people with and without disabilities in the National Paralympics. Social big data on the National Paralympics were retrieved from news articles and blog postings identified through the search engines Naver, Daum, and Google. The data were then analysed using R version 3.3.1. The analysis techniques were word cloud analysis, correlation analysis, and social network analysis. The results were as follows. First, the news was mainly related to game results, sports events, team participation, and the host venue of the 33rd to 36th National Paralympics. Second, search results about the 33rd to 36th National Paralympics from Naver, Daum, and Google were similar to one another. Third, the keywords National Paralympics, sports for the disabled, and sports showed high closeness centrality. Further, degree centrality and betweenness centrality were associated in keywords such as sports for all, participation, research, development, sports-disabled, research-disabled, sports for all-participation, disabled-participation, sports for all-disabled, and host-paralympics.
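
The centrality measures mentioned in this abstract can be illustrated on a small keyword network. The sketch below is in Python rather than the R environment the study used, and it treats five of the keyword pairs listed above as unweighted edges purely for illustration; the actual network, its weights, and the study's results are not given in the abstract. The helper names are hypothetical, and the closeness function assumes a connected graph.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (keyword, keyword) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """(n - 1) divided by the sum of shortest-path distances (connected graph assumed)."""
    n = len(graph)
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        result[source] = (n - 1) / sum(dist.values())
    return result


# Toy edge list (illustrative only, not the study's network).
edges = [
    ("sports", "disabled"),
    ("research", "disabled"),
    ("sports for all", "participation"),
    ("disabled", "participation"),
    ("sports for all", "disabled"),
]
graph = build_graph(edges)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(max(degree, key=degree.get))
```

In this toy graph "disabled" is linked to every other keyword, so it scores highest on both measures.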

Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics (빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터 소스에서 데이터 추출)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers / Vol. 65, No. 7 / pp. 1277-1282 / 2016
  • The term "Big Data" has been defined to encapsulate a broad spectrum of data sources and data formats. It is often described as unstructured data because of the variety of its data formats. Even though the traditional methods of structuring data in rows and columns have been reinvented into column families and key-value stores, or completely replaced with JSON documents in document-based databases, the fact remains that data have to be reshaped to conform to a certain structure in order to be stored persistently on disk. ETL processes are key in restructuring data. However, ETL processes incur additional processing overhead and also require that data sources be maintained in predefined formats. Consequently, data in certain formats are completely ignored, because designing ETL processes to cater for all possible data formats is almost impossible. Potentially, these unconsidered data sources can provide useful insights when incorporated into big data analytics. In this project, using the big data solution Apache Spark, we tapped into other sources of data stored in their raw formats, such as various text files and compressed files, and combined these data with persistently stored enterprise data in MongoDB for overall data analytics using the MongoDB Aggregation Framework and MapReduce. This differs significantly from traditional ETL systems in that it is compatible regardless of the data formats at the source.

Investigation of Topic Trends in Computer and Information Science by Text Mining Techniques: From the Perspective of Conferences in DBLP (텍스트 마이닝 기법을 이용한 컴퓨터공학 및 정보학 분야 연구동향 조사: DBLP의 학술회의 데이터를 중심으로)

  • Kim, Su Yeon;Song, Sung Jeon;Song, Min
    • Journal of the Korean Society for Information Management / Vol. 32, No. 1 / pp. 135-152 / 2015
  • The goal of this paper is to explore the field of Computer and Information Science with the aid of text mining techniques by mining related conference data available in DBLP (Digital Bibliography & Library Project). Although studies based on bibliometric analysis are the most prevalent way of investigating the dynamics of a research field, we attempt to understand the dynamics of the field by using Latent Dirichlet Allocation (LDA)-based multinomial topic modeling. For this study, we collect 236,170 documents from 353 conferences related to Computer and Information Science in DBLP. We aim to include conferences in the field as broadly as possible. We analyze topic modeling results along with datasets collected over the period 2000 to 2011, including top authors per topic and top conferences per topic. We identify the following four patterns in topic trends in the field of computer and information science during this period: growing (network related topics), shrinking (AI and data mining related topics), continuing (web, text mining, information retrieval, and database related topics), and fluctuating (HCI, information system, and multimedia system related topics).

A Study on the Potential and Limitation of Pre-producing Dramas through Social Analysis -focusing on a jtbc drama - (소셜 분석을 통한 사전제작 드라마의 가능성과 한계에 관한 연구 -jtbc <맨투맨>을 중심으로-)

  • Kim, Kyung-Ae;Ku, Jin-Hee
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 19, No. 2 / pp. 164-172 / 2018
  • This paper examines the relevance of pre-production and storytelling in big data analysis and, focusing on JTBC's Man to Man series, looks at how the drama's storytelling should be structured. In this study, we conducted text mining on blogs focused on a particular topic to read viewers' thoughts on pre-produced dramas, and on 67 blogs written about pre-produced dramas from 2016.12.15 to 2017.12.15. We also conducted sentiment analysis of the Man to Man series, which is not only a pre-produced drama but also has storytelling issues. Blog text extraction and text mining were performed using OutWit Hub and R, and the tools provided by social metrics were used for sentiment analysis of the larger data. Sentiment analysis revealed that viewers of the Man to Man series did not accept the romance between Kim Sul-woo and Cha Do-ha, owing to the lack of realism in the female characters. It was therefore concluded that making the characters more realistic is crucial to increasing the audience's empathy. Such studies will continue to be necessary, because they will form the basis for digitally driven storytelling studies and will provide valuable material for prediction and guidance in the cultural content industry.

A Mobile Newspaper Application Interface to Enhance Information Accessibility of the Visually Impaired (시각장애인의 정보 접근성 향상을 위한 모바일 신문 어플리케이션 인터페이스)

  • Lee, Seung Hwan;Hong, Seong Ho;Ko, Seung Hee;Choi, Hee Yeon;Hwang, Sung Soo
    • Journal of the HCI Society of Korea / Vol. 11, No. 3 / pp. 5-12 / 2016
  • The number of visually impaired people using a smartphone is increasing with the help of Text-to-Speech (TTS). TTS converts text data in a mobile application into sound data, and it allows only sequential search. For this reason, the location of buttons and content inside an application should be determined carefully. However, little attention has been paid to the TTS service environment during the development of mobile newspaper applications. This makes it difficult for visually impaired people to use these applications. Furthermore, a mobile application interface that also reflects the needs of people with low vision is necessary. Therefore, this paper presents a mobile newspaper interface that considers the accessibility and needs of people with various visual impairments. To this end, the proposed interface places buttons with the TTS service environment in mind and provides search functionality. The proposed interface also enables visually impaired people to use the application smoothly by filtering out words that are pronounced improperly and by providing a proper explanation for every button. Finally, several functions such as increasing font size and color reversal are implemented for people with low vision. Simulation results show that the proposed interface achieves better performance than other applications in terms of search speed and usability.

A Trend Analysis of Agricultural and Food Marketing Studies Using Text-mining Technique (텍스트마이닝 기법을 이용한 국내 농식품유통 연구동향 분석)

  • Yoo, Li-Na;Hwang, Su-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 18, No. 10 / pp. 215-226 / 2017
  • This study analyzed trends in agricultural and food marketing studies from 1984 to 2015 using text-mining techniques. Text mining is a part of big data analysis and is an effective tool for objectively processing large amounts of information based on categorization and trend analysis. In the present study, frequency analysis, topic analysis, and association rule analysis were conducted. Titles of agricultural and food marketing studies in four journals and reports were used for the analysis. The results showed that a total of 1,126 papers related to agricultural and food marketing could be categorized into six subjects. There were significant changes in research trends before and after the 2000s. While research before the 2000s focused on farm and wholesale level marketing, research after the 2000s mainly covered consumption, (processed) food, exports, and imports. Local food and school meals are new subjects that are increasingly being studied. Issues regarding agricultural supply and demand were the only subjects investigated in policy research studies. Interest in agricultural supply and demand was lost after the 2000s. A number of studies after the 2010s analyzed consumption, primarily consumption trends and consumer behavior.
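
Association rules over title keywords are usually scored by support, confidence, and lift. The sketch below shows those three measures under the assumption that each title is reduced to a set of keywords; the abstract does not say which algorithm or thresholds the authors used, and the `titles` sample and helper names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical keyword sets extracted from paper titles (illustrative only).
titles = [
    {"consumption", "processed food"},
    {"consumption", "consumer behavior"},
    {"consumption", "consumer behavior", "local food"},
    {"export", "processed food"},
]

rule = ({"consumption"}, {"consumer behavior"})
print(support(titles, rule[0] | rule[1]))
print(confidence(titles, *rule))
print(lift(titles, *rule))
```

A lift above 1 suggests that the two keywords co-occur in titles more often than their individual frequencies would predict.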

Impact of Word Embedding Methods on Performance of Sentiment Analysis with Machine Learning Techniques

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of the Korea Society of Computer and Information / Vol. 25, No. 8 / pp. 181-188 / 2020
  • In this study, we present a comparative study to confirm the impact of various word embedding techniques on the performance of sentiment analysis. Sentiment analysis is an opinion mining technique that identifies and extracts subjective information from text using natural language processing, and it can be used to classify the sentiment of product reviews or comments. Since sentiment can be classified as either positive or negative, it can be treated as a general classification problem. For sentiment analysis, the text must be converted into a form that a computer can process. Therefore, text such as a word or a document is transformed into a vector, a step known in natural language processing as word embedding. Various techniques, such as Bag of Words, TF-IDF, and Word2Vec, are used for word embedding. Until now, there have not been many studies on which word embedding techniques are suitable for sentiment analysis. In this study, Bag of Words, TF-IDF, and Word2Vec are used to compare and analyze the performance of movie review sentiment analysis. The data set for this study is the IMDB data set, which is widely used in text mining. The results show that TF-IDF and Bag of Words performed better than Word2Vec, and that TF-IDF performed better than Bag of Words, although the difference was small.

Classification Modeling for Predicting Medical Subjects using Patients' Subjective Symptom Text (환자의 주관적 증상 텍스트에 대한 진료과목 분류 모델 구축)

  • Lee, Seohee;Kang, Juyoung
    • The Journal of Bigdata / Vol. 6, No. 1 / pp. 51-62 / 2021
  • In the field of medical artificial intelligence, there has been much research on disease prediction and classification algorithms that can help doctors make judgments, but relatively little interest in artificial intelligence that can help medical consumers acquire and evaluate information. The fact that more than 150,000 questions about which hospital to go to were asked on the NAVER portal over the past year is a testament to the need to provide medical information suited to medical consumers. Therefore, in this study, we sought to build a model that classifies symptom text written directly by patients, collected from the NAVER portal, into 8 medical subjects, to help consumers choose appropriate medical subjects for their symptoms. In order to ensure the validity of the data involving patients' subject matter, we measured the similarity between objective symptom text (typical symptoms by medical subject, organized by the Seoul Emergency Medical Information Center) and subjective symptom text (NAVER data). The similarity measurements showed that two texts describing symptoms of the same medical subject had relatively higher similarity than symptom texts from different medical subjects. Following this procedure, the classification model was built using a ridge regression model on the validated subjective symptom text, resulting in an accuracy of 0.73.
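
The similarity check described in this abstract can be sketched with cosine similarity over bag-of-words count vectors. Cosine similarity is an assumption here, since the abstract does not name the measure the authors used, and the reference symptom lists, the patient text, and the helper name are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical "objective" symptom vocabularies per medical subject.
reference = {
    "orthopedics": ["knee", "pain", "swelling", "joint"],
    "dermatology": ["skin", "rash", "itching"],
}

# Hypothetical "subjective" symptom text written by a patient, already tokenized.
complaint = ["knee", "pain", "stairs"]

similarities = {
    subject: cosine_similarity(complaint, tokens)
    for subject, tokens in reference.items()
}
best = max(similarities, key=similarities.get)
print(best, similarities)
```

If the patient text is closer to the reference text of its own medical subject than to the others, that supports using the subjective text as training data, which is the kind of validity check the abstract describes.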

Occupational Therapy in Long-Term Care Insurance For the Elderly Using Text Mining (텍스트 마이닝을 활용한 노인장기요양보험에서의 작업치료: 2007-2018년)

  • Cho, Min Seok;Baek, Soon Hyung;Park, Eom-Ji;Park, Soo Hee
    • Journal of Society of Occupational Therapy for the Aged and Dementia / Vol. 12, No. 2 / pp. 67-74 / 2018
  • Objective: The purpose of this study is to quantitatively analyze the role of occupational therapy in long-term care insurance for the elderly using text mining, one of the big data analysis techniques. Method: For the analysis of newspaper articles, results for the search term "Long-Term Care Insurance for the Elderly + Occupational Therapy for the Elderly" were collected for the period from 2007 to 2018. The Naver News database was used, since Naver has a high share of the domestic search engine market, and the data were gathered with Textom, a web crawling tool. After collecting the article titles and original text of 510 news items from this search, we analyzed article frequency and keywords by year. Result: In terms of the frequency of articles by year, the number of articles published in 2015 and 2017 was the highest, with 70 articles (13.7%), and among the top 10 terms of the keyword analysis, 'dementia' had the highest frequency (344). The related keywords were dementia, treatment, hospital, health, service, rehabilitation, facilities, institution, grade, elderly, professional, salary, industrial complex, and people. Conclusion: This study is meaningful in that text mining was used to confirm more objectively, through the related keywords, the social needs and the role of the occupational therapist in dementia and rehabilitation, based on media reporting trends on long-term care insurance for the elderly over 11 years. Based on the results of this study, future research should expand the research field and period and supplement the research methodology with various analysis methods by year.