• Title/Summary/Keyword: Text data

Search Result 2,953, Processing Time 0.025 seconds

Developing the Automated Sentiment Learning Algorithm to Build the Korean Sentiment Lexicon for Finance (재무분야 감성사전 구축을 위한 자동화된 감성학습 알고리즘 개발)

  • Su-Ji Cho;Ki-Kwang Lee;Cheol-Won Yang
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.1
    • /
    • pp.32-41
    • /
    • 2023
  • Recently, many studies are being conducted to extract emotion from text and verify its information power in the field of finance, along with the recent development of big data analysis technology. A number of prior studies use pre-defined sentiment dictionaries or machine learning methods to extract sentiment from the financial documents. However, both methods have the disadvantage of being labor-intensive and subjective because it requires a manual sentiment learning process. In this study, we developed a financial sentiment dictionary that automatically extracts sentiment from the body text of analyst reports by using modified Bayes rule and verified the performance of the model through a binary classification model which predicts actual stock price movements. As a result of the prediction, it was found that the proposed financial dictionary from this research has about 4% better predictive power for actual stock price movements than the representative Loughran and McDonald's (2011) financial dictionary. The sentiment extraction method proposed in this study enables efficient and objective judgment because it automatically learns the sentiment of words using both the change in target price and the cumulative abnormal returns. In addition, the dictionary can be easily updated by re-calculating conditional probabilities. The results of this study are expected to be readily expandable and applicable not only to analyst reports, but also to financial field texts such as performance reports, IR reports, press articles, and social media.

Text mining analysis of terms and information on product names used in online sales of women's clothing (텍스트마이닝을 활용한 온라인 판매 여성 의류 상품명에 나타난 용어 및 정보분석)

  • Yeo Sun Kang
    • The Research Journal of the Costume Culture
    • /
    • v.31 no.1
    • /
    • pp.34-52
    • /
    • 2023
  • In this study, text mining was conducted on the product names of skirts, pants, shirts/blouses, and dresses to analyze the characteristics of keywords appearing in online shopping product names. As a result of frequency analysis, the number of keywords that appeared 0.5% or more for each item was around 30, and the number of keywords that appeared 0.1% or more was around 150. The cumulative distribution rate of 150 terms was around 80%. Accordingly, information on 150 key terms was analyzed, from which item, clothing composition, and material information were the found to be the most important types of information (ranking in the top five of all items). In addition, fit and style information for skirts and pants and length information for skirts and dresses were also considered important information. Keywords representing clothing composition information were: banding, high waist, and split for skirts and pants; and V-neck, tie, long sleeves, and puff for shirts/blouses and dresses. It was possible to identify the current design characteristics preferred by consumers from this information. However, there were also problems with terminology that hindered the connection between sellers and consumers. The most common problems were the use of various terms with the same meaning and irregular use of Korean and English terms. However, as a result of using co-appearance frequency analysis, it can be interpreted that there is little intention for product exposure, so it is recommended to avoid it.

A Study on the Network Text Analysis about Oral Health in Aging-Well

  • Seol-Hee Kim
    • Journal of dental hygiene science
    • /
    • v.23 no.4
    • /
    • pp.302-311
    • /
    • 2023
  • Background: Oral health is an important element of well aging. And oral health also affects overall health, mental health, and quality of life. In this study, we sought to identify oral health influencing factors and research trends for well-aging through text analysis of research on well-aging and oral health over the past 12 years. Methods: The research data was analyzed based on English literature published in PubMed from 2012 to 2023. Aging well and oral health were used as search terms, and 115 final papers were selected. Network text analysis included keyword frequency analysis, centrality analysis, and cohesion structure analysis using the Net-Miner 4.0 program. Results: Excluding general characteristics, the most frequent keywords in 115 articles, 520 keywords (Mesh terms) were psychology, dental prosthesis and Alzheimer's disease, Dental caries, cognition, cognitive dysfunction, and bacteria. Research keywords with high degree centrality were Dental caries (0.864), Quality of life (0.833), Tooth loss (0.818), Health status (0.727), and Life expectancy (0.712). As a result of community analysis, it consisted of 4 groups. Group 1 consisted of chewing and nutrition, Group 2 consisted oral diseases, systemic diseases and management, Group 3 consisted oral health and mental health, Group 4 consisted oral frailty symptoms and quality of life. Conclusion: In an aging society, oral dysfunction affects mental health and quality of life. Preventing oral diseases for well-aging can have a positive impact on mental health and quality of life. Therefore, efforts are needed to prevent oral frailty in a super-aging society by developing and educating systematic oral care programs for each life cycle.

Analyzing Issues on Environment-Friendly Agriculture Using Topic Modeling and Network Analysis (토픽모델링과 네트워크분석을 활용한 친환경농업 이슈분석에 관한 연구)

  • Shin, Ye-Eun;Shin, Eun-Seo;Kim, Sang-Bum;Choi, Jin-Ah;Kim, Myunghyun;Han, Seokjun;An, Kyungjin
    • Journal of Korean Society of Rural Planning
    • /
    • v.29 no.4
    • /
    • pp.35-53
    • /
    • 2023
  • This study attempts to identify the flow of key topics and issues of research trends related to environment-friendly agriculture conducted around the 2000s in South Korea and compare them with the environment-friendly agriculture promotion plan to seek the level of consistency and the direction of future development of environment-friendly agriculture. For the analysis of environment-friendly agriculture research trends and policy consistency, 'topic modeling', which is suitable for subject classification of large amounts of unstructured data, and 'text network analysis', which visualizes the relationship between keywords as a network and interprets its characteristics, were utilized. Overall, active discussions were held on 'technical discussions for the production and cultivation of environment-friendly agricultural products' and 'food safety & consumer awareness', and keywords such as production, cultivation, consumption, and safety were consistently linked to other keywords regardless of time. In addition, it was found that the issue of environment-friendly agriculture was partially consistent with the policy direction of the period. Considering the fact that the ongoing '5th Environment-Friendly Agriculture Promotion Phase' emphasizes the strengthening of rural environment management and aims to ensure the continuous quantitative and qualitative development of environment-friendly agriculture, active discussions and research on its environmental contributions and management methods are needed.

Research on Developing a Conversational AI Callbot Solution for Medical Counselling

  • Won Ro LEE;Jeong Hyon CHOI;Min Soo KANG
    • Korean Journal of Artificial Intelligence
    • /
    • v.11 no.4
    • /
    • pp.9-13
    • /
    • 2023
  • In this study, we explored the potential of integrating interactive AI callbot technology into the medical consultation domain as part of a broader service development initiative. Aimed at enhancing patient satisfaction, the AI callbot was designed to efficiently address queries from hospitals' primary users, especially the elderly and those using phone services. By incorporating an AI-driven callbot into the hospital's customer service center, routine tasks such as appointment modifications and cancellations were efficiently managed by the AI Callbot Agent. On the other hand, tasks requiring more detailed attention or specialization were addressed by Human Agents, ensuring a balanced and collaborative approach. The deep learning model for voice recognition for this study was based on the Transformer model and fine-tuned to fit the medical field using a pre-trained model. Existing recording files were converted into learning data to perform SSL(self-supervised learning) Model was implemented. The ANN (Artificial neural network) neural network model was used to analyze voice signals and interpret them as text, and after actual application, the intent was enriched through reinforcement learning to continuously improve accuracy. In the case of TTS(Text To Speech), the Transformer model was applied to Text Analysis, Acoustic model, and Vocoder, and Google's Natural Language API was applied to recognize intent. As the research progresses, there are challenges to solve, such as interconnection issues between various EMR providers, problems with doctor's time slots, problems with two or more hospital appointments, and problems with patient use. However, there are specialized problems that are easy to make reservations. Implementation of the callbot service in hospitals appears to be applicable immediately.

Optimizing Input Parameters of Paralichthys olivaceus Disease Classification based on SHAP Analysis (SHAP 분석 기반의 넙치 질병 분류 입력 파라미터 최적화)

  • Kyung-Won Cho;Ran Baik
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.6
    • /
    • pp.1331-1336
    • /
    • 2023
  • In text-based fish disease classification using machine learning, there is a problem that the input parameters of the machine learning model are too many, but due to performance problems, the input parameters cannot be arbitrarily reduced. This paper proposes a method of optimizing input parameters specialized for Paralichthys olivaceus disease classification using SHAP analysis techniques to solve this problem,. The proposed method includes data preprocessing of disease information extracted from the halibut disease questionnaire by applying the SHAP analysis technique and evaluating a machine learning model using AutoML. Through this, the performance of the input parameters of AutoML is evaluated and the optimal input parameter combination is derived. In this study, the proposed method is expected to be able to maintain the existing performance while reducing the number of input parameters required, which will contribute to enhancing the efficiency and practicality of text-based Paralichthys olivaceus disease classification.

Analysis of Research Trends on Archival Information Services Using Text Mining (텍스트마이닝을 활용한 국내외 기록서비스 연구동향 분석)

  • Seohee Park;Hye-Eun Lee
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.24 no.1
    • /
    • pp.89-109
    • /
    • 2024
  • The study analyzed the research trends of domestic and international record information services from 2003 to 2022. A total of 136 academic papers registered in the Korea Citation Index (KCI) and 74 from the Library, Information Science & Technology Abstracts (LISTA) were examined by quantitative and qualitative content analysis to understand the research status of 20 years from various angles, such as publication year, research type, researcher type, subject, and purpose. Frequency analysis, co-occurrence frequency analysis, centrality analysis, and topic modeling were performed by applying text mining techniques. Results showed that domestic papers demonstrated a research flow focused on specific institutions or records, and user-centered satisfaction surveys and content-centered studies were conducted. Moreover, foreign papers confirmed various evaluation-oriented and information provision studies, such as data, resources, and collections, along with the research trend focusing on the relationship between archivists and users. The management of information resources was identified as a common topic in both domestic and foreign papers, but it is possible to identify that domestic research focuses on maintaining the quality of domestic information resources, while foreign research focuses on the storage and retrieval of information.

A study on research trends for pregnancy in adolescence: Focusing on text network analysis and topic modeling (청소년 임신에 대한 연구 동향 분석: 텍스트 네트워크 분석과 토픽 모델링)

  • Park, Seungmi;Kwak, Eunju;Park, Hye Ok;Hong, Jung Eun
    • The Journal of Korean Academic Society of Nursing Education
    • /
    • v.30 no.2
    • /
    • pp.149-159
    • /
    • 2024
  • Purpose: The aim of this study was to identify core keywords and topic groups in the "adolescent pregnancy" field of research for a better understanding of research trends in the past 10 years. Methods: Topics related to adolescent pregnancy were extracted from 3,819 articles that were published in journals between January 2013 and July 2023. Abstracts were retrieved from five databases (MEDLINE, CINAHL, Embase, RISS, and KISS). Keywords were extracted from the abstracts and cleaned using semantic morphemes. Text network analysis and topic modeling were performed using NetMiner 4.3.3. Results: The most important keywords were "health," "woman," "risk," "group," "girl," "school," "service," "family," "program," and "contraception." Five topic groups were identified through topic modeling. Through the topic modeling analysis, five themes were derived: "health service," "community program for school girls," "risks for adult women," "relationship risks," and "sexual contraceptive knowledge." Conclusion: This study utilized text network analysis and topic modeling to analyze keywords from abstracts of research conducted over the past decade on adolescent pregnancy. Given that adolescent pregnancy leads to physical, mental, social, and economic issues, it is imperative to provide integrated intervention programs, including prenatal/postnatal care, psychological services, proper contraception methods, and sex education, through school and community partnerships, as well as related research studies. Nurses can play a vital role by actively engaging in prevention efforts and directly supporting and educating socially disadvantaged adolescent mothers, which could significantly contribute to improving their quality of life.

A Study on the Design and Implementation of an AI Mock Interview System for Computer Science Interview Preparation Using LLM-based ChatGPT (LLM 기반 ChatGPT를 활용한 컴퓨터 분야 면접 준비용 AI 모의 면접 시스템의 설계 및 구현에 대한 연구)

  • Jae-Sung Chun;Hee-Kwon Jang;Ji-Hye Kim;Chang-Min Bae;Dong-Gyu Lee;Il-Young Moon
    • Journal of Practical Engineering Education
    • /
    • v.16 no.5_spc
    • /
    • pp.643-651
    • /
    • 2024
  • This study aims to design and implement an AI mock interview system for Computer Science (CS) interview preparation using LLM (Large Language Model) based ChatGPT. The system utilizes AI's natural language processing and speech recognition capabilities to analyze and provide real-time feedback on interview responses, helping users improve their weaknesses during the preparation process. According to a survey, 90% of users reported that the real-time feedback function provided substantial assistance in their interview preparation. Key features include GPT prompt generation and Speech-to-Text functionality, which converts voice data into text. The system received positive evaluations for its response time and feedback accuracy. Future research will explore expanding the range of question types and applying the system to various industries.

A Study on the Semantic Network Analysis of "Cooking Academy" through the Big Data (빅데이터를 활용한 "조리학원"의 의미연결망 분석에 관한 연구)

  • Lee, Seung-Hoo;Kim, Hak-Seon
    • Culinary science and hospitality research
    • /
    • v.24 no.3
    • /
    • pp.167-176
    • /
    • 2018
  • In this study, Big Data was used to collect the information related to 'Cooking Academy' keywords. After collecting all the data, we calculated the frequency through the text mining and selected the main words for future data analysis. Data collection was conducted from Google Web and News during the period from January 1, 2013 to December 31, 2017. The selected 64 words were analyzed by using UCINET 6.0 program, and the analysis results were visualized with NetDraw in order to present the relationship of main words. As a result, it was found that the most important goal for the students from cooking school is to work as a cook, likewise to have practical classes. In addition, we obtained the result that SNS marketing system that the social sites, such as Facebook, Twitter, and Instagram are actively utilized as a marketing strategy of the institute. Therefore, the results can be helpful in searching for the method of utilizing big data and can bring brand-new ideas for the follow-up studies. In practical terms, it will be remarkable material about the future marketing directions and various programs that are improved by the detailed curriculums through semantic network of cooking school by using big data.