• Title/Summary/Keyword: Text data

Training Techniques for Data Bias Problem on Deep Learning Text Summarization (딥러닝 텍스트 요약 모델의 데이터 편향 문제 해결을 위한 학습 기법)

  • Cho, Jun Hee;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.7 / pp.949-955 / 2022
  • Deep learning-based text summarization models depend heavily on the datasets they are trained on. For example, a summarization model trained on a news summarization dataset performs poorly on other types of text, such as internet posts and papers. In this study, we define this phenomenon as the Data Bias Problem (DBP) and propose two training methods for solving it. The first is 'proper nouns masking', which masks proper nouns. The second is 'length variation', which randomly inflates or deflates the length of the text. Experiments show that our methods are effective at solving DBP. In addition, we analyze the experimental results and present future development directions. Our contributions are as follows: (1) We identified DBP and defined it for the first time. (2) We proposed two efficient training methods and conducted experiments with them. (3) Our methods can be applied to all summarization models and are easy to implement, making them highly practical.
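The two augmentations named in this abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how proper nouns are detected or how length is changed, so the capitalization heuristic, the `[MASK]` token, and the sentence-level duplication and deletion below are all hypothetical choices.

```python
import random
import re

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper
SENTENCE_SPLIT = r"(?<=[.!?])\s+"


def mask_proper_nouns(text: str) -> str:
    """Mask capitalized words that are not sentence-initial.

    Assumption: capitalization is a crude stand-in for a real
    proper-noun tagger.
    """
    result = []
    for sentence in re.split(SENTENCE_SPLIT, text.strip()):
        words = sentence.split()
        for i, word in enumerate(words):
            # Separate leading/trailing punctuation from the word itself.
            m = re.match(r"^(\W*)(\w+)(\W*)$", word)
            if i > 0 and m and m.group(2)[0].isupper():
                words[i] = m.group(1) + MASK + m.group(3)
        result.append(" ".join(words))
    return " ".join(result)


def vary_length(text: str, rng: random.Random) -> str:
    """Randomly deflate (drop a sentence) or inflate (duplicate a sentence)."""
    sentences = re.split(SENTENCE_SPLIT, text.strip())
    if rng.random() < 0.5 and len(sentences) > 1:
        # Deflate: remove one randomly chosen sentence.
        del sentences[rng.randrange(len(sentences))]
    else:
        # Inflate: insert a copy of a random sentence at a random position.
        sentences.insert(rng.randrange(len(sentences) + 1), rng.choice(sentences))
    return " ".join(sentences)


example = "Yesterday Alice visited Seoul. The weather was fine."
print(mask_proper_nouns(example))
# Yesterday [MASK] visited [MASK]. The weather was fine.
print(vary_length(example, random.Random(0)))
```

In a training pipeline, transformations like these would presumably be applied to the source documents before they are fed to the summarization model, so that the model relies less on dataset-specific names and typical document lengths.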

Change in Market Issues on HMR (Home Meal Replacements) Using Local Foods after the COVID-19 Outbreak: Text Mining of Online Big Data (코로나19 발생 후 지역농산물 이용 간편식에 대한 시장 이슈 변화: 온라인 빅데이터의 텍스트마이닝)

  • Yoojeong, Joo;Woojin, Byeon;Jihyun, Yoon
    • Journal of the Korean Society of Food Culture / v.38 no.1 / pp.1-14 / 2023
  • This study explored the change in market issues on HMR (Home Meal Replacements) using local foods after the COVID-19 outbreak. Online text data were collected from internet news, social media posts, and web documents before (from January 2016 to December 2019) and after (from January 2020 to November 2022) the COVID-19 outbreak. TF-IDF analysis showed that 'Trend', 'Market', 'Consumption', and 'Food service industry' were the major keywords before the COVID-19 outbreak, whereas 'Wanju-gun', 'Distribution', 'Development', and 'Meal-kit' were the main keywords after the outbreak. The results of topic modeling analysis and categorization showed that after the COVID-19 outbreak, the 'Market' category included 'Non-face-to-face market' instead of 'Event', and 'Delivery' instead of 'Distribution'. In the 'Product' category, 'Marketing' was included instead of 'Trend'. Additionally, in the 'Support' category, 'Start-up' and 'School food service' appeared as new topics after the COVID-19 outbreak. In conclusion, this study showed that meaningful changes had occurred in market issues on HMR using local foods after the COVID-19 outbreak. Therefore, governments should take advantage of this market opportunity by implementing policies and programs to promote the development and marketing of HMR using local foods.
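For readers unfamiliar with the TF-IDF keyword analysis mentioned in this abstract, here is a minimal sketch. It assumes the common variant (term frequency normalized by document length, multiplied by log(N/df)); the abstract does not state which weighting the authors used, and the toy documents are invented for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return scores


def top_keywords(docs, k=3):
    """Rank terms by their TF-IDF score summed over all documents."""
    totals = Counter()
    for doc_scores in tf_idf(docs):
        totals.update(doc_scores)
    return [term for term, _ in totals.most_common(k)]


# Hypothetical toy corpus; real input would be tokenized news and social media posts.
docs = [
    ["hmr", "market", "trend"],
    ["hmr", "mealkit", "delivery"],
    ["hmr", "market", "delivery"],
]
print(top_keywords(docs, k=4))
```

A term that appears in every document (here `hmr`) gets an IDF of zero, which is why TF-IDF surfaces period-specific keywords rather than the search term itself.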

Multi-type object detection-based de-identification technique for personal information protection (개인정보보호를 위한 다중 유형 객체 탐지 기반 비식별화 기법)

  • Ye-Seul Kil;Hyo-Jin Lee;Jung-Hwa Ryu;Il-Gu Lee
    • Convergence Security Journal / v.22 no.5 / pp.11-20 / 2022
  • As the Internet and web technology develop around mobile devices, image data contains various types of sensitive information such as people, text, and space. Moreover, as the use of SNS increases, the damage caused by the exposure and abuse of personal information online is growing. However, research on de-identification technology based on multi-type object detection for personal information protection is insufficient. Therefore, this paper proposes an artificial intelligence model that detects and de-identifies multiple types of objects by using existing single-type object detection models in parallel. Using CutMix, images in which person and text objects appear together were created and used as training data, and detection and de-identification were performed on person and text objects, which have different characteristics. The proposed model achieves a precision of 0.724 and mAP@.5 of 0.745 when two objects are present at the same time. In addition, after de-identification, mAP@.5 was 0.224 for all objects, showing a decrease of 0.4 or more.

Proposal for User-Product Attributes to Enhance Chatbot-Based Personalized Fashion Recommendation Service (챗봇 기반의 개인화 패션 추천 서비스 향상을 위한 사용자-제품 속성 제안)

  • Hyosun An;Sunghoon Kim;Yerim Choi
    • Journal of Fashion Business / v.27 no.3 / pp.50-62 / 2023
  • The e-commerce fashion market has experienced remarkable growth, leading to an overwhelming amount of shared information and numerous choices for users. In light of this, chatbots have emerged as a promising technological solution for enhancing personalized services. This study aimed to develop user-product attributes for a chatbot-based personalized fashion recommendation service using big data text mining techniques. To accomplish this, over one million consumer reviews from Coupang, an e-commerce platform, were collected and analyzed using frequency analyses to identify the upper-level attributes of users and products. Attribute terms were then assigned to each user-product attribute, including user body shape (body proportion, BMI), user needs (functional, expressive, aesthetic), user TPO (time, place, occasion), product design elements (fit, color, material, detail), product size (label, measurement), and product care (laundry, maintenance). The classification of user-product attributes was found to be applicable to the knowledge graph of the Conversational Path Reasoning model. A testing environment was established to evaluate the usefulness of the attributes based on real e-commerce users and purchased product information. This study is significant in proposing a new research methodology in the field of Fashion Informatics for constructing the knowledge base of a chatbot based on text mining analysis. The proposed research methodology is expected to enhance fashion technology and improve personalized fashion recommendation services and the user experience with chatbots in the e-commerce market.

Speech Emotion Recognition in People at High Risk of Dementia

  • Dongseon Kim;Bongwon Yi;Yugwon Won
    • Dementia and Neurocognitive Disorders / v.23 no.3 / pp.146-160 / 2024
  • Background and Purpose: The emotions of people at various stages of dementia need to be effectively utilized for prevention, early intervention, and care planning. With technology available for understanding and addressing the emotional needs of people, this study aims to develop speech emotion recognition (SER) technology to classify emotions for people at high risk of dementia. Methods: Speech samples from people at high risk of dementia were categorized into distinct emotions via human auditory assessment, the outcomes of which were annotated for a guided deep-learning method. The architecture incorporated a convolutional neural network, long short-term memory, attention layers, and Wav2Vec2, a novel feature extractor, to develop automated speech emotion recognition. Results: Twenty-seven kinds of emotions were found in the speech of the participants. These emotions were grouped into 6 detailed emotions (happiness, interest, sadness, frustration, anger, and neutrality) and further into 3 basic emotions (positive, negative, and neutral). To improve algorithmic performance, multiple learning approaches were applied using different data sources (voice and text) and varying the number of emotions. Ultimately, a 2-stage algorithm, with initial text-based classification followed by voice-based analysis, achieved the highest accuracy, reaching 70%. Conclusions: The diverse emotions identified in this study were attributed to the characteristics of the participants and the method of data collection. The fact that the speech of people at high risk of dementia was directed at companion robots also explains the relatively low performance of the SER algorithm. Accordingly, this study suggests the systematic and comprehensive construction of a dataset from people with dementia.

On using the LPC parameter for Speaker Identification (LPC에 의한 화자 식별)

  • 조병모
    • Proceedings of the Acoustical Society of Korea Conference / 1987.11a / pp.82-85 / 1987
  • Preliminary results of using the LPC parameter for the text-independent speaker identification problem are presented. The identification process uses the log likelihood ratio as the distance measure and dynamic programming for time normalization. To generate the data base for experiments, ten times. Experimental results show 99.4% identification accuracy; incorrect identifications were made when the speaker used a dialect.
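The "dynamic programming for time normalization" step in this abstract is commonly realized as dynamic time warping (DTW). The sketch below is an illustration under that assumption and is not the paper's procedure: it uses Euclidean distance between frames as a stand-in for the log likelihood ratio measure, and the function names and toy feature sequences are hypothetical.

```python
import math


def euclidean(a, b):
    """Stand-in frame distance; the paper uses a log likelihood ratio instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, frame_dist=euclidean):
    """Accumulated cost of the best time alignment between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of A with the first j of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def identify(test_seq, references):
    """Return the name of the reference sequence closest to the test sequence."""
    return min(references, key=lambda name: dtw_distance(test_seq, references[name]))


# Hypothetical one-dimensional "feature" sequences for two speakers.
references = {
    "speaker_1": [[0.0], [0.0], [1.0], [2.0], [2.0]],
    "speaker_2": [[5.0], [6.0]],
}
print(identify([[0.0], [1.0], [2.0]], references))
```

Because the alignment may stretch or compress either sequence, utterances spoken at different rates can still be compared frame by frame, which is the purpose of the time normalization described above.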
