• Title/Summary/Keyword: multi-modal


Improved Transformer Model for Multimodal Fashion Recommendation Conversation System (멀티모달 패션 추천 대화 시스템을 위한 개선된 트랜스포머 모델)

  • Park, Yeong Joon;Jo, Byeong Cheol;Lee, Kyoung Uk;Kim, Kyung Sun
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.138-147 / 2022
  • Recently, chatbots have been applied in various fields with good results, and many attempts to use chatbots in shopping-mall product recommendation services are being made on e-commerce platforms. In this paper, we target a conversation system that recommends the fashion a user wants based on the dialogue between the user and the system together with fashion image information, building on the transformer model, which currently performs well in various AI fields such as natural language processing, speech recognition, and image recognition. We propose a multimodal-based improved transformer model that increases recommendation accuracy by using dialogue (text) and fashion (image) information together in data preprocessing and data representation. We also propose a method to improve accuracy through data improvement informed by data analysis. The proposed system achieves a recommendation accuracy of 0.6563 WKT (Weighted Kendall's tau), improving significantly on the existing system's 0.3372 WKT by more than 0.3191 WKT.
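For reference, the WKT metric reported above can be computed with SciPy's `weightedtau`, which weights agreement at the top of the ranking more heavily. A minimal sketch with made-up scores (not the paper's code or data):

```python
from scipy.stats import weightedtau

# Hypothetical relevance scores: ground-truth ordering vs. system ordering
# (illustrative values only; not data from the paper).
true_scores = [5, 4, 3, 2, 1]   # ideal ordering of five fashion items
pred_scores = [5, 3, 4, 2, 1]   # ordering produced by the recommender

# weightedtau emphasizes agreement near the top of the ranking,
# which is what matters most for recommendation quality.
tau, _ = weightedtau(true_scores, pred_scores)
print(round(tau, 4))
```

A value of 1.0 means the two orderings agree perfectly; 0.6563 as reported above indicates substantial but imperfect agreement.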

Korean Emotional Speech and Facial Expression Database for Emotional Audio-Visual Speech Generation (대화 영상 생성을 위한 한국어 감정음성 및 얼굴 표정 데이터베이스)

  • Baek, Ji-Young;Kim, Sera;Lee, Seok-Pil
    • Journal of Internet Computing and Services / v.23 no.2 / pp.71-77 / 2022
  • In this paper, a database is collected for extending a speech synthesis model into one that synthesizes speech according to emotion and generates facial expressions. The database is divided into male and female data and consists of emotional speech and facial expressions. Two professional actors of different genders speak sentences in Korean. The sentences are divided into four emotions: happiness, sadness, anger, and neutrality, and each actor performs about 3,300 sentences per emotion. The 26,468 sentences collected by filming do not overlap, and the recorded expressions closely match the corresponding emotions. Since building a high-quality database is important for the performance of future research, the database is assessed on emotional category, intensity, and genuineness. To examine accuracy according to the modality of the data, the database is divided into audio-video data, audio data, and video data.

Multi-modal Representation Learning for Classification of Imported Goods (수입물품의 품목 분류를 위한 멀티모달 표현 학습)

  • Apgil Lee;Keunho Choi;Gunwoo Kim
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.203-214 / 2023
  • The Korea Customs Service handles its work efficiently through an electronic customs system that supports one-stop processing, but a still more effective method is needed. All imported and exported goods require an HS Code (Harmonized System Code) for classification and tax-rate application, and item classification, which assigns the HS Code, is a highly difficult task that requires specialized knowledge and experience and is an important part of customs clearance procedures. Therefore, this study develops a deep learning model based on multimodal representation learning that jointly uses the various types of information in the item classification request form, such as product name, product description, and product image. The model is expected to reduce the burden of customs work by classifying and recommending HS Codes and to help with customs procedures by classifying items promptly.
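As an illustration of the multimodal representation idea (not the authors' model), a late-fusion sketch in NumPy that concatenates a text embedding and an image embedding before a classification head over hypothetical HS Code classes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings for one declaration form (illustrative only):
# a text-encoder output for the product name/description and an
# image-encoder output for the product photo.
text_emb = rng.standard_normal(128)    # e.g., from a language model
image_emb = rng.standard_normal(256)   # e.g., from a CNN backbone

# Late fusion by concatenation: the joint representation feeds a
# classifier head over HS Code classes (10 classes here for illustration).
joint = np.concatenate([text_emb, image_emb])      # shape (384,)
W = rng.standard_normal((10, joint.size)) * 0.01   # untrained head
logits = W @ joint
probs = np.exp(logits - logits.max())
probs /= probs.sum()                               # softmax over classes

predicted_class = int(np.argmax(probs))
```

In a real system both encoders and the head would be trained jointly so that the fused representation reflects all modalities; concatenation is only one of several fusion strategies.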

Healthy lifestyle interventions for childhood and adolescent cancer survivors: a systematic review and meta-analysis

  • Kyung-Ah Kang;Suk Jung Han;Jiyoung Chun;Hyun-Yong Kim;Yerin Oh;Heejin Yoon
    • Child Health Nursing Research / v.29 no.2 / pp.111-127 / 2023
  • Purpose: This study investigated the effects of healthy lifestyle interventions (HLSIs) on health-related quality of life (HR-QoL) in childhood and adolescent cancer survivors (CACS). Methods: Major databases were searched for English-language original articles published between January 1, 2000 and May 2, 2021. Randomized controlled trials (RCTs) and non-RCTs were included. Quality was assessed using the revised Cochrane risk-of-bias tool, and a meta-analysis was conducted using RevMan 5.3 software. Results: Nineteen studies were included. Significant effects on HR-QoL were found for interventions using a multi-modal approach (exercise and education) (d=-0.46; 95% confidence interval [CI]=-0.84 to -0.07, p=.02), lasting not less than 6 months (d=-0.72; 95% CI=-1.15 to -0.29, p=.0010), and using a group approach (d=-0.46; 95% CI=-0.85 to -0.06, p=.02). Self-efficacy showed significant effects when HLSIs provided health education only (d=-0.55; 95% CI=-0.92 to -0.18; p=.003), lasted for less than 6 months (d=-0.40; 95% CI=-0.69 to -0.11, p=.006), and were conducted individually (d=-0.55; 95% CI=-0.92 to -0.18, p=.003). The physical outcomes (physical activity, fatigue, exercise capacity-VO2, exercise capacity-upper body, body mass index) showed no statistically significant effects. Conclusion: Areas of HLSIs for CACS requiring further study were identified, and needs and directions for research on holistic health management were suggested.
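The pooled effects above are standardized mean differences (Cohen's d) with 95% confidence intervals. A minimal sketch of how such a value is computed from group summary statistics (illustrative numbers, not data from the review):

```python
import math

# Illustrative group statistics (not from the review): HR-QoL scores for
# an intervention group and a control group; lower scores are treated as
# better here, which is why the pooled effects in the abstract are negative.
n1, mean1, sd1 = 30, 42.0, 10.0   # intervention group
n2, mean2, sd2 = 30, 48.0, 11.0   # control group

# Cohen's d with the pooled standard deviation
sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (mean1 - mean2) / sp

# Approximate 95% CI using the large-sample standard error of d
se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci = (d - 1.96 * se, d + 1.96 * se)
```

A meta-analysis then pools such per-study d values (inverse-variance weighting) into the summary effects reported above; an interval that excludes zero corresponds to a significant pooled effect.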

Fashion attribute-based mixed reality visualization service (패션 속성기반 혼합현실 시각화 서비스)

  • Yoo, Yongmin;Lee, Kyounguk;Kim, Kyungsun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.2-5 / 2022
  • With the advent of deep learning and the rapid development of ICT (Information and Communication Technology), research using artificial intelligence is being actively conducted in various fields of society such as politics, economy, and culture. Deep learning-based artificial intelligence technology is subdivided into domains such as natural language processing, image processing, speech processing, and recommendation systems. In particular, as industry advances, recommendation systems that analyze market trends and individual characteristics and make recommendations to consumers are increasingly needed. In line with these technological developments, this paper extracts and classifies attribute information from structured and unstructured text and image big data through deep learning-based 'language processing intelligence' and 'image processing intelligence'. We propose an integrated artificial intelligence-based 'customized fashion advisor' service that analyzes trends and new materials, discovers 'market-consumer' insights through consumer taste analysis, and can recommend styles, support virtual fitting, and assist design.


Multi-Modal Interface Design for Non-Touch Gesture-Based 3D Sculpting Task (비접촉식 제스처 기반 3D 조형 태스크를 위한 다중 모달리티 인터페이스 디자인 연구)

  • Son, Minji;Yoo, Seung Hun
    • Design Convergence Study / v.16 no.5 / pp.177-190 / 2017
  • This research aims to suggest a multimodal non-touch gesture interface design to improve the usability of 3D sculpting tasks. The tasks and procedures of design sculpting were analyzed across multiple settings, from physical sculpting to computer software. The optimal body posture, design process, work environment, gesture-task relationships, and combinations of natural hand gestures and arm movements of designers were defined. Existing non-touch 3D software was also observed, and natural gesture interactions, visual UI metaphors, and affordances for behavior guidance were designed. A prototype of the gesture-based 3D sculpting system was developed to validate its intuitiveness and learnability in comparison with current software. The suggested gestures showed higher performance in terms of understandability, memorability, and error rate. The results show that gesture interface design for productivity systems should reflect users' natural experience in the previous work domain and provide appropriate visual-behavioral metaphors.

Design of the emotion expression in multimodal conversation interaction of companion robot (컴패니언 로봇의 멀티 모달 대화 인터랙션에서의 감정 표현 디자인 연구)

  • Lee, Seul Bi;Yoo, Seung Hun
    • Design Convergence Study / v.16 no.6 / pp.137-152 / 2017
  • This research aims to develop a companion robot experience design for the elderly in Korea based on a needs-function deploy matrix and research on robot emotion expression in multimodal interaction. First, elderly users' main needs were categorized into four groups based on ethnographic research. Second, the functional elements and physical actuators of the robot were mapped to user needs in a function-needs deploy matrix. The final UX design prototype was implemented as a robot with a verbal, non-touch multimodal interface and emotional facial expressions based on Ekman's Facial Action Coding System (FACS). The prototype was validated through a user test session that analyzed the influence of robot interaction on users' cognition and emotion, using a Story Recall Test and facial emotion analysis software (Emotion API), both when the robot changed its facial expression to match the emotion of the information it delivered and when the robot initiated the interaction cycle voluntarily. The group with the emotional robot showed a relatively high recall rate in the delayed recall test, and in the facial expression analysis, the robot's facial expression and interaction initiation affected the emotion and preference of the elderly participants.

Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

  • Dongryun Yoon;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.166-173 / 2024
  • This study explores lip-sync detection, focusing on the synchronization between lip movements and voices in videos. Typically, lip-sync detection techniques crop the facial area of a given video and use the lower half of the cropped box as input to a visual encoder to extract visual features. To place greater emphasis on the articulatory region of the lips for more accurate lip-sync detection, we propose using a pre-trained visual attention-based encoder. The Visual Transformer Pooling (VTP) module is employed as the visual encoder; it was originally designed for the lip-reading task, predicting the script from visual information alone, without audio. Our experimental results demonstrate that, despite having fewer learnable parameters, the proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% based on five context frames. Moreover, our approach outperforms VocaList by approximately 8% in lip-sync detection accuracy even on an untrained dataset, Acappella.
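The core of lip-sync detection is scoring how well per-frame visual features align with per-frame audio features. A toy sketch of that idea (random stand-in features, not the paper's VTP encoder): the audio-visual offset is recovered as the shift that maximizes mean cosine similarity between the two streams.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins for encoder outputs over the same time span.
T, D = 50, 32
audio = rng.standard_normal((T, D))
# Simulate video features as a noisy copy of the audio features,
# circularly shifted by a known offset of 3 frames.
true_offset = 3
video = np.roll(audio, true_offset, axis=0) + 0.1 * rng.standard_normal((T, D))

def sync_score(a, v, shift):
    """Mean cosine similarity between audio and video shifted by `shift` frames."""
    v_shift = np.roll(v, -shift, axis=0)
    num = (a * v_shift).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(v_shift, axis=1)
    return float((num / den).mean())

# Pick the shift (within +/-5 frames) that best aligns the two streams.
best = max(range(-5, 6), key=lambda s: sync_score(audio, video, s))
```

Real systems learn the two encoders so that in-sync pairs score high; the search over shifts is the same in spirit.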

Virtual Object Weight Information with Multi-modal Sensory Feedback during Remote Manipulation (다중 감각 피드백을 통한 원격 가상객체 조작 시 무게 정보 전달)

  • Changhyeon Park;Jaeyoung Park
    • Journal of Internet Computing and Services / v.25 no.1 / pp.9-15 / 2024
  • As virtual reality technology became popular, a high demand emerged for natural and efficient interaction with the virtual environment. Mid-air manipulation is one of the solutions to such needs, letting a user manipulate a virtual object in a 3D virtual space. In this paper, we focus on manipulating a remote virtual object while visually displaying the object and providing tactile information on the object's weight. We developed two types of wearable interfaces that can provide cutaneous or vibrotactile feedback on the virtual object weight to the user's fingertips. Human perception of the remote virtual object weight during manipulation was evaluated by conducting a psychophysics experiment. The results indicate a significant effect of haptic feedback on the perceived weight of the virtual object during manipulation.

Range Estimating Performance Evaluation of the Underwater Broadband Source by Array Invariant (Array Invariant를 이용한 수중 광대역 음원의 거리 추정성능 분석)

  • Kim Se-Young;Chun Seung-Yong;Kim Boo-Il;Kim Ki-Man
    • The Journal of the Acoustical Society of Korea / v.25 no.6 / pp.305-311 / 2006
  • In this paper, the performance of the array invariant method is evaluated for source-range estimation in a horizontally stratified shallow-water ocean waveguide. The method has the advantage of requiring little computational effort compared with existing source-localization methods such as matched field processing or the waveguide invariant, and the array gain is fully exploited. No knowledge of the environment is required, except that the received field should not be dominated by pure interference. This simple and instantaneous method is applied to a simulated acoustic propagation field to test range estimation performance. Range estimation results as a function of SNR are demonstrated for an underwater impulsive source with a broadband spectrum. A spatial smoothing method is applied to suppress the effect of multipath propagation of the high-frequency signal. The performance test shows that the range estimation error rate is within 20% at SNRs above 10 dB.