• Title/Summary/Keyword: 멀티모달 학습 (Multimodal Learning)


Gait Type Classification Using Multi-modal Ensemble Deep Learning Network

  • Park, Hee-Chan; Choi, Young-Chan; Choi, Sang-Il
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.29-38 / 2022
  • This paper proposes a system for classifying gait types using an ensemble deep learning network applied to gait data measured by a smart insole equipped with multiple sensors. The gait type classification system consists of a part that normalizes the data measured by the insole, a part that extracts gait features using a deep learning network, and a part that classifies the gait type from the extracted features. Two kinds of gait feature maps were extracted by independently training networks based on CNNs and LSTMs, which have different characteristics, and the final ensemble result was obtained by combining the two networks' classification outputs. Multi-sensor data for seven gait types of adults in their 20s and 30s (walking, running, fast walking, going up and down stairs, and going up and down hills) were classified with the proposed ensemble network, and the classification rate was confirmed to be higher than 90%.
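
A rough illustrative sketch of the ensemble idea described in this abstract: independent CNN and LSTM branches over insole sensor sequences, with their softmax outputs averaged. The layer sizes, the 8-sensor input, and the averaging fusion rule are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch, assuming 8 insole sensor channels and 7 gait classes;
# not the paper's actual architecture or fusion rule.
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    def __init__(self, n_sensors=8, n_classes=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):          # x: (batch, time, sensors)
        x = x.transpose(1, 2)      # Conv1d expects (batch, channels, time)
        return self.fc(self.conv(x).squeeze(-1))

class LSTMBranch(nn.Module):
    def __init__(self, n_sensors=8, n_classes=7, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):          # x: (batch, time, sensors)
        _, (h, _) = self.lstm(x)   # use the last hidden state
        return self.fc(h[-1])

def ensemble_predict(cnn, lstm, x):
    # Combine the two classifiers by averaging their softmax outputs.
    p = (torch.softmax(cnn(x), -1) + torch.softmax(lstm(x), -1)) / 2
    return p.argmax(-1)

x = torch.randn(4, 100, 8)         # 4 windows, 100 timesteps, 8 sensors
print(ensemble_predict(CNNBranch(), LSTMBranch(), x))
```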

A Viewer Preference Model Based on Physiological Feedback (CogTV를 위한 생체신호기반 시청자 선호도 모델)

  • Park, Tae-Suh; Kim, Byoung-Hee; Zhang, Byoung-Tak
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.3 / pp.316-322 / 2014
  • A movie recommendation system is proposed that learns a viewer's preference model from multimodal features of video content and the viewer's implicit responses evoked in synchrony with them. In this system, facial expression, body posture, and physiological signals are measured to estimate the affective state of the viewer in response to stimuli consisting of low-level and affective features from the video, audio, and text streams. Experimental results show that the viewer's arousal response, measured by electrodermal activity, can be predicted from the auditory and text features of a video stimulus, which can be used to estimate how interesting the video is to the viewer.
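
The prediction step this abstract reports, mapping per-segment audio/text stimulus features to an EDA-derived arousal signal, can be sketched as a simple regression. The feature count, the synthetic data, and the choice of ridge regression are illustrative assumptions only, not the paper's model.

```python
# Minimal sketch, assuming 12 audio/text features per video segment and a
# continuous arousal target derived from electrodermal activity (EDA).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))   # stand-in audio/text stimulus features
y = 0.8 * X[:, 0] + rng.normal(scale=0.1, size=200)  # synthetic arousal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("R^2 on held-out segments:", model.score(X_te, y_te))
```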

Text Augmentation Using Hierarchy-based Word Replacement

  • Kim, Museong; Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.57-67 / 2021
  • Recently, multi-modal deep learning techniques that combine heterogeneous data have been widely utilized. In particular, studies on text-to-image synthesis, which automatically generates images from text, are being actively conducted. Deep learning for image synthesis requires a vast amount of data consisting of pairs of images and text describing them, so various data augmentation techniques have been devised to generate large datasets from small ones. A number of text augmentation techniques based on synonym replacement have been proposed, but they share a common limitation: replacing a noun with a synonym can produce text that is inconsistent with the content of the image. In this study, we propose a text augmentation method that replaces noun words using word hierarchy information. We evaluated the performance of the proposed methodology through experiments on MSCOCO data.
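
A minimal sketch of hierarchy-based noun replacement, using WordNet as a stand-in for the lexical hierarchy the paper actually uses: a noun is replaced by a co-hyponym (a word sharing its hypernym) rather than by a flat synonym. The NLTK/WordNet choice and the function name are assumptions for illustration.

```python
# Sketch only: WordNet hypernym/hyponym traversal stands in for the
# word-hierarchy resource used in the paper.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def hierarchy_replacements(noun):
    """Collect candidate replacements from the noun's hypernym subtree."""
    candidates = set()
    for syn in wn.synsets(noun, pos=wn.NOUN):
        for hyper in syn.hypernyms():
            for sibling in hyper.hyponyms():   # co-hyponyms share a parent
                for lemma in sibling.lemmas():
                    word = lemma.name().replace("_", " ")
                    if word != noun:
                        candidates.add(word)
    return sorted(candidates)

print(hierarchy_replacements("dog")[:10])
```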

Multi-modal Emotion Recognition using Semi-supervised Learning and Multiple Neural Networks in the Wild (준 지도학습과 여러 개의 딥 뉴럴 네트워크를 사용한 멀티 모달 기반 감정 인식 알고리즘)

  • Kim, Dae Ha; Song, Byung Cheol
    • Journal of Broadcast Engineering / v.23 no.3 / pp.351-360 / 2018
  • Human emotion recognition is a research topic receiving continuous attention in the computer vision and artificial intelligence domains. This paper proposes a method for classifying human emotions in the wild through multiple neural networks based on multi-modal signals consisting of images, facial landmarks, and audio. The proposed method has the following features. First, the learning performance of the image-based network is greatly improved by employing both multi-task learning and semi-supervised learning that exploit the spatio-temporal characteristics of videos. Second, a model that converts one-dimensional (1D) facial landmark information into two-dimensional (2D) images is newly proposed, and a CNN-LSTM network based on this model is used for better emotion recognition. Third, based on the observation that audio signals are often very effective for specific emotions, an audio deep learning mechanism robust to those emotions is proposed. Finally, so-called emotion-adaptive fusion is applied to create synergy among the multiple networks. The proposed network improves emotion classification performance by appropriately integrating existing supervised and semi-supervised learning networks. On the fifth attempt on the given test set of the EmotiW2017 challenge, the proposed method achieved a classification accuracy of 57.12%.
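
The second contribution, converting 1D landmark coordinates into a 2D image so a CNN-LSTM can consume them per frame, can be sketched as a simple rasterization. The 64x64 grid, min-max normalization, and 68-landmark input are assumptions, not the paper's exact conversion model.

```python
# Sketch of rasterizing facial landmarks into a binary image; the paper's
# actual 1D-to-2D conversion model is not reproduced here.
import numpy as np

def landmarks_to_image(landmarks, size=64):
    """landmarks: (N, 2) array of (x, y) coordinates for one frame."""
    pts = np.asarray(landmarks, dtype=float)
    pts -= pts.min(axis=0)                       # shift to the origin
    pts /= pts.max() + 1e-8                      # scale into [0, 1]
    img = np.zeros((size, size), dtype=np.float32)
    ij = np.clip((pts * (size - 1)).astype(int), 0, size - 1)
    img[ij[:, 1], ij[:, 0]] = 1.0                # mark each landmark pixel
    return img

frame = np.random.rand(68, 2)                    # 68 landmarks per frame
print(landmarks_to_image(frame).shape)           # (64, 64), one channel
```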

Game Platform and System that Synchronize Actual Humanoid Robot with Virtual 3D Character Robot (가상의 3D와 실제 로봇이 동기화하는 시스템 및 플랫폼)

  • Park, Chang-Hyun; Lee, Chang-Jo
    • Journal of Korea Entertainment Industry Association / v.8 no.2 / pp.283-297 / 2014
  • The future of human life is expected to be transformed across all areas, social, economic, political, and personal, by multi-disciplinary technologies. In particular, in the field of robotics and next-generation games with robots, convergence between technologies is expected to accelerate through multidisciplinary contributions and interaction. The purpose of this study is to go beyond the technical limitations of existing human-robot interface technology, which is constrained in time and space, toward a more reliable and easy-to-use interface built on a fusion of modalities that existing human-robot interfaces lack. To this end, we develop a robot game system consisting of a real-time synchronization engine that links the behavior of a biped humanoid robot with the position values of 3D content (a virtual robot) on a mobile device screen, a wireless protocol for sending and receiving mutual information, and a "Direct Teaching & Play" teaching program developed through a study of effective teaching.
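
A hypothetical sketch of the kind of message exchange such a synchronization engine might use, sending timestamped joint angles from the mobile 3D content to the physical robot. The abstract does not specify the wireless protocol, so the UDP transport, field names, and port here are all invented for illustration.

```python
# Hypothetical sync message; the paper's actual protocol is not specified.
import json
import socket

def make_pose_message(joint_angles, timestamp_ms):
    # Serialize one pose of the virtual robot for the physical robot.
    return json.dumps({"t": timestamp_ms, "joints": joint_angles}).encode()

def send_pose(sock, addr, joint_angles, timestamp_ms):
    sock.sendto(make_pose_message(joint_angles, timestamp_ms), addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_pose(sock, ("127.0.0.1", 9000), [0.0, 30.0, -15.0], 123456)
```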

Artificial Intelligence for Assistance of Facial Expression Practice Using Emotion Classification (감정 분류를 이용한 표정 연습 보조 인공지능)

  • Kim, Dong-Kyu; Lee, So Hwa; Bong, Jae Hwan
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.17 no.6 / pp.1137-1144 / 2022
  • In this study, an artificial intelligence (AI) was developed to help users practice facial expressions for conveying emotions. The developed AI feeds multimodal inputs consisting of sentences and facial images to deep neural networks (DNNs), which compute the similarity between the emotion predicted from the sentence and the emotion predicted from the facial image. The user practices facial expressions for the situation given by a sentence, and the AI provides numerical feedback based on this similarity. A ResNet34 network was trained on the public FER2013 dataset to predict emotions from facial images, and a KoBERT model was fine-tuned, in a transfer-learning manner, on the conversational speech dataset for emotion classification released publicly by AIHub to predict emotions from sentences. The DNN that predicts emotions from facial images achieved 65% accuracy, comparable to human emotion classification ability, and the DNN that predicts emotions from sentences achieved 90% accuracy. The performance of the developed AI was evaluated through experiments in which an ordinary participant changed facial expressions.
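
The feedback step, comparing the emotion distribution predicted from the sentence against the one predicted from the face, might look like the following sketch. Cosine similarity and the seven-class FER2013-style label set are assumptions; the abstract does not state the exact similarity measure used.

```python
# Sketch of similarity feedback between two emotion distributions; the
# paper's exact score is not specified in the abstract.
import numpy as np

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def similarity_feedback(p_sentence, p_face):
    """Cosine similarity between two emotion probability vectors."""
    a, b = np.asarray(p_sentence), np.asarray(p_face)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

p_text = np.array([0.02, 0.01, 0.02, 0.85, 0.03, 0.05, 0.02])  # from sentence
p_face = np.array([0.05, 0.02, 0.03, 0.70, 0.05, 0.10, 0.05])  # from face
print(f"expression match: {similarity_feedback(p_text, p_face):.2f}")
```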

A Study on UI Prototyping Based on Personality of Things for Interusability in IoT Environment (IoT 환경에서 인터유저빌리티(Interusability) 개선을 위한 사물성격(Personality of Things)중심의 UI 프로토타이핑에 대한 연구)

  • Ahn, Mikyung; Park, Namchoon
    • Journal of the HCI Society of Korea / v.13 no.2 / pp.31-44 / 2018
  • In the IoT environment, various things can be connected. These connected things acquire data and then learn and operate by themselves; like human beings, they have self-learning and self-operating capabilities. A key issue in IoT research is therefore to design a communication system connecting two different types of subjects: human beings (users) and things. With the advent of the IoT environment, much research has been done in the field of UI design, taking complex factors into account through keywords such as multi-modality and interusability. However, existing UI design methods have limitations in structuring or testing the interaction between things and users in the IoT environment. This paper therefore suggests a new UI prototyping method. The major analyses and studies are as follows: (1) defining the behavior process of things; (2) analyzing existing IoT products; (3) building a new framework for deriving personality types; (4) extracting three representative personality models; and (5) applying the three models to a smart home service and testing the UI prototyping. This study is meaningful in that it examines the user experience (UX) of IoT services in a more comprehensive way. Moreover, the concept of the personality of things can be utilized as a tool for establishing the identity of artificial intelligence (AI) services in the future.
