• Title/Summary/Keyword: Multimodal recognition


The Research on Emotion Recognition through Multimodal Feature Combination (멀티모달 특징 결합을 통한 감정인식 연구)

  • Sung-Sik Kim; Jin-Hwan Yang; Hyuk-Soon Choi; Jun-Heok Go; Nammee Moon
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.739-740 / 2024
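  • This study proposes a new multimodal model training method that improves emotion classification accuracy by effectively combining data from two modalities, speech and text. To this end, feature vectors extracted from speech data with HuBERT and MFCC (Mel-Frequency Cepstral Coefficients) are combined with feature vectors extracted from text data with RoBERTa to classify emotions. In experiments, the proposed multimodal model achieved an F1-score of 92.30, outperforming unimodal approaches.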
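
The feature combination described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the vectors are combined or classified, so the per-modality L2 normalization, plain concatenation, and the linear softmax classifier below are assumptions, and the toy vectors and weights are hypothetical stand-ins for real HuBERT, MFCC, and RoBERTa outputs.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse_features(hubert_vec, mfcc_vec, roberta_vec):
    """Feature-level fusion: normalize each modality, then concatenate."""
    fused = []
    for vec in (hubert_vec, mfcc_vec, roberta_vec):
        fused.extend(l2_normalize(vec))
    return fused

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(fused, weights, biases):
    """One linear layer followed by softmax over the emotion classes."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)

# Hypothetical toy inputs; real embeddings have hundreds of dimensions.
hubert_vec = [0.2, -0.1, 0.4]
mfcc_vec = [12.0, -3.0]
roberta_vec = [0.05, 0.3, -0.2, 0.1]

fused = fuse_features(hubert_vec, mfcc_vec, roberta_vec)

# Placeholder (untrained) weights for a two-class problem.
weights = [[0.1] * len(fused), [-0.1] * len(fused)]
biases = [0.0, 0.0]
probs = classify(fused, weights, biases)
```

The fused vector has one entry per input dimension across all three modalities, so a downstream classifier sees speech and text evidence jointly.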

Literature Review of AI Hallucination Research Since the Advent of ChatGPT: Focusing on Papers from arXiv (챗GPT 등장 이후 인공지능 환각 연구의 문헌 검토: 아카이브(arXiv)의 논문을 중심으로)

  • Park, Dae-Min; Lee, Han-Jong
    • Informatization Policy / v.31 no.2 / pp.3-38 / 2024
  • Hallucination is a significant barrier to the use of large-scale language models or multimodal models. In this study, we collected 654 computer science papers with "hallucination" in the abstract that were posted to arXiv between December 2022 and January 2024, following the advent of ChatGPT, and conducted frequency analysis, knowledge network analysis, and a literature review to explore the latest trends in hallucination research. The results showed that research was active in the fields of "Computation and Language," "Artificial Intelligence," "Computer Vision and Pattern Recognition," and "Machine Learning." We then analyzed research trends in these four major fields, focusing on the main authors and dividing the work into data, hallucination detection, and hallucination mitigation. The main research trends included hallucination mitigation through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), inference enhancement via "chain of thought" (CoT), and growing interest in hallucination mitigation within the domain of multimodal AI. This study provides insights into the latest developments in hallucination research through a technology-oriented literature review, and is expected to help subsequent research in engineering as well as in the humanities and social sciences.

Fusion algorithm for Integrated Face and Gait Identification (얼굴과 발걸음을 결합한 인식)

  • Nizami, Imran Fareed; An, Sung-Je; Hong, Sung-Jun; Lee, Hee-Sung; Kim, Eun-Tai; Park, Mig-Non
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.1 / pp.72-77 / 2008
  • Identification of humans from multiple viewpoints is an important task for surveillance and security purposes. For optimal performance, the system should use the maximum information available from its sensors. Multimodal biometric systems can use more than one physiological or behavioral characteristic for enrollment, verification, or identification. Since gait alone is not yet established as a very distinctive feature, this paper presents an approach that fuses face and gait for identification. We consider the single-camera case, i.e., both face and gait recognition are performed on the same set of images captured by a single camera. The aim is to improve system performance by using the maximum amount of information available in the images. Fusion is performed at the decision level. The proposed algorithm is tested on the NLPR database.
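
Decision-level fusion, as mentioned in this abstract, combines the final outputs of separate matchers rather than their features or raw scores. The abstract does not state the fusion rule, so the sketch below uses an assumed rule (accept agreement, otherwise trust the more confident matcher), and the function name and the `(identity, confidence)` format are hypothetical.

```python
def fuse_decisions(face_decision, gait_decision):
    """Fuse two matcher decisions, each given as (identity, confidence).

    Assumed rule: if both matchers agree, return that identity;
    otherwise return the identity from the more confident matcher
    (ties go to the face matcher).
    """
    face_id, face_conf = face_decision
    gait_id, gait_conf = gait_decision
    if face_id == gait_id:
        return face_id
    return face_id if face_conf >= gait_conf else gait_id

# Hypothetical decisions from a face matcher and a gait matcher.
agreed = fuse_decisions(("subject_07", 0.62), ("subject_07", 0.41))
disputed = fuse_decisions(("subject_07", 0.40), ("subject_12", 0.73))
```

Other common decision-level rules, such as majority voting over more than two matchers, fit the same interface.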

A Robust Watermarking Algorithm using Wavelet for Biometric Information (웨이블렛을 이용한 생체정보의 강인한 워터마킹 알고리즘)

  • Lee, Wook-Jae; Lee, Dae-Jong; Moon, Ki-Young; Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.5 / pp.632-639 / 2007
  • This paper presents a wavelet-based watermarking algorithm that securely hides biometric features, such as face and fingerprint features, and extracts them with little distortion of the concealed data. To hide the biometric features, we propose a method for determining the insertion location based on the wavelet transform, together with an adaptive weighting method that depends on the image characteristics. The hidden features are extracted by applying the inverse wavelet transform to the watermarked image. To show the effectiveness of the method, we analyze performance measures such as PSNR and the correlation of the watermark features before and after watermarking. We also evaluate the effect of the watermarking algorithm on the biometric system in terms of recognition rate. The recognition rate is 98.67% for a multimodal biometric system consisting of face and fingerprint. These results confirm that the proposed method can hide and extract biometric features without lowering the recognition rate.
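
The general idea of hiding data in wavelet coefficients can be sketched in one dimension. This is not the paper's algorithm: the one-level Haar transform, the fixed weight `alpha` (the paper uses an adaptive weight), the choice of detail coefficients as the insertion location, and the non-blind extraction that needs the original host signal are all assumptions made for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """One-level Haar DWT of an even-length sequence -> (approx, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward: rebuild the signal from both bands."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def embed(host, watermark, alpha=0.5):
    """Add alpha-scaled watermark values to the leading detail coefficients."""
    approx, detail = haar_forward(host)
    if len(watermark) > len(detail):
        raise ValueError("watermark is longer than the detail band")
    marked = list(detail)
    for i, w in enumerate(watermark):
        marked[i] += alpha * w
    return haar_inverse(approx, marked)

def extract(watermarked, host, length, alpha=0.5):
    """Non-blind extraction: compare detail bands of marked and original."""
    _, d_marked = haar_forward(watermarked)
    _, d_host = haar_forward(host)
    return [(d_marked[i] - d_host[i]) / alpha for i in range(length)]

# Hypothetical host row and watermark values.
host = [10.0, 12.0, 9.0, 7.0, 5.0, 6.0, 8.0, 8.0]
watermark = [1.0, -1.0, 1.0]
watermarked = embed(host, watermark)
recovered = extract(watermarked, host, len(watermark))
```

A smaller `alpha` distorts the host less but makes the watermark more fragile, which is the trade-off an adaptive weight is meant to manage.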

Monitoring Mood Trends of Twitter Users using Multi-modal Analysis method of Texts and Images (텍스트 및 영상의 멀티모달분석을 이용한 트위터 사용자의 감성 흐름 모니터링 기술)

  • Kim, Eun Yi; Ko, Eunjeong
    • Journal of the Korea Convergence Society / v.9 no.1 / pp.419-431 / 2018
  • In this paper, we propose a novel method for monitoring the mood trends of Twitter users by analyzing their daily tweets over a long period. To understand the tweets more accurately, we analyze all types of content in them, i.e., texts, emoticons, and images, and thus develop a multimodal sentiment analysis method. In the proposed method, two single-modal analyses are first performed to extract the users' moods hidden in texts and images: a lexicon-based and learning-based text classifier, and a learning-based image classifier. The moods extracted by the respective analyses are then combined into a tweet mood and aggregated into a daily mood. As a result, the proposed method generates a daily mood flow graph for each user, which allows the user's mood trend to be monitored more intuitively. For evaluation, we perform two sets of experiments. First, we collect a data set of 40,447 items and compare our method with state-of-the-art techniques. These experiments demonstrate that the proposed multimodal analysis method outperforms the other baselines as well as our own methods using only text or only images. Second, to evaluate the potential of the proposed method for monitoring users' mood trends, we test it with 40 depressed users and 40 normal users. The results indicate that the proposed method can be used effectively to find depressed users.
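
The two aggregation steps in this abstract (per-tweet combination, then per-day aggregation) can be sketched as follows. The abstract does not give the combination formulas, so the weighted average, the fallback to text-only tweets, the daily mean, and the score range of -1 to 1 are all assumptions, and the dates and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def tweet_mood(text_score, image_score=None, text_weight=0.5):
    """Combine text and image mood scores; use text alone if no image."""
    if image_score is None:
        return text_score
    return text_weight * text_score + (1 - text_weight) * image_score

def daily_moods(tweets):
    """Aggregate tweet moods into one mean mood per day.

    tweets: iterable of (date_string, text_score, image_score_or_None).
    Returns a dict ordered by date, suitable for plotting a mood flow.
    """
    by_day = defaultdict(list)
    for day, text_score, image_score in tweets:
        by_day[day].append(tweet_mood(text_score, image_score))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}

# Hypothetical classifier outputs in [-1, 1].
tweets = [
    ("2018-01-01", 0.5, 1.0),
    ("2018-01-01", -0.5, None),
    ("2018-01-02", 1.0, 0.0),
]
flow = daily_moods(tweets)
```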

Multi-Emotion Regression Model for Recognizing Inherent Emotions in Speech Data (음성 데이터의 내재된 감정인식을 위한 다중 감정 회귀 모델)

  • Moung Ho Yi; Myung Jin Lim; Ju Hyun Shin
    • Smart Media Journal / v.12 no.9 / pp.81-88 / 2023
  • Online communication has recently increased with the spread of non-face-to-face services caused by COVID-19. In non-face-to-face situations, the other person's opinions and emotions are recognized through modalities such as text, speech, and images, and research on multimodal emotion recognition that combines these modalities is actively underway. Within this area, emotion recognition using speech data is attracting attention as a means of understanding emotions through both sound and language information, but in most cases emotions are recognized from a single speech feature value. Because a variety of emotions coexist in a complex manner in a conversation, a method for recognizing multiple emotions is needed. In this paper, we therefore propose a multi-emotion regression model that extracts feature vectors from preprocessed speech data to recognize complex, inherent emotions and takes the passage of time into account.

Using Keystroke Dynamics for Implicit Authentication on Smartphone

  • Do, Son; Hoang, Thang; Luong, Chuyen; Choi, Seungchan; Lee, Dokyeong; Bang, Kihyun; Choi, Deokjai
    • Journal of Korea Multimedia Society / v.17 no.8 / pp.968-976 / 2014
  • Authentication methods on smartphones need to be implicit, requiring minimal user interaction. Existing authentication methods (e.g., PINs, passwords, and visual patterns) do not adequately address memorability and privacy issues. Behavioral biometrics such as keystroke dynamics and gait can be acquired easily and implicitly using the sensors integrated into a smartphone. We propose a biometric model involving keystroke dynamics for implicit authentication on smartphones. We first design a feature extraction method for keystroke dynamics. We then build a fusion model of keystroke dynamics and gait to improve on the authentication performance of a single behavioral biometric. We perform the fusion at both the feature extraction level and the matching score level. Experiments using a linear Support Vector Machine (SVM) classifier show that the best results are achieved with score fusion: a recognition rate of approximately 97.86% in identification mode and an error rate of approximately 1.11% in authentication mode.
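
Matching-score-level fusion, which this abstract reports as the best-performing option, can be sketched as below. The abstract does not describe the normalization or the combination rule, so min-max normalization and a weighted sum are assumptions here, and the scores, weights, and user names are hypothetical.

```python
def min_max_normalize(scores):
    """Map scores to [0, 1] so different matchers become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(keystroke_scores, gait_scores, keystroke_weight=0.5):
    """Weighted sum of the normalized per-user scores of both matchers."""
    ks = min_max_normalize(keystroke_scores)
    gs = min_max_normalize(gait_scores)
    return [
        keystroke_weight * k + (1 - keystroke_weight) * g
        for k, g in zip(ks, gs)
    ]

def identify(fused_scores, user_ids):
    """Identification mode: return the enrolled user with the top score."""
    best = max(range(len(fused_scores)), key=fused_scores.__getitem__)
    return user_ids[best]

# Hypothetical per-user scores, e.g. SVM decision values on two scales.
user_ids = ["alice", "bob", "carol"]
keystroke_scores = [0.2, 1.5, -0.3]
gait_scores = [10.0, 30.0, 50.0]

fused = fuse_scores(keystroke_scores, gait_scores)
winner = identify(fused, user_ids)
```

In authentication mode the same fused score would instead be compared against a threshold for the claimed identity.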

Personal Biometric Identification based on ECG Features (ECG 특징추출 기반 개인 바이오 인식)

  • Yoon, Seok-Joo; Kim, Gwang-Jun
    • The Journal of the Korea institute of electronic communication sciences / v.10 no.4 / pp.521-526 / 2015
  • Research on using human biological characteristics to confirm an individual's identity is being actively conducted. An electrocardiogram (ECG) based biometric system is difficult to counterfeit and does not cause skin irritation in the subject. It can easily be combined with conventional biometrics such as fingerprint and face recognition to form a multimodal biometric system. In this paper, we propose a biometric identification method that analyzes ECG waveform characteristics from Discrete Wavelet Transform (DWT) coefficients. Feature selection is performed on nine DWT coefficients using correlation analysis. Verification is carried out with an error back-propagation neural network. Applying the proposed approach to 24 subjects from the MIT-BIH QT Database yields a verification rate of 98.88%.

A Quality Assessment Method of Biometrics for Estimating Authentication Result in User Authentication System (사용자 인증시스템의 인증결과 예측을 위한 바이오정보의 품질평가기법)

  • Kim, Ae-Young; Lee, Sang-Ho
    • Journal of KIISE:Computing Practices and Letters / v.16 no.2 / pp.242-246 / 2010
  • In this paper, we propose a quality assessment method for biometrics that estimates the authentication result in a user authentication system. The proposed method computes a quality score called CIMR (Confidence Interval Matching Ratio) using small-sample analysis such as the t-test. We use the CIMR-based quality assessment method to test how well it distinguishes between the various biometrics in a multimodal biometric system. We also test how well authentication results can be predicted for the acquired biometrics using the mean $\bar{X}$ and the variance $s^2$ in the t-test-based CIMR. As a result, we achieved a maximum accuracy of 88% in estimating user authentication results.
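
The abstract names the ingredients of CIMR (a small-sample t-based confidence interval built from the mean and variance) but not its formula. The sketch below is therefore only one plausible reading and not the paper's definition: it builds a 95% confidence interval for the mean of enrolled sample scores and reports the fraction of new scores that fall inside it. The function names, the sample values, and this matching rule are assumptions; the critical values are standard two-sided 95% Student's t values.

```python
import math
from statistics import mean, variance

# Two-sided 95% Student's t critical values, indexed by degrees of freedom.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262,
}

def confidence_interval(samples):
    """95% t-based confidence interval for the mean of a small sample."""
    n = len(samples)
    x_bar = mean(samples)          # sample mean
    s2 = variance(samples)         # unbiased sample variance
    half_width = T_CRIT_95[n - 1] * math.sqrt(s2 / n)
    return x_bar - half_width, x_bar + half_width

def matching_ratio(enrolled, probes):
    """Assumed score: share of probe values inside the enrolled interval."""
    low, high = confidence_interval(enrolled)
    inside = sum(1 for p in probes if low <= p <= high)
    return inside / len(probes)

# Hypothetical quality measurements of enrolled and newly acquired samples.
enrolled = [0.80, 0.82, 0.79, 0.81, 0.83]
probes = [0.80, 0.81, 0.90, 0.82]
ratio = matching_ratio(enrolled, probes)
```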

A Study on the Recognition of Korean Language Teachers on Media Literacy Education (한국어 교사의 매체 문식성 교육에 대한 인식 연구)

  • Jeon, Hyeoung-gil
    • Journal of Korean language education / v.28 no.2 / pp.155-184 / 2017
  • As media change, communication patterns in modern society have changed as well. This change in the media environment has also transformed the literacy that is required, and it is time to accept this new literacy in Korean language education. This paper examines how 73 teachers currently working in Korean language education perceive media literacy classes. The results show that most teachers are aware that media literacy has changed socially, and strongly agree that this changed literacy should be applied in Korean language education. However, today's media literacy education is passive. Although teachers generally understand the dynamic features of newly emerging digital media, such media remain merely a tool in class. The teachers pointed out that device-related problems, such as the device environment and the availability of media, are among the many reasons for this passive usage. The more fundamental problem, however, is that the new communication environment has not been actively reflected in the curriculum. Teachers considered media literacy to be closely related to Korean proficiency, and expected it to become increasingly important for Korean learners in the future. Based on these results, this paper argues that Korean language education needs to accept the changes in media and reflect them in the curriculum.