• Title/Summary/Keyword: Facial Model


Feature Based Techniques for a Driver's Distraction Detection using Supervised Learning Algorithms based on Fixed Monocular Video Camera

  • Ali, Syed Farooq; Hassan, Malik Tahir
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.8, pp.3820-3841, 2018
  • Most accidents occur due to drowsiness while driving, ignoring road signs, and driver distraction. Driver distraction depends on various factors, including talking with passengers while driving, mood disorder, nervousness, anger, over-excitement, anxiety, loud music, illness, fatigue, and head rotations due to changes in yaw, pitch, and roll angle. The contribution of this paper is two-fold. First, a data set is generated for conducting different experiments on driver distraction. Second, novel approaches are presented that use features based on facial points, especially features computed using motion vectors and interpolation, to detect a specific type of distraction: head rotation due to a change in yaw angle. The facial points are detected by the Active Shape Model (ASM) and Boosted Regression with Markov Networks (BoRMaN). Various classifiers are trained and tested on different frames to decide whether the driver is distracted. The approaches are also scale invariant. The results show that the approach using the novel ideas of motion vectors and interpolation outperforms the other approaches in detecting head rotation, achieving an accuracy of 98.45% with a neural network.
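
The core feature idea described above, displacement (motion) vectors of facial points between frames, normalized so the result stays scale invariant, can be sketched roughly as follows. This is a minimal illustration under assumed inputs: the landmark arrays stand in for points detected by ASM or BoRMaN, a scikit-learn MLP stands in for the paper's neural network, and the synthetic data, point indices, and layer sizes are not from the paper.

```python
# Minimal sketch: motion-vector features from facial landmarks, classified with
# an MLP. Landmark detection (ASM/BoRMaN in the paper) is assumed to have
# already produced (N, 2) point arrays per frame; everything below is synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier

def motion_vector_features(pts_t, pts_tk):
    """Displacement of each facial point between frame t and frame t+k,
    normalized by inter-ocular distance so the features stay scale invariant."""
    disp = pts_tk - pts_t                       # (N, 2) motion vectors
    iod = np.linalg.norm(pts_t[0] - pts_t[1])   # assume points 0 and 1 are eye centers
    return (disp / (iod + 1e-8)).ravel()        # flatten into one feature vector

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):
    pts_t = rng.uniform(0, 640, size=(68, 2))   # 68 landmarks in frame t (toy values)
    rotated = int(rng.integers(0, 2))           # 1 = head rotated (yaw change)
    shift = np.array([40.0, 0.0]) if rotated else np.array([2.0, 0.0])
    pts_tk = pts_t + shift + rng.normal(scale=1.0, size=(68, 2))
    X.append(motion_vector_features(pts_t, pts_tk))
    y.append(rotated)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(np.array(X), np.array(y))
print("toy training accuracy:", clf.score(np.array(X), np.array(y)))
```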

Unconstrained e-Book Control Program by Detecting Facial Characteristic Point and Tracking in Real-time (얼굴의 특이점 검출 및 실시간 추적을 이용한 e-Book 제어)

  • Kim, Hyun-Woo; Park, Joo-Yong; Lee, Jeong-Jick; Yoon, Young-Ro
    • Journal of Biomedical Engineering Research, v.35 no.2, pp.14-18, 2014
  • This study concerns an e-Book program based on a human-computer interaction (HCI) system for physically handicapped persons. A vision-based interface can replace conventional computer input devices by extracting a characteristic facial point and tracking it. We initially chose the point between the eyes as the characteristic point by analyzing facial images from a webcam. However, because of the three-dimensional structure of glasses, tracking the between-eyes point was unreliable for users wearing glasses, so we switched the characteristic point to the bridge of the nose after detecting the between-eyes region. With this technique, we could track head rotation in real time regardless of glasses. To evaluate the program's usefulness, we conducted an experiment on an actual application with 20 subjects and obtained a 96.5% success rate for controlling the e-Book under proper conditions.
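
As a rough illustration of the vision-based interface idea, the sketch below detects a face with a webcam, treats the upper-middle of the face box as a crude nose-bridge proxy, and maps its horizontal displacement to page-turn commands. The Haar cascade, the pixel threshold, and the face-box heuristic are assumptions for illustration, not the study's detector or tuning.

```python
# Rough sketch: webcam face detection with a Haar cascade, tracking a crude
# nose-bridge proxy and turning its horizontal motion into page-turn commands.
# Cascade choice, the 40-pixel threshold, and the proxy point are illustrative.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)   # default webcam
ref_x = None                # calibrated x-position of the tracked point

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces[:1]:              # use the first detected face only
        px = x + w // 2                         # upper-middle of the face box
        if ref_x is None:
            ref_x = px                          # calibrate on the first detection
        elif px - ref_x > 40:
            print("turn page forward")          # head moved right past the threshold
            ref_x = px
        elif ref_x - px > 40:
            print("turn page backward")         # head moved left past the threshold
            ref_x = px
    cv2.imshow("e-Book control (q to quit)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```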

Micro-Expression Recognition Based on Optical Flow Features and Improved MobileNetV2

  • Xu, Wei; Zheng, Hao; Yang, Zhongxue; Yang, Yingjie
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.6, pp.1981-1995, 2021
  • When a person tries to conceal emotions, real emotions manifest themselves in the form of micro-expressions. Facial micro-expression recognition remains extremely challenging in pattern recognition because it is difficult to design feature extraction methods that cope with such small, short-duration changes. Most methods rely on hand-crafted features to capture subtle facial movements. In this study, we introduce a method that combines optical flow and deep learning. First, we extract the onset frame and the apex frame from each video sequence. Then, the motion features between these two frames are extracted using optical flow. Finally, the features are fed into an improved MobileNetV2 model, and an SVM is applied to classify expressions. To evaluate the effectiveness of the method, we conduct experiments on the public spontaneous micro-expression database CASME II. Under leave-one-subject-out cross-validation, the recognition accuracy reaches 53.01% and the F-score reaches 0.5231. The results show that the proposed method can significantly improve micro-expression recognition performance.
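
The onset/apex optical-flow pipeline can be sketched as below: dense flow between the two frames is packed into a three-channel image, passed through a MobileNetV2 trunk, and the pooled features are classified with an SVM. The stock torchvision MobileNetV2 stands in for the paper's improved variant, Farneback flow is an assumed choice of flow method, and the frames and labels here are random placeholders.

```python
# Sketch: Farneback optical flow between onset and apex frames, encoded as a
# 3-channel image, pooled through a MobileNetV2 trunk, and classified by an SVM.
import cv2
import numpy as np
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2
from sklearn.svm import SVC

def flow_features(onset_gray, apex_gray, model):
    """Dense flow (dx, dy, magnitude) -> MobileNetV2 trunk -> pooled feature vector."""
    flow = cv2.calcOpticalFlowFarneback(onset_gray, apex_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    img = np.stack([flow[..., 0], flow[..., 1], mag], axis=0).astype(np.float32)
    x = torch.from_numpy(img).unsqueeze(0)      # (1, 3, H, W)
    with torch.no_grad():
        fmap = model.features(x)                # convolutional trunk only
        feat = F.adaptive_avg_pool2d(fmap, 1).flatten(1)
    return feat.squeeze(0).numpy()

model = mobilenet_v2(weights=None).eval()       # pretrained weights omitted here

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(12):                             # toy onset/apex pairs
    onset = rng.integers(0, 255, size=(128, 128), dtype=np.uint8)
    apex = rng.integers(0, 255, size=(128, 128), dtype=np.uint8)
    X.append(flow_features(onset, apex, model))
    y.append(int(rng.integers(0, 3)))           # e.g. 3 micro-expression classes

svm = SVC(kernel="linear").fit(np.array(X), np.array(y))
print("toy training accuracy:", svm.score(np.array(X), np.array(y)))
```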

Sign2Gloss2Text-based Sign Language Translation with Enhanced Spatial-temporal Information Centered on Sign Language Movement Keypoints (수어 동작 키포인트 중심의 시공간적 정보를 강화한 Sign2Gloss2Text 기반의 수어 번역)

  • Kim, Minchae; Kim, Jungeun; Kim, Ha Young
    • Journal of Korea Multimedia Society, v.25 no.10, pp.1535-1545, 2022
  • Sign language can have a completely different meaning depending on the direction of the hand or a change of facial expression, even with the same gesture. It is therefore crucial to capture the spatial-temporal structure of each movement. However, sign language translation studies based on Sign2Gloss2Text only convey comprehensive spatial-temporal information about the entire sign language movement, so detailed information (facial expressions, gestures, etc.) of each movement that is important for translation is not emphasized. Accordingly, in this paper, we propose Spatial-temporal Keypoints Centered Sign2Gloss2Text Translation, named STKC-Sign2Gloss2Text, to supplement the sequential and semantic information of keypoints, which are the core of recognizing and translating sign language. STKC-Sign2Gloss2Text consists of two steps: Spatial Keypoints Embedding, which extracts 121 major keypoints from each image, and Temporal Keypoints Embedding, which emphasizes sequential information using a Bi-GRU over the extracted keypoints. The proposed model outperformed the Sign2Gloss2Text baseline on all Bilingual Evaluation Understudy (BLEU) scores on the Development (DEV) and Testing (TEST) sets; in particular, it achieved a TEST BLEU-4 of 23.19, an improvement of 1.87, demonstrating the effectiveness of the proposed methodology.
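
A minimal sketch of the Temporal Keypoints Embedding step is given below: a bidirectional GRU summarizes a sequence of per-frame keypoint vectors (121 keypoints with x, y coordinates, as stated in the abstract). The hidden size and the mean pooling over time are illustrative choices, not the authors' architecture.

```python
# Illustrative Temporal Keypoints Embedding: a bidirectional GRU summarizes a
# sequence of per-frame keypoint vectors (121 keypoints, x/y coordinates each).
import torch
import torch.nn as nn

class TemporalKeypointsEmbedding(nn.Module):
    def __init__(self, num_keypoints=121, hidden=256):
        super().__init__()
        self.gru = nn.GRU(input_size=num_keypoints * 2, hidden_size=hidden,
                          num_layers=1, batch_first=True, bidirectional=True)

    def forward(self, keypoints):               # (batch, frames, 121, 2)
        b, t, k, c = keypoints.shape
        x = keypoints.reshape(b, t, k * c)      # flatten keypoints per frame
        out, _ = self.gru(x)                    # (batch, frames, 2 * hidden)
        return out.mean(dim=1)                  # one embedding per clip

# Toy usage: 4 clips, 60 frames each, 121 (x, y) keypoints per frame.
clips = torch.randn(4, 60, 121, 2)
print(TemporalKeypointsEmbedding()(clips).shape)   # torch.Size([4, 512])
```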

Robust Head Pose Estimation for Masked Face Image via Data Augmentation (데이터 증강을 통한 마스크 착용 얼굴 이미지에 강인한 얼굴 자세추정)

  • Kyeongtak, Han; Sungeun, Hong
    • Journal of Broadcast Engineering, v.27 no.6, pp.944-947, 2022
  • Due to the coronavirus pandemic, mask-wearing has increased worldwide; thus, image analysis of masked face images has become essential. Although head pose estimation can be applied to various face-related applications, including driver attention, face frontalization, and gaze detection, few studies have addressed the performance degradation caused by masked faces. This study proposes a new data augmentation that synthesizes a masked face depending on the face image size and pose, which yields robust performance on the BIWI benchmark dataset regardless of mask-wearing. Since the proposed scheme is not limited to a specific model, it can be utilized in various head pose estimation models.
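
A crude sketch of the mask-synthesis idea follows: a mask-shaped polygon is drawn over the lower part of a known face box so that a pose estimator also sees masked training variants. The polygon geometry, color, and proportions are placeholders, not the paper's synthesis procedure, which additionally adapts to image size and pose.

```python
# Crude sketch of mask-synthesis augmentation: paint a mask-shaped polygon over
# the nose-to-chin region of a known face box so the pose estimator also trains
# on masked variants. Geometry and color are placeholders.
import numpy as np
import cv2

def synthesize_mask(image, face_box, color=(200, 200, 255)):
    """face_box = (x, y, w, h) in pixel coordinates."""
    x, y, w, h = face_box
    top = y + int(0.55 * h)                     # start just below the nose
    bottom = y + h
    pts = np.array([[x, top],
                    [x + w, top],
                    [x + int(0.85 * w), bottom],
                    [x + int(0.15 * w), bottom]], dtype=np.int32)
    out = image.copy()
    cv2.fillConvexPoly(out, pts, color)         # opaque mask region
    return out

# Toy usage: a flat gray image with a hypothetical face box.
img = np.full((256, 256, 3), 128, dtype=np.uint8)
print(synthesize_mask(img, face_box=(80, 60, 100, 130)).shape)
```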

Multimodal Attention-Based Fusion Model for Context-Aware Emotion Recognition

  • Vo, Minh-Cong; Lee, Guee-Sang
    • International Journal of Contents, v.18 no.3, pp.11-20, 2022
  • Human emotion recognition is an exciting topic that has attracted many researchers for a long time. In recent years, there has been increasing interest in exploiting contextual information for emotion recognition. Previous explorations in psychology show that emotional perception is influenced by facial expressions as well as contextual information from the scene, such as human activities, interactions, and body poses. These explorations initiated a trend in computer vision of exploring the critical role of context by treating it as a modality to infer the predicted emotion along with facial expressions. However, contextual information has not been fully exploited. The scene emotion created by the surrounding environment can shape how people perceive emotion. In addition, additive fusion in multimodal training is not ideal, because the modalities do not contribute equally to the final prediction. The purpose of this paper is to contribute to this growing area of research by exploring the effectiveness of the emotional scene gist in the input image for inferring the emotional state of the primary target. The emotional scene gist includes emotion, emotional feelings, and actions or events that directly trigger emotional reactions in the input image. We also present an attention-based fusion network to combine multimodal features based on their impact on the target emotional state. We demonstrate the effectiveness of the method through a significant improvement on the EMOTIC dataset.
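
The attention-based fusion idea, weighting each modality's contribution instead of simply adding embeddings, can be sketched as below. The number of modalities, feature dimension, and the single-layer scoring function are illustrative assumptions, not the paper's network.

```python
# Illustrative attention-based fusion: each modality embedding (face, body,
# scene gist, ...) gets a learned relevance score, and the fused feature is the
# softmax-weighted sum rather than a plain additive fusion.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)          # one scalar score per modality

    def forward(self, modality_feats):          # (batch, num_modalities, dim)
        weights = torch.softmax(self.score(modality_feats), dim=1)   # (b, m, 1)
        return (weights * modality_feats).sum(dim=1)                 # (b, dim)

# Toy usage: 8 samples, 3 modalities, 256-dim embeddings each.
feats = torch.randn(8, 3, 256)
print(AttentionFusion()(feats).shape)           # torch.Size([8, 256])
```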

A study on the effectiveness of intermediate features in deep learning on facial expression recognition

  • KyeongTeak Oh; Sun K. Yoo
    • International journal of advanced smart convergence, v.12 no.2, pp.25-33, 2023
  • The purpose of this study is to evaluate the impact of intermediate features on facial expression recognition (FER) performance. To achieve this objective, intermediate features were extracted from the input images at specific layers (FM1~FM4) of a pre-trained network (ResNet-18). These extracted intermediate features and the original images were used as inputs to a vision transformer (ViT), and the FER performance was compared. As a result, when using a single input, the intermediate features extracted from FM2 yielded the best performance (training accuracy: 94.35%, testing accuracy: 75.51%), whereas the original image alone gave a training accuracy of 91.32% and a testing accuracy of 74.68%. However, when combining the original image with intermediate features as input, the best FER performance was achieved by combining the original image with FM2, FM3, and FM4 (training accuracy: 97.88%, testing accuracy: 79.21%). These results imply that incorporating intermediate features alongside the original image can lead to superior performance. The findings can be referenced when designing the preprocessing stages of a deep learning model for FER. By considering the effectiveness of intermediate features, practitioners can make informed decisions to enhance the performance of FER systems.
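
The intermediate-feature extraction step can be sketched with torchvision's feature-extraction utility, as below. Mapping FM1 through FM4 to ResNet-18's layer1 through layer4 is an assumption, pretrained weights are omitted, and the downstream ViT stage is not shown.

```python
# Sketch: pulling intermediate feature maps (FM1-FM4) from a ResNet-18 backbone.
import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import create_feature_extractor

backbone = resnet18(weights=None).eval()        # pretrained weights omitted here
extractor = create_feature_extractor(
    backbone, return_nodes={"layer1": "FM1", "layer2": "FM2",
                            "layer3": "FM3", "layer4": "FM4"})

with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 224, 224))   # one dummy face image

for name, fmap in feats.items():
    print(name, tuple(fmap.shape))              # FM1 (1, 64, 56, 56) ... FM4 (1, 512, 7, 7)
```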

ORTHOGNATHIC SURGERY USING MODEL REPOSITIONING INSTRUMENT: A CASE REPORT (Model Repositioning Instrument를 이용한 악교정 수술의 치험례)

  • Lee, Nam-Ki; Choi, Dong-Soon; Cha, Bong-Kuen; Park, Young-Wook; Kim, Ji-Hyuck
    • Maxillofacial Plastic and Reconstructive Surgery, v.28 no.3, pp.254-261, 2006
  • Moderate to severe dentofacial deformities usually require combined orthodontic treatment and orthognathic surgery to obtain the most stable result with optimal function and facial esthetics. Accordingly, the orthodontist and oral and maxillofacial surgeon must be able to diagnose the existing deformities exactly, establish an appropriate treatment plan, and execute the recommended treatment. In particular, model surgery is essential to obtain an optimal result in maxillary surgery. However, the preoperatively planned position of the maxillary dental arch often cannot be fully achieved during actual surgery, and deviations in the sagittal and vertical dimensions are common. Several methods have been introduced to reposition the maxilla exactly in three dimensions. Recently, the Model Repositioning Instrument (MRI, SAM, Inc., München, Germany), one of these methods, has been introduced and applied clinically, and it is reported to be an accurate, effective, and prompt method for three-dimensional repositioning of the maxilla. This article describes an introduction to and a clinical application of the MRI.

Improvement of a Context-aware Recommender System through User's Emotional State Prediction (사용자 감정 예측을 통한 상황인지 추천시스템의 개선)

  • Ahn, Hyunchul
    • Journal of Information Technology Applications and Management, v.21 no.4, pp.203-223, 2014
  • This study proposes a novel context-aware recommender system designed to recommend items according to the customer's responses to the previously recommended item. Specifically, the proposed system predicts the user's emotional state from his or her responses (such as facial expressions and movements) to the previously recommended item and then recommends items similar to the previous one when the emotional state is estimated as positive. If the customer's emotional state regarding the previously recommended item is regarded as negative, the system recommends items with characteristics opposite to the previous item. The proposed system consists of two sub-modules: (1) an emotion prediction module and (2) a responsive recommendation module. The emotion prediction module contains a model that predicts a customer's arousal level, a physiological and psychological state of being awake or reactive to stimuli, using the customer's reaction data, including facial expressions and body movements, which can be measured with Microsoft's Kinect sensor. The responsive recommendation module generates a recommendation list using the results from the emotion prediction module. If a customer shows a high level of arousal on the previously recommended item, the module recommends the items most similar to the previous item; otherwise, it recommends the items most dissimilar to it. To validate the performance and usefulness of the proposed recommender system, we conducted an empirical validation. In total, 30 undergraduate students participated in the experiment. We used 100 trailers of Korean movies released from 2009 to 2012 as the items for recommendation. For the experiment, we manually constructed a Korean movie trailer DB containing fields such as release date, genre, director, writer, and actors. To check whether recommendation using customers' responses outperforms recommendation using their demographic information, we compared the two. The performance of the recommendation was measured using two metrics: satisfaction and arousal levels. Experimental results showed that the recommendation using customers' responses (i.e., our proposed system) outperformed the recommendation using demographic information with statistical significance.
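
The responsive recommendation rule can be sketched as below: with a high predicted arousal on the previously recommended item, return the most similar items, otherwise the most dissimilar ones, using cosine similarity over item feature vectors. The feature vectors, arousal threshold, and list length are placeholders, not the study's settings.

```python
# Sketch of the responsive recommendation rule: high arousal on the previous
# item -> recommend the most similar items; low arousal -> the most dissimilar.
import numpy as np

def recommend(prev_item_vec, item_matrix, arousal, threshold=0.5, k=5):
    """Rank candidates by cosine similarity to the previously recommended item."""
    a = prev_item_vec / np.linalg.norm(prev_item_vec)
    b = item_matrix / np.linalg.norm(item_matrix, axis=1, keepdims=True)
    sims = b @ a
    order = np.argsort(sims)                    # ascending similarity
    if arousal >= threshold:
        return order[::-1][:k]                  # most similar first (a real system would drop the item itself)
    return order[:k]                            # most dissimilar first

# Toy usage: 100 movie trailers described by 16-dim feature vectors.
rng = np.random.default_rng(0)
items = rng.normal(size=(100, 16))
print(recommend(items[0], items, arousal=0.8))  # stay close to the previous trailer
print(recommend(items[0], items, arousal=0.2))  # switch to very different trailers
```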

Development of Computer Assisted 3-D Simulation and Prediction Surgery in Craniofacial Distraction Osteogenesis (악안면 골신장술의 치료계획을 위한 3차원 시뮬레이션 프로토콜의 개발)

  • Paeng Jun-Young; Lee Jee-Ho; Lee Jong-Ho; Baek Seung-Hak; Kim Myung-Jin
    • Korean Journal of Cleft Lip And Palate, v.6 no.2, pp.91-105, 2003
  • There are significant limitations in the precision of mandibular distraction when setting a desired occlusal and facial esthetic outcome. The purpose of this study is to present a simulation method for distraction osteogenesis treatment planning. The 3-D surgery simulation software programs V-works and V-Surgery (Cybermed, Seoul, Korea) were used with 3D CT data in addition to conventional data such as facial photographs, panoramic and cephalometric radiographs, and dental cast models. We have already utilized these tools to obtain preoperative information for various maxillofacial procedures, such as cancer localization and reconstructive surgery, orthognathic surgery, and implant surgery, in the Department of Oral and Maxillofacial Surgery, Seoul National University Hospital. In the software, bone cutting can be performed at any location and in any direction, and a separated bone segment can be moved in all three dimensions. After the 3-D simulation in the software program, mock surgery on the RP model can be performed. This planning method was applied to two hemifacial microsomia patients. With this protocol, we could simulate the movement of the bony segments after maxillofacial distraction osteogenesis.
