• Title/Summary/Keyword: spatiotemporal features

41 search results

Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems
    • /
    • v.17 no.2
    • /
    • pp.337-351
    • /
    • 2021
  • Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To overcome this problem, a video facial expression recognition method using a spatiotemporal recurrent neural network and feature fusion is proposed. Firstly, the video is preprocessed. Then, a double-layer cascade structure is used to detect a face in a video image. In addition, two deep convolutional neural networks are used to extract the temporal and spatial facial features in the video. The spatial convolutional neural network extracts spatial information features from each frame of the static expression images in the video, while the temporal convolutional neural network extracts dynamic information features from the optical flow of multiple frames of expression images. Multiplicative fusion is then applied to the spatiotemporal features learned by the two deep convolutional neural networks. Finally, the fused features are input to a support vector machine to realize the facial expression classification task. The experimental results on the eNTERFACE, RML, and AFEW6.0 datasets show that the recognition rates obtained by the proposed method reach 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.
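The multiplicative fusion step described in the abstract can be sketched as follows. This is a minimal illustration, assuming the two stream networks have already produced fixed-length feature vectors of equal length; the function and variable names are ours, not the paper's, and the L2 normalisation before the SVM is a common convention rather than a detail stated in the abstract:

```python
import numpy as np

def multiplicative_fusion(spatial_feat, temporal_feat):
    """Fuse per-video spatial and temporal stream features element-wise.

    Both inputs are assumed to be 1-D feature vectors of equal length,
    e.g. penultimate-layer activations of the two stream networks.
    """
    spatial_feat = np.asarray(spatial_feat, dtype=float)
    temporal_feat = np.asarray(temporal_feat, dtype=float)
    if spatial_feat.shape != temporal_feat.shape:
        raise ValueError("stream features must have the same shape")
    fused = spatial_feat * temporal_feat           # element-wise product
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused     # L2-normalise for the SVM

# Example with two 4-D dummy stream features
fused = multiplicative_fusion([1.0, 2.0, 0.5, 0.0], [2.0, 1.0, 2.0, 3.0])
```

The fused vector keeps the dimensionality of the streams, so a standard SVM can be trained on it directly.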

Spatiotemporal Applications for Managing New&Renewable Energy Resources (신재생에너지 자원 관리를 위한 시공간 응용 기술)

  • Lee, Yang-Koo;Ryu, Keun-Ho;Kim, Kwang-Deuk
    • Proceedings of the Korean Solar Energy Society Conference
    • /
    • 2008.11a
    • /
    • pp.327-331
    • /
    • 2008
  • In this paper, we argue that new&renewable energy resources are difficult to manage with conventional GIS technology because of their spatiotemporal features, and we suggest that spatiotemporal databases and sensor networks can be applied to new&renewable energy management systems as enabling technologies. To motivate these issues, we introduce and analyze the concepts of spatiotemporal databases and sensor networks, along with case studies of each application.


No-reference quality assessment of dynamic sports videos based on a spatiotemporal motion model

  • Kim, Hyoung-Gook;Shin, Seung-Su;Kim, Sang-Wook;Lee, Gi Yong
    • ETRI Journal
    • /
    • v.43 no.3
    • /
    • pp.538-548
    • /
    • 2021
  • This paper proposes an approach to improve the performance of no-reference video quality assessment for sports videos with dynamic motion scenes using an efficient spatiotemporal model. In the proposed method, we divide the video sequences into video blocks and apply a 3D shearlet transform that can efficiently extract primary spatiotemporal features to capture dynamic natural motion scene statistics from the incoming video blocks. The concatenation of a deep residual bidirectional gated recurrent neural network and logistic regression is used to learn the spatiotemporal correlation more robustly and predict the perceptual quality score. In addition, conditional video block-wise constraints are incorporated into the objective function to improve quality estimation performance for the entire video. The experimental results show that the proposed method extracts spatiotemporal motion information more effectively and predicts the video quality with higher accuracy than the conventional no-reference video quality assessment methods.
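The final regression stage can be illustrated with a minimal sketch: a logistic unit maps each block's spatiotemporal feature vector to a quality score in [0, 1], and block scores are pooled into one video-level score. This simplifies the paper's deep residual bidirectional GRU plus logistic regression considerably; all names, and the plain average pooling in place of the block-wise constrained objective, are our assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def video_quality_score(block_features, w, b):
    """Map per-block features to block quality scores with a logistic
    unit, then pool them into a single video-level score.

    block_features: (num_blocks, feat_dim); w: (feat_dim,); b: scalar.
    Returns (video_score, per_block_scores).
    """
    block_features = np.asarray(block_features, dtype=float)
    block_scores = sigmoid(block_features @ np.asarray(w, dtype=float) + b)
    return block_scores.mean(), block_scores   # simple average pooling

score, blocks = video_quality_score(np.zeros((3, 4)), np.zeros(4), 0.0)
```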

An Adaptive ROI Detection System for Spatiotemporal Features (시.공간특징에 대해 적응할 수 있는 ROI 탐지 시스템)

  • Park Min-Chul;Cheoi Kyung-Joo
    • The Journal of the Korea Contents Association
    • /
    • v.6 no.1
    • /
    • pp.41-53
    • /
    • 2006
  • In this paper, an adaptive ROI (region of interest) detection system for spatiotemporal features is proposed. It utilizes spatiotemporal features to detect ROIs, under the assumption that motion, which represents temporal visual conspicuity between adjacent frames, takes priority over spatial visual conspicuity, because moving objects or regions usually draw stronger attention than others in motion pictures. For still images, the visual features that constitute topographic feature maps are used as spatial features. Comparative experiments with a human subjective evaluation show that the correct detection rate of visual attention regions is improved by exploiting both spatial and temporal features, compared with exploiting either feature alone.
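The motion-priority assumption can be sketched as a hypothetical fusion rule in which pixels that move between adjacent frames receive full conspicuity, while static pixels keep their normalised spatial saliency. The threshold value and all names below are illustrative, not taken from the paper:

```python
import numpy as np

def roi_conspicuity(spatial_saliency, prev_frame, curr_frame, thresh=10.0):
    """Fuse spatial saliency with frame-difference motion.

    Moving pixels take priority: they are set to full conspicuity (1.0),
    while static pixels keep their spatial saliency, normalised to [0, 1].
    """
    sal = np.asarray(spatial_saliency, dtype=float)
    if sal.max() > 0:
        sal = sal / sal.max()
    motion = np.abs(np.asarray(curr_frame, dtype=float)
                    - np.asarray(prev_frame, dtype=float))
    return np.where(motion > thresh, 1.0, sal)

prev = np.zeros((2, 2))
curr = np.array([[0.0, 20.0], [0.0, 0.0]])   # one moving pixel
sal = np.array([[0.5, 0.2], [0.0, 1.0]])
out = roi_conspicuity(sal, prev, curr)
```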


A New Covert Visual Attention System by Object-based Spatiotemporal Cues and Their Dynamic Fusioned Saliency Map (객체기반의 시공간 단서와 이들의 동적결합 된돌출맵에 의한 상향식 인공시각주의 시스템)

  • Cheoi, Kyungjoo
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.4
    • /
    • pp.460-472
    • /
    • 2015
  • Most previous visual attention systems find attention regions based on a saliency map combined from multiple extracted features; these systems differ in how the features are extracted and combined. This paper presents a new system that improves the feature extraction method for color and motion, and the weight decision method for spatial and temporal features. Our system dynamically extracts the one color with the strongest response among two opponent colors, and detects moving objects rather than moving pixels. To combine the spatial and temporal features, the proposed system sets the weights dynamically according to each feature's relative activity. Comparative results show that the suggested feature extraction and integration methods improve the detection rate of attention regions.
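The dynamic weight decision can be sketched as below, assuming for illustration that "relative activity" is measured as each map's mean absolute response (the paper's exact activity measure may differ, and all names are ours):

```python
import numpy as np

def dynamic_weights(spatial_map, temporal_map):
    """Weight each feature map by its relative activity, here taken as
    the map's mean absolute response."""
    s_act = float(np.mean(np.abs(spatial_map)))
    t_act = float(np.mean(np.abs(temporal_map)))
    total = s_act + t_act
    if total == 0:
        return 0.5, 0.5                      # no activity: equal weights
    return s_act / total, t_act / total

def combined_saliency(spatial_map, temporal_map):
    """Weighted sum of the two maps using the dynamic weights."""
    ws, wt = dynamic_weights(spatial_map, temporal_map)
    return ws * np.asarray(spatial_map, dtype=float) \
         + wt * np.asarray(temporal_map, dtype=float)
```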

Extraction and classification of tempo stimuli from electroencephalography recordings using convolutional recurrent attention model

  • Lee, Gi Yong;Kim, Min-Soo;Kim, Hyoung-Gook
    • ETRI Journal
    • /
    • v.43 no.6
    • /
    • pp.1081-1092
    • /
    • 2021
  • Electroencephalography (EEG) recordings taken during the perception of music tempo contain information from which the tempo of a music piece can be estimated. If this tempo-stimulus information can be extracted from EEG recordings and classified, it can be used effectively to construct a music-based brain-computer interface. This study proposes a novel convolutional recurrent attention model (CRAM) to extract and classify features corresponding to tempo stimuli from EEG recordings of listeners who attended closely to the tempo of music pieces. The proposed CRAM is composed of six modules, namely, network inputs, a two-dimensional convolutional bidirectional gated recurrent unit-based sample encoder, sample-level intuitive attention, a segment encoder, segment-level intuitive attention, and a softmax layer, to effectively model spatiotemporal features and improve the classification accuracy of tempo stimuli. To evaluate the proposed method's performance, we conducted experiments on two benchmark datasets. The proposed method achieves promising results, outperforming recent methods.
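The attention modules can be approximated by a standard softmax-attention pooling step, sketched below. In the real model the relevance scores are learned jointly with the encoders; here they are passed in directly, and all names are our own:

```python
import numpy as np

def attention_pool(features, scores):
    """Softmax-attention pooling: weight each sample/segment feature
    vector by a relevance score and sum into one representation.

    features: (N, D) array; scores: (N,) relevance scores.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ np.asarray(features, dtype=float)

# Equal scores reduce to plain averaging
pooled = attention_pool(np.array([[1.0, 0.0], [0.0, 1.0]]), [0.0, 0.0])
```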

Crime amount prediction based on 2D convolution and long short-term memory neural network

  • Dong, Qifen;Ye, Ruihui;Li, Guojun
    • ETRI Journal
    • /
    • v.44 no.2
    • /
    • pp.208-219
    • /
    • 2022
  • Crime amount prediction is crucial for optimizing the arrangement of police patrols in each region of a city. First, we analyzed the spatiotemporal correlations of the crime data and the relationships between crime and related auxiliary data, including points-of-interest (POI), public service complaints, and demographics. Then, we proposed a crime amount prediction model based on 2D convolution and a long short-term memory neural network (2DCONV-LSTM). The proposed model captures the spatiotemporal correlations in the crime data, and the crime-related auxiliary data are used to enhance the regional spatial features. Extensive experiments on real-world datasets were conducted. The results demonstrate that capturing both temporal and spatial correlations in crime data and using auxiliary data to extract regional spatial features improve the prediction performance. In the best case, the proposed model reduces the prediction error by at least 17.8% and 8.2% compared with support vector regression (SVR) and LSTM, respectively. Moreover, excessive auxiliary data reduce model performance because of the presence of redundant information.
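As a minimal sketch of how the input to such a model is typically prepared (our illustration, not the paper's code), a (time, rows, cols) crime-count grid can be sliced into overlapping windows with next-step targets, which is the shape a 2D-conv + LSTM model trains on:

```python
import numpy as np

def make_sequences(crime_grid, window):
    """Slice a (time, rows, cols) crime-count grid into overlapping
    input windows and next-step targets.

    Returns X of shape (samples, window, rows, cols) and
    y of shape (samples, rows, cols).
    """
    grid = np.asarray(crime_grid, dtype=float)
    X = np.stack([grid[t:t + window] for t in range(len(grid) - window)])
    y = grid[window:]          # the grid one step after each window
    return X, y

grid = np.arange(5 * 2 * 2).reshape(5, 2, 2)   # 5 time steps, 2x2 regions
X, y = make_sequences(grid, window=3)
```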

Depth Images-based Human Detection, Tracking and Activity Recognition Using Spatiotemporal Features and Modified HMM

  • Kamal, Shaharyar;Jalal, Ahmad;Kim, Daijin
    • Journal of Electrical Engineering and Technology
    • /
    • v.11 no.6
    • /
    • pp.1857-1862
    • /
    • 2016
  • Human activity recognition using depth information is an emerging and challenging technology in computer vision, having received considerable attention from many practical applications such as smart home/office systems, personal health care, and 3D video games. This paper presents a novel framework for 3D human body detection, tracking, and recognition from depth video sequences using spatiotemporal features and a modified HMM. To detect the human silhouette, raw depth data is examined, extracting the silhouette by considering spatial continuity and the constraints of human motion information, while frame differencing is used to track human movements. The feature extraction mechanism combines spatial depth shape features and temporal joint features to improve classification performance; both feature types are fused to recognize different activities using the modified hidden Markov model (M-HMM). The proposed approach is evaluated on two challenging depth video datasets. Moreover, our system can handle rotated and missing body parts, which is a major contribution to human activity recognition.
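Classification with per-activity HMMs rests on the forward algorithm: each activity's trained HMM scores the observed feature sequence, and the activity whose model scores highest wins. A minimal scaled-forward sketch for a discrete-observation HMM (a standard HMM, not the paper's modified variant; all names are ours):

```python
import numpy as np

def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete
    observation sequence under one activity's HMM.

    obs: sequence of symbol indices; pi: (S,) initial state probs;
    A: (S, S) transition matrix; B: (S, V) emission matrix.
    """
    alpha = pi * B[:, obs[0]]
    s = alpha.sum()
    log_p = np.log(s)
    alpha = alpha / s                      # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # propagate and emit
        s = alpha.sum()
        log_p += np.log(s)
        alpha = alpha / s
    return log_p
```

At recognition time one would compute this score under every activity's HMM and pick the argmax.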

Human Activity Recognition Using Spatiotemporal 3-D Body Joint Features with Hidden Markov Models

  • Uddin, Md. Zia;Kim, Jaehyoun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.6
    • /
    • pp.2767-2780
    • /
    • 2016
  • Video-based human-activity recognition has become increasingly popular due to the prominent corresponding applications in a variety of fields such as computer vision, image processing, smart-home healthcare, and human-computer interactions. The essential goals of a video-based activity-recognition system include the provision of behavior-based information to enable functionality that proactively assists a person with his/her tasks. The target of this work is the development of a novel approach for human-activity recognition, whereby human-body-joint features that are extracted from depth videos are used. From silhouette images taken at every depth, the direction and magnitude features are first obtained from each connected body-joint pair so that they can be augmented later with motion direction, as well as with the magnitude features of each joint in the next frame. A generalized discriminant analysis (GDA) is applied to make the spatiotemporal features more robust, followed by the feeding of the time-sequence features into a Hidden Markov Model (HMM) for the training of each activity. Lastly, all of the trained-activity HMMs are used for depth-video activity recognition.
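The per-frame direction and magnitude features for connected joint pairs can be sketched as follows, assuming for illustration that pairs are consecutive rows of a (J, 3) joint array (the actual skeleton topology in the paper differs, and all names are ours):

```python
import numpy as np

def joint_pair_features(joints):
    """Direction (unit vector) and magnitude for each connected
    joint pair in one frame.

    joints: (J, 3) array of 3-D body-joint positions; pairs are taken
    as consecutive joints for illustration.
    """
    joints = np.asarray(joints, dtype=float)
    vecs = joints[1:] - joints[:-1]                 # (J-1, 3) pair vectors
    mags = np.linalg.norm(vecs, axis=1)             # pair magnitudes
    safe = np.where(mags[:, None] > 0, mags[:, None], 1.0)
    dirs = vecs / safe                              # unit directions
    return dirs, mags

dirs, mags = joint_pair_features([[0, 0, 0], [3, 4, 0], [3, 4, 0]])
```

Stacking these per-frame features over time, together with the frame-to-frame motion of each joint, yields the time-sequence input for the HMMs.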

A Video Expression Recognition Method Based on Multi-mode Convolution Neural Network and Multiplicative Feature Fusion

  • Ren, Qun
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.556-570
    • /
    • 2021
  • Existing video expression recognition methods mainly focus on spatial feature extraction from video expression images but tend to ignore the dynamic features of video sequences. To solve this problem, a multi-mode convolutional neural network method is proposed to effectively improve the performance of facial expression recognition in video. Firstly, OpenFace 2.0 is used to detect face images in the video, and two deep convolutional neural networks are used to extract spatiotemporal expression features: a spatial convolutional neural network extracts the spatial information features of each static expression image, and a temporal convolutional neural network extracts dynamic information features from the optical flow of multiple expression images. Then, the spatiotemporal features learned by the two networks are fused by multiplication. Finally, the fused features are input into a support vector machine to realize the facial expression classification. Experimental results show that the recognition accuracy of the proposed method reaches 64.57% and 60.89% on the RML and BAUM-1s datasets, respectively, which is better than that of the other methods compared.