• Title/Summary/Keyword: Temporal Action Detection

Search Result 12, Processing Time 0.024 seconds

Trends in Temporal Action Detection in Untrimmed Videos (시간적 행동 탐지 기술 동향)

  • Moon, Jinyoung;Kim, Hyungil;Park, Jongyoul
    • Electronics and Telecommunications Trends
    • /
    • v.35 no.3
    • /
    • pp.20-33
    • /
    • 2020
  • Temporal action detection (TAD) in untrimmed videos is an important but a challenging problem in the field of computer vision and has gathered increasing interest recently. Although most studies on action in videos have addressed action recognition in trimmed videos, TAD methods are required to understand real-world untrimmed videos, including mostly background and some meaningful action instances belonging to multiple action classes. TAD is mainly composed of temporal action localization that generates temporal action proposals, such as single action and action recognition, which classifies action proposals into action classes. However, the task of generating temporal action proposals with accurate temporal boundaries is challenging in TAD. In this paper, we discuss TAD technologies that are considered high performance in terms of representative TAD studies based on deep learning. Further, we investigate evaluation methodologies for TAD, such as benchmark datasets and performance measures, and subsequently compare the performance of the discussed TAD models.

Trends in Online Action Detection in Streaming Videos (온라인 행동 탐지 기술 동향)

  • Moon, J.Y.;Kim, H.I.;Lee, Y.J.
    • Electronics and Telecommunications Trends
    • /
    • v.36 no.2
    • /
    • pp.75-82
    • /
    • 2021
  • Online action detection (OAD) in a streaming video is an attractive research area that has aroused interest lately. Although most studies for action understanding have considered action recognition in well-trimmed videos and offline temporal action detection in untrimmed videos, online action detection methods are required to monitor action occurrences in streaming videos. OAD predicts action probabilities for a current frame or frame sequence using a fixed-sized video segment, including past and current frames. In this article, we discuss deep learning-based OAD models. In addition, we investigated OAD evaluation methodologies, including benchmark datasets and performance measures, and compared the performances of the presented OAD models.

Spatial-Temporal Scale-Invariant Human Action Recognition using Motion Gradient Histogram (모션 그래디언트 히스토그램 기반의 시공간 크기 변화에 강인한 동작 인식)

  • Kim, Kwang-Soo;Kim, Tae-Hyoung;Kwak, Soo-Yeong;Byun, Hye-Ran
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1075-1082
    • /
    • 2007
  • In this paper, we propose the method of multiple human action recognition on video clip. For being invariant to the change of speed or size of actions, Spatial-Temporal Pyramid method is applied. Proposed method can minimize the complexity of the procedures owing to select Motion Gradient Histogram (MGH) based on statistical approach for action representation feature. For multiple action detection, Motion Energy Image (MEI) of binary frame difference accumulations is adapted and then we detect each action of which area is represented by MGH. The action MGH should be compared with pre-learning MGH having pyramid method. As a result, recognition can be done by the analyze between action MGH and pre-learning MGH. Ten video clips are used for evaluating the proposed method. We have various experiments such as mono action, multiple action, speed and site scale-changes, comparison with previous method. As a result, we can see that proposed method is simple and efficient to recognize multiple human action with stale variations.

A Bi-directional Information Learning Method Using Reverse Playback Video for Fully Supervised Temporal Action Localization (완전지도 시간적 행동 검출에서 역재생 비디오를 이용한 양방향 정보 학습 방법)

  • Huiwon Gwon;Hyejeong Jo;Sunhee Jo;Chanho Jung
    • Journal of IKEEE
    • /
    • v.28 no.2
    • /
    • pp.145-149
    • /
    • 2024
  • Recently, research on temporal action localization has been actively conducted. In this paper, unlike existing methods, we propose two approaches for learning bidirectional information by creating reverse playback videos for fully supervised temporal action localization. One approach involves creating training data by combining reverse playback videos and forward playback videos, while the other approach involves training separate models on videos with different playback directions. Experiments were conducted on the THUMOS-14 dataset using TALLFormer. When using both reverse and forward playback videos as training data, the performance was 5.1% lower than that of the existing method. On the other hand, using a model ensemble shows a 1.9% improvement in performance.

Two-Stream Convolutional Neural Network for Video Action Recognition

  • Qiao, Han;Liu, Shuang;Xu, Qingzhen;Liu, Shouqiang;Yang, Wanggan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.10
    • /
    • pp.3668-3684
    • /
    • 2021
  • Video action recognition is widely used in video surveillance, behavior detection, human-computer interaction, medically assisted diagnosis and motion analysis. However, video action recognition can be disturbed by many factors, such as background, illumination and so on. Two-stream convolutional neural network uses the video spatial and temporal models to train separately, and performs fusion at the output end. The multi segment Two-Stream convolutional neural network model trains temporal and spatial information from the video to extract their feature and fuse them, then determine the category of video action. Google Xception model and the transfer learning is adopted in this paper, and the Xception model which trained on ImageNet is used as the initial weight. It greatly overcomes the problem of model underfitting caused by insufficient video behavior dataset, and it can effectively reduce the influence of various factors in the video. This way also greatly improves the accuracy and reduces the training time. What's more, to make up for the shortage of dataset, the kinetics400 dataset was used for pre-training, which greatly improved the accuracy of the model. In this applied research, through continuous efforts, the expected goal is basically achieved, and according to the study and research, the design of the original dual-flow model is improved.

An efficient method applied to spike pattern detection

  • Duc, Thang Nguyen;Kim, Tae-Seong;Lee, Young-Koo;Lee, Sung-Young
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.05a
    • /
    • pp.558-559
    • /
    • 2007
  • The detection of neural spike activity is a technical challenge that is very important for studying many types of brain function. On temporal recordings of firing events or interspike interval series of neural signal, spike pattern correspond to action will be repeated in the presence of background noise and they need to be detected to develop higher applications. We will introduce new method to find these patterns in raw multitrial data and is tested on surrogate data sets with the main target to get meaningful analysis of electrophysiological data from microelectrode arrays (MEA).

  • PDF

A Dangerous Situation Recognition System Using Human Behavior Analysis (인간 행동 분석을 이용한 위험 상황 인식 시스템 구현)

  • Park, Jun-Tae;Han, Kyu-Phil;Park, Yang-Woo
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.3
    • /
    • pp.345-354
    • /
    • 2021
  • Recently, deep learning-based image recognition systems have been adopted to various surveillance environments, but most of them are still picture-type object recognition methods, which are insufficient for the long term temporal analysis and high-dimensional situation management. Therefore, we propose a method recognizing the specific dangerous situation generated by human in real-time, and utilizing deep learning-based object analysis techniques. The proposed method uses deep learning-based object detection and tracking algorithms in order to recognize the situations such as 'trespassing', 'loitering', and so on. In addition, human's joint pose data are extracted and analyzed for the emergent awareness function such as 'falling down' to notify not only in the security but also in the emergency environmental utilizations.

Automatic False-Alarm Labeling for Sensor Data

  • Adi, Taufik Nur;Bae, Hyerim;Wahid, Nur Ahmad
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.2
    • /
    • pp.139-147
    • /
    • 2019
  • A false alarm, which is an incorrect report of an emergency, could trigger an unnecessary action. The predictive maintenance framework developed in our previous work has a feature whereby a machine alarm is triggered based on sensor data evaluation. The sensor data evaluator performs three essential evaluation steps. First, it evaluates each sensor data value based on its threshold (lower and upper bound) and labels the data value as "alarm" when the threshold is exceeded. Second, it calculates the duration of the occurrence of the alarm. Finally, in the third step, a domain expert is required to assess the results from the previous two steps and to determine, thereby, whether the alarm is true or false. There are drawbacks of the current evaluation method. It suffers from a high false-alarm ratio, and moreover, given the vast amount of sensor data to be assessed by the domain expert, the process of evaluation is prolonged and inefficient. In this paper, we propose a method for automatic false-alarm labeling that mimics how the domain expert determines false alarms. The domain expert determines false alarms by evaluating two critical factors, specifically the duration of alarm occurrence and identification of anomalies before or while the alarm occurs. In our proposed method, Hierarchical Temporal Memory (HTM) is utilized to detect anomalies. It is an unsupervised approach that is suitable to our main data characteristic, which is the lack of an example of the normal form of sensor data. The result shows that the technique is effective for automatic labeling of false alarms in sensor data.

Reliable Smoke Detection using Static and Dynamic Textures of Smoke Images (연기 영상의 정적 및 동적 텍스처를 이용한 강인한 연기 검출)

  • Kim, Jae-Min
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.2
    • /
    • pp.10-18
    • /
    • 2012
  • Automatic smoke detection systems using a surveillance camera requires a reliable smoke detection method. When an image sequence is captured from smoke spreading over in the air, not only has each smoke image frame a special texture, called static texture, but the difference between two smoke image frames also has a peculiar texture, called dynamic texture. Even though an object has a static texture similar to that of the smoke, its dynamic texture cannot be similar to that of the smoke if its movement differs from the diffraction action of the smoke. This paper presents a reliable smoke detection method using these two textures. The proposed method first detects change regions using accumulated frame difference, and then picks out smoke regions using Haralick features extracted from two textures.

PCA­based Waveform Classification of Rabbit Retinal Ganglion Cell Activity (주성분분석을 이용한 토끼 망막 신경절세포의 활동전위 파형 분류)

  • 진계환;조현숙;이태수;구용숙
    • Progress in Medical Physics
    • /
    • v.14 no.4
    • /
    • pp.211-217
    • /
    • 2003
  • The Principal component analysis (PCA) is a well-known data analysis method that is useful in linear feature extraction and data compression. The PCA is a linear transformation that applies an orthogonal rotation to the original data, so as to maximize the retained variance. PCA is a classical technique for obtaining an optimal overall mapping of linearly dependent patterns of correlation between variables (e.g. neurons). PCA provides, in the mean-squared error sense, an optimal linear mapping of the signals which are spread across a group of variables. These signals are concentrated into the first few components, while the noise, i.e. variance which is uncorrelated across variables, is sequestered in the remaining components. PCA has been used extensively to resolve temporal patterns in neurophysiological recordings. Because the retinal signal is stochastic process, PCA can be used to identify the retinal spikes. With excised rabbit eye, retina was isolated. A piece of retina was attached with the ganglion cell side to the surface of the microelectrode array (MEA). The MEA consisted of glass plate with 60 substrate integrated and insulated golden connection lanes terminating in an 8${\times}$8 array (spacing 200 $\mu$m, electrode diameter 30 $\mu$m) in the center of the plate. The MEA 60 system was used for the recording of retinal ganglion cell activity. The action potentials of each channel were sorted by off­line analysis tool. Spikes were detected with a threshold criterion and sorted according to their principal component composition. The first (PC1) and second principal component values (PC2) were calculated using all the waveforms of the each channel and all n time points in the waveform, where several clusters could be separated clearly in two dimension. We verified that PCA-based waveform detection was effective as an initial approach for spike sorting method.

  • PDF