preview

Is The Architectures Of The Convnets ) For Action Recognition In Video?

Better Essays

We investigate architectures of discriminatively trained deep Convolutional Net- works (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion be- tween frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework.
Our contribution is three-fold. First, we propose a two-stream ConvNet architec- ture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multi- task learning, applied to two different action classification …show more content…

The spatial stream performs action recognition from still video frames, whilst the temporal stream is trained to recognise action from motion in the form of dense optical flow. Both streams are implemented as ConvNets. Decoupling the spatial and temporal nets also allows us to exploit the availability of large amounts of annotated image data by pre-training the spatial net on the ImageNet challenge dataset [1]. Our proposed architecture is related to the two-streams hypothesis [9], according to which the human visual cortex contains two pathways: the ventral stream (which performs object recognition) and the dorsal stream (which recognises motion); though we do not investigate this connection any further here.
The rest of the paper is organised as follows. In Sect. 1.1 we review the related work on action recognition using both shallow and deep architectures. In Sect. 2 we introduce the two-stream architecture and specify the Spatial ConvNet. Sect. 3 introduces the Temporal ConvNet and in particular how it generalizes the previous architectures reviewed in Sect. 1.1. A mult-task learning framework is developed in Sect. 4 in order to allow effortless combination of training data over 1 arXiv:1406.2199v2 [cs.CV] 12 Nov 2014 multiple datasets. Implementation details are given in Sect. 5, and the performance is evaluated in Sect. 6 and compared to the state of the art. Our experiments on two challenging datasets (UCF- 101 [24] and HMDB-51 [16])

Get Access