Multi-stream 3D CNN structure for human action recognition trained by limited data

Here, the authors proposed a solution to improve training performance for human action recognition when training data are limited. To this end, they proposed three different convolutional neural network (CNN) architectures. First, four channels of information were generated from each frame (the optical flow and the image gradient, each in the horizontal and vertical directions) to serve as input to three-dimensional (3D) CNNs. The three architectures are single-stream, two-stream, and four-stream 3D CNNs. In the single-stream model, all four channels of each frame are fed into a single stream. In the two-stream architecture, optical flow-x and optical flow-y are fed into one stream and gradient-x and gradient-y into the other. In the four-stream architecture, each information channel is fed into its own separate stream. The architectures were evaluated within an action recognition system on IXMAS, a data set recorded simultaneously by five cameras. The four-stream architecture outperformed the other two, achieving recognition rates of 87.5, 91.66, 91.11, 88.05, and 81.94% for cameras 0–4, respectively (88.05% on average).
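The channel routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the gradient operator, so central differences with replicated edges are assumed; optical flow is taken as precomputed input (the hypothetical `flow_x`/`flow_y` arguments); and the names `image_gradients`, `build_channels`, `split_into_streams`, and `STREAM_LAYOUTS` are hypothetical. The sketch also checks that the five per-camera rates for the four-stream model average to the reported 88.05%.

```python
from statistics import mean

# Hypothetical routing of the four per-frame channels to the streams of each architecture.
STREAM_LAYOUTS = {
    "single": [["flow_x", "flow_y", "grad_x", "grad_y"]],
    "two": [["flow_x", "flow_y"], ["grad_x", "grad_y"]],
    "four": [["flow_x"], ["flow_y"], ["grad_x"], ["grad_y"]],
}

# Per-camera recognition rates (cameras 0-4) reported for the four-stream model.
FOUR_STREAM_RATES = [87.5, 91.66, 91.11, 88.05, 81.94]


def image_gradients(frame):
    """Horizontal and vertical gradients of a 2D grey-level frame.

    Assumption: central differences with edge pixels replicated; the paper
    may use a different operator (e.g. Sobel).
    """
    h, w = len(frame), len(frame[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            left, right = frame[r][max(c - 1, 0)], frame[r][min(c + 1, w - 1)]
            up, down = frame[max(r - 1, 0)][c], frame[min(r + 1, h - 1)][c]
            gx[r][c] = (right - left) / 2.0
            gy[r][c] = (down - up) / 2.0
    return gx, gy


def build_channels(frame, flow_x, flow_y):
    """Collect the four information channels of one frame.

    flow_x / flow_y are assumed to come from an external optical-flow method.
    """
    gx, gy = image_gradients(frame)
    return {"flow_x": flow_x, "flow_y": flow_y, "grad_x": gx, "grad_y": gy}


def split_into_streams(channels, layout):
    """Return one list of channel maps per stream for the chosen architecture."""
    return [[channels[name] for name in group] for group in STREAM_LAYOUTS[layout]]


def average_rate(rates):
    """Mean recognition rate over cameras, rounded to two decimals."""
    return round(mean(rates), 2)
```

For example, `split_into_streams(build_channels(frame, fx, fy), "two")` yields two streams holding the flow pair and the gradient pair, mirroring the two-stream model; in a real system each stream would be a stack of such maps over consecutive frames fed to its own 3D CNN. `average_rate(FOUR_STREAM_RATES)` gives 88.05.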

Inspec keywords: convolutional neural nets; image sequences; video signal processing; learning (artificial intelligence); object recognition; feature extraction; image classification; cameras; image motion analysis

Other keywords: three-dimensional CNNs; training performance; information channels; multistream 3D CNN structure; action recognition system; four-stream 3D CNNs; single-stream model; recognition rate; two-stream architecture; optical flows; optical flow; convolutional neural network architectures; IXMAS; four-stream structure; vertical directions; human action recognition; training data case; separate streams; four-stream architecture; data set

Subjects: Video signal processing; Knowledge engineering techniques; Image recognition; Neural computing techniques; Computer vision and image processing techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2018.5088