Conference Paper


Video-based crowd counting using a multi-scale optical flow pyramid network

Abstract

This paper presents a novel approach to video-based crowd counting, which can be formalized as the regression problem of learning a mapping from an input image to an output crowd density map. Convolutional neural networks (CNNs) have demonstrated striking accuracy gains in a range of computer vision tasks, including crowd counting. However, the crowd counting literature has focused predominantly on the single-frame case, or on applying CNNs to videos frame by frame without leveraging motion information. This paper proposes an architecture that exploits the spatiotemporal information captured in a video stream by combining an optical flow pyramid with an appearance-based CNN. Extensive empirical evaluation on five public datasets, comparing against numerous state-of-the-art approaches, demonstrates the efficacy of the proposed architecture, with the proposed method reporting the best results on all datasets. Finally, a set of transfer learning experiments shows that, once the proposed model is trained on one dataset, it can be transferred to another using a limited number of training examples and still exhibit high accuracy.
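
To make the density-map formulation in the abstract concrete, here is a minimal sketch of how a regression target of that kind might be built from point annotations. It is not the authors' code. It assumes one unit-mass Gaussian per annotated head, which is a common convention in crowd counting but is not stated in this record. The names `gaussian_kernel` and `density_map` and the kernel parameters are hypothetical.

```python
import numpy as np


def gaussian_kernel(size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Return a size x size Gaussian kernel normalised to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, height: int, width: int,
                size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Build a density map from integer (row, col) head annotations.

    Each annotated person contributes total mass 1, so summing the map
    recovers the crowd count. Kernels clipped by the image border are
    renormalised so that no mass is lost (an assumption of this sketch).
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = np.zeros((height, width), dtype=np.float64)
    for r, c in points:
        # Image window covered by the kernel, clipped to the image bounds.
        r0, r1 = max(r - half, 0), min(r + half + 1, height)
        c0, c1 = max(c - half, 0), min(c + half + 1, width)
        # Matching window inside the kernel.
        patch = kernel[r0 - (r - half):r1 - (r - half),
                       c0 - (c - half):c1 - (c - half)]
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


# Hypothetical annotations: three people, two of them at image corners.
heads = [(0, 0), (30, 40), (59, 79)]
target = density_map(heads, height=60, width=80)
estimated_count = float(target.sum())
```

Under these assumptions, a network trained to regress `target` from the input frame yields a count estimate by summing its predicted map.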

Authors

Hossain, Mohammad Asiful
Cannons, Kevin
Jang, Daesik
Cuzzolin, Fabio
Xu, Zhan

Oxford Brookes departments

School of Engineering, Computing and Mathematics

Dates

Year of publication: 2021
Date of RADAR deposit: 2021-01-13


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


Related resources

This RADAR resource is identical to Video-based crowd counting using a multi-scale optical flow pyramid network
This RADAR resource is the Accepted Manuscript of Video-based crowd counting using a multi-scale optical flow pyramid network

Details

  • Owner: Joseph Ripp
  • Collection: Outputs
  • Version: 1
  • Status: Live
  • Views (since Sept 2022): 575