Journal Article

Spatio-temporal action instance segmentation and localisation


Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. In this work we address the problem of action localisation and instance segmentation in which multiple concurrent actions of the same class may be segmented out of an image sequence. We cast the action tube extraction as an energy maximisation problem in which configurations of region proposals in each frame are assigned a cost and the best action tubes are selected via two passes of dynamic programming. One pass associates region proposals in space and time for each action category, and another pass is used to solve for the tube’s temporal extent and to enforce a smooth label sequence through the video. In addition, by taking advantage of recent work on action foreground-background segmentation, we are able to associate each tube with class-specific segmentations. We demonstrate the performance of our algorithm on the challenging LIRIS-HARL dataset and achieve a new state-of-the-art result which is 14.3 times better than previous methods.
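The first dynamic programming pass described above, which associates region proposals across frames for one action category, can be illustrated with a minimal Viterbi-style sketch. This is not the authors' implementation. The pairwise term (intersection-over-union between consecutive boxes), the `overlap_weight` parameter, the box format, and the names `iou` and `link_proposals` are assumptions made for illustration only. The second pass, which trims the tube in time and smooths the label sequence, is not shown.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_proposals(boxes, scores, overlap_weight=1.0):
    """Pick one proposal per frame so that the total energy is maximal.

    boxes[t][i]  : box i in frame t, as (x1, y1, x2, y2)
    scores[t][i] : class score of that box (hypothetical detector output)

    Energy (an assumed form): sum of the chosen scores plus
    overlap_weight * IoU between boxes chosen in consecutive frames.
    Returns the list of chosen proposal indices, one per frame.
    """
    best = [list(scores[0])]  # best[t][j]: best energy of a path ending at box j
    back = []                 # back[t-1][j]: predecessor of box j in frame t
    for t in range(1, len(boxes)):
        cur, ptr = [], []
        for j, box_j in enumerate(boxes[t]):
            # Energy of reaching box j from every box i in the previous frame.
            cands = [best[t - 1][i] + overlap_weight * iou(box_i, box_j)
                     for i, box_i in enumerate(boxes[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[i_star] + scores[t][j])
            ptr.append(i_star)
        best.append(cur)
        back.append(ptr)

    # Backtrack from the best-scoring box in the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


# Toy example: two spatially separated proposals in each of three frames.
boxes = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 1, 11, 11), (50, 50, 60, 60)],
    [(2, 2, 12, 12), (51, 51, 61, 61)],
]
scores = [[0.9, 0.8], [0.5, 0.6], [0.9, 0.1]]
tube = link_proposals(boxes, scores)
```

In this toy input the overlap term keeps the path on the first proposal in every frame, even though the second proposal has the higher score in the middle frame. Setting `overlap_weight=0.0` removes the spatial coupling and reduces the selection to a per-frame maximum.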

The fulltext files of this resource are currently embargoed.
Embargo end: 2022-07-10


Saha, Suman
Singh, Gurkirt
Sapienza, Michael
Torr, Philip H.S.
Cuzzolin, Fabio

Oxford Brookes departments

School of Engineering, Computing and Mathematics


Year of publication: 2020
Date of RADAR deposit: 2021-01-12

“Copyright © 2020. Users may view, print, copy, download and text and data-mine the content, for the purposes of academic research, subject always to the full conditions of use. Any further use is subject to permission from Springer Nature.”

Related resources

This RADAR resource is the Accepted Manuscript of Spatio-temporal action instance segmentation and localisation
This RADAR resource is Part of Modelling human motion: From human perception to robot design [ISBN: 9783030467319] / edited by Nicoletta Noceti, Alessandra Sciutti, Francesco Rea (Springer, 2020).


  • Owner: Joseph Ripp
  • Collection: Outputs
  • Version: 1
  • Status: Live