Conference Paper

Spatiotemporal deformable scene graphs for complex activity detection


Long-term complex activity recognition and localisation can be crucial for decision making in autonomous systems such as smart cars and surgical robots. Here we address the problem via a novel deformable, spatiotemporal scene graph approach, consisting of three main building blocks: (i) action tube detection, (ii) the modelling of the deformable geometry of parts, and (iii) a graph convolutional network. Firstly, action tubes are detected in a series of snippets. Next, a new 3D deformable RoI pooling layer is designed for learning the flexible, deformable geometry of the constituent action tubes. Finally, a scene graph is constructed by considering all parts as nodes and connecting them based on different semantics such as order of appearance, sharing the same action label and feature similarity. We also contribute fresh temporal complex activity annotation for the recently released ROAD autonomous driving and SARAS-ESAD surgical action datasets and show the adaptability of our framework to different domains. Our method is shown to significantly outperform graph-based competitors on both augmented datasets.

Attached files


Khan, S
Cuzzolin, Fabio

Oxford Brookes departments

School of Engineering, Computing and Mathematics


Year of publication: Not yet published.
Date of RADAR deposit: 2021-10-26

Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Related resources

This RADAR resource is the Accepted Manuscript of Spatiotemporal Deformable Scene Graphs for Complex Activity Detection


  • Owner: Joseph Ripp
  • Collection: Outputs
  • Version: 1 (show all)
  • Status: Live
  • Views (since Sept 2022): 422