Nicolas Gorlo · December 1, 2025

Preprint Released

I’m excited to share that we just released the preprint of Describe Anything, Anywhere, at Any Moment (DAAAM). Stay tuned for updates!

Key Contributions

DAAAM: A novel real-time approach that builds a 4D scene graph as an explicit, large-scale spatio-temporal memory with highly detailed annotations
Optimization-based annotation: Efficiently annotates entities online, in batches, using large localized captioning models
State-of-the-art results on spatio-temporal question answering and sequential task grounding, plus a new extended benchmark (data and code are open source)
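
To give a rough feel for what "a 4D scene graph as spatio-temporal memory" can mean, here is a minimal Python sketch. It is not DAAAM's implementation or API: the names `Observation`, `Entity`, `SceneGraph4D`, `observe`, and `where_was` are hypothetical, and the substring match on captions only stands in for whatever retrieval the real system uses. The idea it illustrates is simply that each entity carries a detailed caption plus timestamped 3D positions, so a question can be answered in both space and time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]  # metres in a fixed world frame (assumed)


@dataclass
class Observation:
    """One timestamped sighting of an entity (hypothetical structure)."""
    t: float            # seconds since the start of the sequence
    position: Position


@dataclass
class Entity:
    """A scene-graph node: a detailed caption plus its sightings over time."""
    entity_id: int
    caption: str
    observations: List[Observation] = field(default_factory=list)

    def last_seen(self, before: float) -> Optional[Observation]:
        """Return the most recent observation at or before time `before`."""
        seen = [o for o in self.observations if o.t <= before]
        return max(seen, key=lambda o: o.t) if seen else None


class SceneGraph4D:
    """Toy spatio-temporal memory: entities indexed by id, searched by caption."""

    def __init__(self) -> None:
        self.entities: Dict[int, Entity] = {}

    def observe(self, entity_id: int, caption: str, t: float, position: Position) -> None:
        """Record a sighting, creating the entity on first observation."""
        entity = self.entities.setdefault(entity_id, Entity(entity_id, caption))
        entity.observations.append(Observation(t, position))

    def where_was(self, keyword: str, t: float) -> List[Tuple[int, Observation]]:
        """Answer 'where was <keyword> at time t?' with a naive substring match."""
        hits = []
        for entity in self.entities.values():
            if keyword.lower() in entity.caption.lower():
                obs = entity.last_seen(t)
                if obs is not None:
                    hits.append((entity.entity_id, obs))
        return hits


# Usage: the same object is seen twice; we ask where it was at t = 60 s.
graph = SceneGraph4D()
graph.observe(1, "red fire extinguisher mounted on a wall", t=12.0, position=(3.0, 1.5, 1.2))
graph.observe(1, "red fire extinguisher mounted on a wall", t=95.0, position=(3.1, 1.5, 1.2))
graph.observe(2, "blue office chair with armrests", t=40.0, position=(7.0, 2.0, 0.5))
hits = graph.where_was("fire extinguisher", t=60.0)
```

Here `hits` holds the extinguisher's sighting from t = 12 s, since the later sighting had not happened yet at the queried time. A real system would additionally need relations between entities, a hierarchy of places, and learned rather than keyword-based retrieval.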

Results

State-of-the-art on spatio-temporal question answering (OC-NaVQA) and sequential task grounding (SG3D):

OC-NaVQA: +53.6% question accuracy, -21.9% position error, -21.6% temporal error
SG3D: +27.8% task grounding accuracy
Real-time at 10 Hz, scalable to 35+ min sequences and 1.5 km distance

Check out the project page for more details, including links to the paper and code.