Preprint Released
I’m excited to share that we just released our preprint, Describe Anything, Anywhere, at Any Moment (DAAAM). Stay tuned for updates!
Key Contributions
- DAAAM: A novel real-time approach to building a 4D scene graph that serves as an explicit, large-scale spatio-temporal memory with highly detailed annotations
- Optimization-based annotation: Efficiently annotates entities online, in batches, using large localized captioning models
- State-of-the-art results on spatio-temporal question answering and sequential task grounding, plus a new extended benchmark (data and code are open source)
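To make the idea of a 4D scene graph as spatio-temporal memory more concrete, here is a minimal, generic Python sketch. It is not DAAAM's implementation: the class and method names (`SpatioTemporalMemory`, `observe`, `relate`, `query`) and the node fields are hypothetical. The sketch simply stores captioned entities with a 3D position and the time span over which they were observed, so that one can ask "what was near this spot at time t?".

```python
import math
from dataclasses import dataclass, field


@dataclass
class EntityNode:
    """Hypothetical scene-graph node: one captioned entity with a 3D position and a time span."""
    node_id: int
    caption: str
    position: tuple[float, float, float]
    first_seen: float  # seconds since the start of the sequence
    last_seen: float


@dataclass
class SpatioTemporalMemory:
    """Minimal 4D (3D space + time) memory for illustration; not the DAAAM data structure."""
    nodes: dict[int, EntityNode] = field(default_factory=dict)
    edges: list[tuple[int, str, int]] = field(default_factory=list)  # (src, relation, dst)

    def observe(self, node_id, caption, position, t):
        """Insert a new node, or update an existing node's position and extend its time span."""
        node = self.nodes.get(node_id)
        if node is None:
            self.nodes[node_id] = EntityNode(node_id, caption, position, t, t)
        else:
            node.position = position
            node.first_seen = min(node.first_seen, t)
            node.last_seen = max(node.last_seen, t)

    def relate(self, src, relation, dst):
        """Record a labeled edge between two nodes."""
        self.edges.append((src, relation, dst))

    def query(self, t, center, radius):
        """Return sorted captions of entities observed at time t within `radius` of `center`."""
        hits = []
        for node in self.nodes.values():
            in_time = node.first_seen <= t <= node.last_seen
            in_space = math.dist(node.position, center) <= radius
            if in_time and in_space:
                hits.append(node.caption)
        return sorted(hits)


# Usage with made-up observations (positions in metres, times in seconds).
memory = SpatioTemporalMemory()
memory.observe(1, "red mug on a desk", (1.0, 2.0, 0.8), t=5.0)
memory.observe(1, "red mug on a desk", (1.0, 2.0, 0.8), t=40.0)
memory.observe(2, "blue office chair", (1.5, 2.5, 0.0), t=10.0)
memory.observe(3, "fire extinguisher", (20.0, 0.0, 0.5), t=12.0)
memory.relate(1, "near", 2)
print(memory.query(t=10.0, center=(1.0, 2.0, 0.0), radius=2.0))
# prints ['blue office chair', 'red mug on a desk']
```

The linear scan in `query` keeps the sketch short; a memory meant to cover long sequences would presumably index nodes spatially and temporally instead.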
Results
State-of-the-art results on spatio-temporal question answering (OC-NaVQA) and sequential task grounding (SG3D):
- OC-NaVQA: +53.6% question accuracy, -21.9% position error, -21.6% temporal error
- SG3D: +27.8% task grounding accuracy
- Runs in real time at 10 Hz and scales to sequences of 35+ minutes covering 1.5 km
Check out the project page for more details, including links to the paper and code.