SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

“SpatialClaw is a training-free spatial reasoning framework that treats code as the action interface: a VLM-backed agent writes one Python cell per step into a persistent Jupyter kernel pre-loaded with perception primitives (SAM3 segmentation, Depth-Anything-3 reconstruction, geometry utilities) and scientific libraries (NumPy, SciPy, Matplotlib). Each cell can compose tool outputs, inspect intermediate evidence, and revise the analysis before committing an answer with ReturnAnswer(…). Across 20 spatial reasoning benchmarks, SpatialClaw reaches 59.9% average accuracy, outperforming the prior best spatial agent by +11.2 points — with the same system prompt, tool set, and hyperparameters across all benchmarks and six VLM backbones (Qwen3.5/3.6, Gemma4) from 26B to 397B parameters…”
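
The step loop described above (one cell per step, state that persists between cells, and an explicit commit through ReturnAnswer) can be illustrated with a small toy sketch. This is not the SpatialClaw implementation: a shared Python namespace stands in for the Jupyter kernel, a fixed list of cell strings stands in for the VLM's output, and `run_episode`, `AnswerCommitted`, and the body of `ReturnAnswer` are hypothetical names and behavior assumed only for illustration. No perception primitives are called; the object coordinates are made-up inputs.

```python
import math


class AnswerCommitted(Exception):
    """Hypothetical signal raised when a cell commits its final answer."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def ReturnAnswer(value):
    # Assumed behavior: committing an answer ends the episode immediately.
    raise AnswerCommitted(value)


def run_episode(cells, max_steps=8):
    """Execute cells one per step in a single shared namespace.

    The shared dict plays the role of the persistent kernel: names defined
    in one cell remain visible to every later cell.
    Returns (answer, steps_used); answer is None if nothing was committed.
    """
    namespace = {"math": math, "ReturnAnswer": ReturnAnswer}
    steps_used = 0
    for source in cells[:max_steps]:
        steps_used += 1
        try:
            exec(source, namespace)
        except AnswerCommitted as done:
            return done.value, steps_used
    return None, steps_used


# Stand-in for agent-written cells; in the real system a VLM would write these.
cells = [
    # Step 1: record intermediate evidence (made-up 3D positions in metres).
    "chair = (1.0, 0.0, 2.0)\ntable = (4.0, 0.0, 6.0)",
    # Step 2: compose earlier results; `chair` and `table` persist from step 1.
    "dist = math.dist(chair, table)",
    # Step 3: commit the answer.
    "ReturnAnswer(round(dist, 2))",
]

answer, steps_used = run_episode(cells)
print(answer, steps_used)  # prints: 5.0 3
```

Using an exception for the commit is a design choice of this sketch: it stops execution at the point of commitment, so nothing after ReturnAnswer in the same cell runs.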

GitHub – NVlabs/SpatialClaw: SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning