CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

License: No License

A synthetic video-understanding benchmark whose tasks are designed so that they cannot be solved without temporal reasoning (Girdhar, Ramanan).

Dataset Attributes