RAVEN

by Chi Zhang⋆,1,2, Feng Gao⋆,1,2, Baoxiong Jia1, Yixin Zhu1,2, Song-Chun Zhu1,2
1 UCLA Center for Vision, Cognition, Learning and Autonomy
2 International Center for AI and Robot Autonomy (CARA)
Research Only

We propose a new visual reasoning dataset, called RAVEN (Relational and Analogical Visual rEasoNing), in the context of Raven's Progressive Matrices (RPM). Unlike previous works, RAVEN aims to lift machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. Providing a structural representation allows us to establish a semantic link between vision and reasoning. We measure human performance on this dataset, benchmark several baseline models, and propose a simple neural module (Dynamic Residual Tree, or DRT) that combines visual understanding and structural reasoning. Comprehensive experiments show that incorporating structural information consistently improves model performance.

Dataset Attributes

Tasks: Problem Matrix (fill the grid)
Categories: Visual reasoning, Reasoning, NLP, Parsing
Sensor: RGB Camera