Visual Genome

by Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei

We collect dense annotations of objects, attributes, and relationships within each image so that models can learn how the objects in an image relate to one another. Specifically, our dataset contains over 100K images, each with an average of 21 objects, 18 attributes, and 18 pairwise relationships between objects. We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and question-answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of descriptions, objects, attributes, relationships, and question-answer pairs.
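As a rough illustration of the annotation structure described above, the sketch below models one image's objects, attributes, and pairwise relationships in Python. The class names, field names, and example values are all hypothetical and chosen for illustration; they are not the dataset's actual file schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    """One annotated object (hypothetical structure, not the dataset schema)."""
    name: str
    synset: str  # canonical WordNet synset, e.g. "horse.n.01"
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A pairwise relationship; subject and object are indices into the object list."""
    subject: int
    predicate: str
    object: int


@dataclass
class ImageAnnotation:
    """All annotations for a single image."""
    image_id: int
    objects: List[SceneObject] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return (subject name, predicate, object name) for every relationship."""
        return [
            (self.objects[r.subject].name, r.predicate, self.objects[r.object].name)
            for r in self.relationships
        ]

    def counts(self) -> Tuple[int, int, int]:
        """Return (number of objects, number of attributes, number of relationships)."""
        n_attributes = sum(len(o.attributes) for o in self.objects)
        return len(self.objects), n_attributes, len(self.relationships)


# Illustrative values only; not taken from the dataset.
annotation = ImageAnnotation(
    image_id=1,
    objects=[
        SceneObject("man", "man.n.01", ["tall"]),
        SceneObject("horse", "horse.n.01", ["brown"]),
    ],
    relationships=[Relationship(subject=0, predicate="riding", object=1)],
)

print(annotation.triples())
print(annotation.counts())
```

Averaging the three values from `counts()` over all images would give per-image statistics comparable to the 21 objects, 18 attributes, and 18 relationships quoted above.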

Dataset Attributes

Tasks: Classification
Categories: Humans, Vehicles, Signs, Nature, Streets
Sensor: RGB Camera