AVA Video Action Dataset

by Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik


The AVA dataset densely annotates 80 atomic visual actions in 57.6k movie clips, with actions localized in space and time, resulting in 210k action labels; multiple labels per human occur frequently. The main differences from existing video datasets are: (1) the definition of atomic visual actions, which avoids collecting data for each and every complex action; (2) precise spatio-temporal annotations, with possibly multiple annotations for each human; (3) the use of diverse, realistic video material (movies). Our goal is to accelerate research on video action recognition. More details about the dataset and initial experiments can be found in our arXiv paper.
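For readers who want to work with the annotations directly, AVA distributes them as CSV files in which each row localizes one action for one person at one keyframe: a video id, a middle-frame timestamp in seconds, box corners normalized to [0, 1], an action id, and a person id. The sketch below is a minimal example assuming that published layout; the file name is a placeholder. It groups rows so that a person box carrying several simultaneous action labels, which the dataset notes occurs frequently, becomes a single record.

```python
import csv
from collections import defaultdict

def load_ava_annotations(csv_path):
    """Group AVA CSV rows into one record per (video, timestamp, person).

    Assumes rows of the form:
      video_id, middle_frame_timestamp, x1, y1, x2, y2, action_id, person_id
    with box coordinates normalized to [0, 1]. A person performing several
    atomic actions at the same keyframe appears on several rows, so all
    action ids for the same person box are collected into one entry.
    """
    records = defaultdict(lambda: {"box": None, "actions": []})
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            video_id, ts, x1, y1, x2, y2, action_id, person_id = row
            key = (video_id, float(ts), int(person_id))
            records[key]["box"] = tuple(map(float, (x1, y1, x2, y2)))
            records[key]["actions"].append(int(action_id))
    return dict(records)

if __name__ == "__main__":
    # Placeholder path; count person boxes with more than one action label.
    anns = load_ava_annotations("ava_train_v2.2.csv")
    multi = sum(1 for r in anns.values() if len(r["actions"]) > 1)
    print(f"{multi} of {len(anns)} person boxes carry multiple action labels")
```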

Dataset Attributes

Tasks: Detection
Categories: Actions, Humans, Activities
Sensor: RGB Camera

Class Labels

carry/hold, touch, ride, answer phone, eat, smoke, read, drink, watch, play musical instrument, drive, open, write, close, listen, sail boat, put down, lift/pick up, text on/look at a cellphone, push, pull, dress/put on clothing, throw, climb, work on a computer, enter, shoot, hit, take a photo, cut, turn, play with pets, point to, play board game, press, catch, fishing, cook, paint, shovel, row boat, dig, stir, clink glass, exit, chop, kick, brush teeth, extract
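These class names map to the integer action ids used in the annotation CSVs through a label-map file released alongside the dataset (e.g. ava_action_list_v2.2.pbtxt). Below is a minimal parsing sketch, assuming entries that contain a quoted name field followed by an integer id field; the exact field names vary between released versions of the file, so the regular expressions may need adjusting.

```python
import re

def load_label_map(pbtxt_path):
    """Parse a simple pbtxt label map into {action_id: class_name}.

    Assumes each entry contains a quoted `name` and an integer id field,
    e.g.  name: "carry/hold"  followed by  id: 1  (or  label_id: 1,
    depending on the file version).
    """
    with open(pbtxt_path) as f:
        text = f.read()
    names = re.findall(r'name:\s*"([^"]+)"', text)
    ids = [int(m) for m in re.findall(r'(?:label_)?id:\s*(\d+)', text)]
    return dict(zip(ids, names))
```

Combined with the annotation loader sketched earlier, this makes it straightforward to print human-readable action names for each annotated person box.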