Audio-Visual Event (AVE) Dataset
by Yapeng Tian
License: Research Only
We introduce a novel problem: audio-visual event localization in unconstrained videos. We define an audio-visual event as an event that is both visible and audible in a video segment. We collect an Audio-Visual Event (AVE) dataset to systematically investigate three temporal localization tasks: supervised audio-visual event localization, weakly-supervised audio-visual event localization, and cross-modality localization. The AVE dataset contains 4143 videos covering 28 event categories, and each video is temporally labeled with audio-visual event boundaries.
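To make "temporally labeled with audio-visual event boundaries" concrete, here is a minimal sketch of how one such annotation could be turned into per-segment labels for a localization task. Everything in it is an assumption for illustration: the `EventAnnotation` record, its field names, the one-second segment length, and the `background` label are hypothetical and do not describe the dataset's actual annotation file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventAnnotation:
    """Hypothetical record for one temporally labeled video."""
    video_id: str
    category: str     # one of the event categories, e.g. "Church bell"
    start_sec: float  # time at which the event becomes visible and audible
    end_sec: float    # time at which the event ends


def segment_labels(ann: EventAnnotation,
                   num_segments: int,
                   segment_sec: float = 1.0,
                   background: str = "background") -> List[str]:
    """Assign the event category to every segment that overlaps the event.

    Segment i covers [i * segment_sec, (i + 1) * segment_sec).
    Segments outside the event boundaries get the background label.
    """
    labels = []
    for i in range(num_segments):
        seg_start = i * segment_sec
        seg_end = seg_start + segment_sec
        overlaps = seg_start < ann.end_sec and seg_end > ann.start_sec
        labels.append(ann.category if overlaps else background)
    return labels


# Hypothetical example: an event spanning seconds 2 to 5 of an 8-second clip.
example = EventAnnotation("example_video", "Church bell", 2.0, 5.0)
print(segment_labels(example, num_segments=8))
```

Under these assumptions, a supervised setting would train against the per-segment labels, while a weakly-supervised setting would only see the video-level `category`.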
Dataset Attributes
Tasks: Audio-guided Visual Attention
Categories: Sounds, Music, Noise
Sensor: RGB Camera, Audio
Class Labels
Church bell, Man speaking, Dog barking, Airplane, Racing car, Woman speaking, Helicopter, Violin, Flute, Ukulele, Frying food, Truck, Shofar, Motorcycle, Guitar, Train, Clock, Banjo, Goat, Baby crying, Bus, Chainsaw, Cat, Horse, Toilet flush, Rodent, Accordion