VGG-Sound

by Honglie Chen · License: CC-BY

VGG-Sound is an audio-visual correspondence dataset consisting of short audio clips extracted from videos uploaded to YouTube. The clips span a large number of challenging acoustic environments and the noise characteristics of real-world applications. All videos are captured "in the wild" and exhibit audio-visual correspondence, in the sense that the sound source is visually evident. Each VGG-Sound segment contains both audio and video and is 10 seconds long.

Dataset Attributes