by Honglie Chen
License: CC-BY
VGG-Sound is an audio-visual correspondence dataset consisting of short sound clips extracted from videos uploaded to YouTube.
VGG-Sound contains audio spanning a wide range of challenging acoustic environments and noise characteristics found in real applications. All videos are captured "in the wild" and exhibit audio-visual correspondence, in the sense that the sound source is visually evident. Each segment includes both audio and video and is 10 seconds long.