Visual Lip-Reading Feasibility (VLRF)
The VLRF database is designed to contribute to research in visual-only speech recognition. A key difference of the VLRF database with respect to existing corpora is that it was designed from a novel point of view: instead of trying to lip-read from people who are speaking naturally (normal speed, normal intonation, ...), we propose to lip-read from people who strive to be understood.

We recruited 24 adult volunteers (3 male and 21 female). Each participant was asked to read 25 different sentences from a total pool of 500 sentences. Each sentence contains between 3 and 12 words, with an average duration of 7 seconds per sentence and a total database duration of 180 minutes (540,162 frames). The sentences were unrelated to one another so that lip-readers could not benefit from conversational context. The camera recorded a close-up shot at 50 fps with a resolution of 1280x720 pixels, and audio was captured at 48 kHz, mono, with 16-bit resolution.

The database is freely available for research purposes. It includes: a) the audio-visual recordings; b) the text of the uttered sentences; c) the phonetic transcription of the uttered sentences. To obtain a copy of the database, please download the License Agreement listed below and send a signed copy to the following e-mail: vlrf.database@upf.edu (vlrf dot database at upf dot edu).

For additional information, please refer to the following publication:

A. Fernandez-Lopez, O. Martinez and F. M. Sukno. Towards estimating the upper bound of visual-speech recognition: The Visual Lip-Reading Feasibility Database. In Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, 2017.
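After obtaining the database, the recording parameters described above (50 fps video at 1280x720, 48 kHz 16-bit mono audio) can be sanity-checked with a short script. The sketch below is not part of the official distribution: it assumes OpenCV is available for reading the video track and that the audio has been extracted to a WAV file, and the file names used are hypothetical placeholders.

    # Minimal sketch for checking the recording parameters of a VLRF clip.
    # Assumptions: OpenCV (opencv-python) installed; file names are hypothetical.
    import wave
    import cv2

    def check_video(path):
        """Print frame rate, resolution and frame count (expected: 50 fps, 1280x720)."""
        cap = cv2.VideoCapture(path)
        if not cap.isOpened():
            raise IOError(f"Cannot open {path}")
        fps = cap.get(cv2.CAP_PROP_FPS)
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        cap.release()
        print(f"{path}: {fps:.1f} fps, {width}x{height}, {frames} frames")

    def check_audio(path):
        """Print sample rate, channels and bit depth (expected: 48 kHz, mono, 16-bit)."""
        with wave.open(path, "rb") as w:
            print(f"{path}: {w.getframerate()} Hz, {w.getnchannels()} channel(s), "
                  f"{8 * w.getsampwidth()}-bit")

    if __name__ == "__main__":
        check_video("speaker01_sentence01.mp4")   # hypothetical file name
        check_audio("speaker01_sentence01.wav")   # hypothetical file name

Running both checks on one recording should report values matching the specifications listed above; any deviation most likely indicates a re-encoded or truncated copy rather than an error in the corpus itself.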