by Martin Kišš, Michal Hradiš, Oldřich Kodym
License: Unknown
Brno Mobile OCR Dataset (B-MOD) is a collection of 2,113 templates (pages of scientific papers). The templates were captured with 23 different mobile devices under unrestricted conditions, so the resulting photographs show varying degrees of blur, varying illumination, etc. In total, the dataset contains 19,725 photographs, from which more than 500k text lines with precise transcriptions were extracted. The templates were divided into three subsets (training, validation and testing). The captured photographs and cropped lines follow this division, so all photographs of a given template, and all lines extracted from them, belong to the same subset.
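Splitting by template rather than by photograph keeps near-identical content (the same page shot on different devices) from appearing in both training and evaluation data. The sketch below illustrates that invariant. It assumes a hypothetical list of `(photo_id, template_id)` pairs and illustrative 80/10/10 ratios. B-MOD ships with a fixed split whose exact proportions are not stated here, so this is not the authors' procedure. The functions `split_by_template` and `is_template_disjoint` are hypothetical helpers.

```python
from collections import defaultdict
import random


def split_by_template(photos, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign every photo to a subset chosen per template (hypothetical helper).

    photos: iterable of (photo_id, template_id) pairs (assumed format).
    ratios: illustrative train/val/test fractions, not B-MOD's actual ones.
    Lines cropped from a photo would simply inherit that photo's subset.
    """
    by_template = defaultdict(list)
    for photo_id, template_id in photos:
        by_template[template_id].append(photo_id)

    # Shuffle template IDs (not photos) so whole templates move together.
    templates = sorted(by_template)
    random.Random(seed).shuffle(templates)
    n = len(templates)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    subsets = {
        "train": templates[:n_train],
        "val": templates[n_train:n_train + n_val],
        "test": templates[n_train + n_val:],
    }

    assignment = {}
    for name, template_ids in subsets.items():
        for template_id in template_ids:
            for photo_id in by_template[template_id]:
                assignment[photo_id] = name
    return assignment


def is_template_disjoint(photos, assignment):
    """Return True if no template has photos in more than one subset."""
    subsets_per_template = defaultdict(set)
    for photo_id, template_id in photos:
        subsets_per_template[template_id].add(assignment[photo_id])
    return all(len(s) == 1 for s in subsets_per_template.values())
```

For example, with 20 photos spread over 5 templates, `is_template_disjoint(photos, split_by_template(photos))` holds. Moving a single photo into a different subset than its siblings breaks the check, which is exactly the leakage the template-level division avoids.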
Tasks: Optical Character Recognition
Categories: Reports, Articles, Natural Language Processing, NLP, Papers
Sensor: RGB Camera, Web sampling