by Shuran Song,Samuel P. Lichtenberg,Jianxiong XiaoUnknown
by Shuran Song,Samuel P. Lichtenberg,Jianxiong XiaoLicense : Unknown
Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we haven not achieved a similar performance jump for high-level scene understanding. Perhaps one of the main reasons for this is the lack of a benchmark of reasonable size with 3D annotations for training and 3D metrics for evaluation. In this paper, we present an RGB-D benchmark suite for the goal of advancing the state-of-the-art in all major scene understanding tasks. Our dataset is captured by four different sensors and contains 10,000 RGB-D images, at a similar scale as PASCAL VOC. The whole dataset is densely annotated and includes 146,617 2D polygons and 58,657 3D bounding boxes with accurate object orientations, as well as a 3D room layout and category for scenes. This dataset enables us to train data-hungry algorithms for scene-understanding tasks, evaluate them using direct and meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.
CategoriesRoom, Objects, Computers
toiletbedtablechairsofasinkpillownight standmonitorlampgarbage bindresserdoordesk