Syn2Real

by Xingchao Peng, Ben Usman, Kuniaki Saito, Neela Kaushik, Judy Hoffman, Kate Saenko


We introduce Syn2Real, a synthetic-to-real visual domain adaptation benchmark meant to encourage further development of robust domain transfer methods. The goal is to train a model on a synthetic "source" domain and then update it so that its performance improves on a real "target" domain, without using any target annotations. It includes three tasks, illustrated in the figures above: the more traditional closed-set classification task with a known set of categories; the less studied open-set classification task with unknown object categories in the target domain; and the object detection task, which involves localizing instances of objects by predicting their bounding boxes and corresponding class labels.

The Syn2Real benchmark focuses on unsupervised domain adaptation (UDA), in which the target domain images are not labeled. While there may be scenarios where target labels are available (enabling supervised domain adaptation), the purely unsupervised case is more challenging and often more realistic. For each task, we provide a synthetic source domain and two real-image target domains. Many UDA evaluation protocols do not have a validation domain and use labeled target domain data to select hyperparameters. However, assuming labeled target data goes against the UDA problem statement. For this reason, we collect two different target domains, one for validation of hyperparameters and one for testing the models.

For the closed-set classification task, we generate the largest synthetic-to-real dataset to date, with over 280K images in the combined training, validation, and test sets. We use this dataset to hold a public challenge, inviting researchers from all over the world to compete on this task, and analyze the performance of the top submitted methods. We find that for this task, where the object categories are known ahead of time, recent UDA models that leverage CNN features pre-trained on ImageNet are able to achieve impressive adaptation results.
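When the target domain contains extra unknown categories, a common way to score an open-set classifier is mean per-class accuracy, with "unknown" treated as one more class so that collapsing everything into a single frequent category is not rewarded. The function below is an illustrative sketch of that metric, not the benchmark's official scoring code:

```python
def mean_per_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies; 'unknown' counts as its own class.

    y_true, y_pred: sequences of class labels (e.g. strings), where target
    images outside the known source categories are labeled "unknown".
    """
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class.append(correct / len(idx))
    # Unweighted mean, so rare classes matter as much as common ones.
    return sum(per_class) / len(per_class)
```

For example, with two "car" images (one correct) and two "unknown" images (one correct), both per-class accuracies are 0.5, so the score is 0.5 regardless of class frequencies.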
This is surprising, considering that labels are only available on synthetic source images and the source and target domains are very different. We provide a detailed analysis of and insight into these results. We then push the boundaries of synthetic-to-real transfer beyond closed-set classification and design an open-set classification dataset, where the target domains contain images of additional unknown categories that were not present in the source dataset. We evaluate the state-of-the-art UDA methods available for this more difficult task, and find that there is still much room for improvement. Furthermore, we propose an even more challenging object detection benchmark that covers a much more diverse set of object categories than previous synthetic-to-real detection datasets, and show that methods that excel at adapting classification models completely fail when applied to recent end-to-end detection models, potentially due to the very low initial performance of source models on the target data.
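Detection benchmarks are typically scored by bounding-box overlap: a predicted box counts as a correct localization only if its intersection-over-union (IoU) with a ground-truth box exceeds a threshold, commonly 0.5. A minimal IoU helper, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (the exact threshold and box format used for evaluation are assumptions here):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For instance, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7, well below a 0.5 matching threshold.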

Dataset Attributes

Tasks: Classification
Categories: Humans, Animals,