SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

John McCormac
Ankur Handa
Stefan Leutenegger
Andrew J. Davison

Dyson Robotics Lab at Imperial College, Department of Computing, Imperial College London


Example renders sampled from the dataset. We are able to render a large variety of such scenes, with objects sampled from ShapeNet and layouts from SceneNet.

Abstract

We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large-scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which has previously been limited by the relatively small labelled datasets NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as a proxy for a SLAM system.


Paper



      arXiv 2016



          ICCV 2017

Overview




Objects are placed randomly in the scene by dropping them from the ceiling and letting them settle with a physics engine. We used the Chrono physics engine to simulate this process. Once a stable configuration has been generated, we take the resulting object poses and render the scene with the NVIDIA OptiX ray tracer.
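The drop-and-settle step can be illustrated with a minimal physics sketch. The released pipeline uses the Chrono engine; the snippet below uses pybullet purely as an illustrative stand-in, and the box shapes, spawn heights, and rest threshold are arbitrary choices, not values from our generation code.

	import pybullet as p
	import pybullet_data

	p.connect(p.DIRECT)                       # headless physics simulation
	p.setAdditionalSearchPath(pybullet_data.getDataPath())
	p.setGravity(0, 0, -9.81)
	p.loadURDF("plane.urdf")                  # stands in for the room floor

	# Spawn a few boxes near "ceiling height" and let them fall.
	box = p.createCollisionShape(p.GEOM_BOX, halfExtents=[0.2, 0.2, 0.2])
	bodies = [p.createMultiBody(baseMass=1.0,
	                            baseCollisionShapeIndex=box,
	                            basePosition=[0.3 * i, 0.0, 2.5])
	          for i in range(4)]

	# Step until every body has (approximately) come to rest; the settled
	# poses would then be handed to the ray tracer for rendering.
	for _ in range(2000):
	    p.stepSimulation()
	    if all(max(map(abs, p.getBaseVelocity(b)[0])) < 1e-3 for b in bodies):
	        break

	stable_poses = [p.getBasePositionAndOrientation(b) for b in bodies]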

Dataset

Training Set [263GB]    Training Set Protobuf [323MB]

Validation Set [15GB]     Validation Set Protobuf [31MB]

Caveat: untarring can take some time since the archives contain a large number of subdirectories.


The training dataset is also split into 17 tarballs (a small extraction sketch follows the list):

train_0 [16GB]
train_1 [16GB]
train_2 [16GB]
train_3 [16GB]
train_4 [16GB]
train_5 [16GB]
train_6 [16GB]
train_7 [16GB]
train_8 [16GB]
train_9 [16GB]
train_10 [16GB]
train_11 [16GB]
train_12 [16GB]
train_13 [16GB]
train_14 [16GB]
train_15 [16GB]
train_16 [16GB]
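If you download the split archives, a standard-library loop like the one below will unpack them in sequence. The archive filenames and extraction path are assumptions for illustration; adjust them to match the files you actually downloaded.

	import tarfile

	# Hypothetical filenames: substitute the names of the downloaded tarballs.
	for i in range(17):
	    with tarfile.open('train_{}.tar.gz'.format(i)) as tar:
	        # Many small files and subdirectories, so this can take a while.
	        tar.extractall(path='SceneNetRGBD/train')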



Code to parse the dataset

https://github.com/jmccormac/pySceneNetRGBD
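The per-trajectory ground truth (render paths and per-frame camera information) is distributed as the protobuf files linked above, and pySceneNetRGBD ships the generated scenenet_pb2 module for reading them. The sketch below assumes the message and field names from that repo's scenenet.proto (Trajectories, render_path, views, frame_num) and uses a placeholder protobuf filename; check the repo for the exact schema and paths.

	import scenenet_pb2 as sn  # generated module from pySceneNetRGBD

	trajectories = sn.Trajectories()
	with open('scenenet_rgbd_train.pb', 'rb') as f:  # placeholder filename
	    trajectories.ParseFromString(f.read())

	for traj in trajectories.trajectories:
	    # render_path locates this trajectory's rendered frames on disk
	    print('Trajectory:', traj.render_path)
	    for view in traj.views:
	        # frame_num indexes the per-frame photo/depth/instance images
	        print('  frame', view.frame_num)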


SUN RGB-D (Alternative links that provide cleaned up dataset)

https://github.com/ankurhanda/sunrgbd-meta-data


NYUv2 (Alternative links that provide cleaned up dataset)

https://github.com/ankurhanda/nyuv2-meta-data


Floor plans from SceneNet

https://drive.google.com/open?id=0B_CLZMBI0zcuRmM4cDIzdUtSdUU


Citation

		@inproceedings{McCormac:etal:ICCV2017,
		  author    = {John McCormac and
		               Ankur Handa and
		               Stefan Leutenegger and
		               Andrew J. Davison},
		  title     = {SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?},
		  booktitle = {ICCV},
		  year      = {2017}
		}

		@article{McCormac:etal:arXiv2016,
		  author  = {John McCormac and
		             Ankur Handa and
		             Stefan Leutenegger and
		             Andrew J. Davison},
		  title   = {SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth},
		  journal = {arXiv preprint},
		  year    = {2016}
		}
		

Code for Generating Scenes and Rendering




Download code for Rendering and Generation


Acknowledgements

Research presented in this paper has been supported by Dyson Technology Ltd. We would also like to thank Patrick Bardow for providing optical flow code, and Phillip Isola for the neat website template used for pix2pix, which we modified for this page.



License

GPL.