SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

John McCormac
Ankur Handa
Stefan Leutenegger
Andrew J. Davison

Dyson Robotics Lab at Imperial College, Department of Computing, Imperial College London


Example renders sampled from the dataset. We are able to render a large variety of such scenes, with objects sampled from ShapeNet and layouts from SceneNet.

Abstract

We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large-scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which has previously been limited by the relatively small labelled datasets NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as a proxy for a SLAM system.


Paper



      arXiv 2016



          ICCV 2017

Overview




Objects are placed randomly in the scene by dropping them from the ceiling and letting them settle with a physics engine. We used the Chrono physics engine to simulate this process. Once a stable configuration has been generated, we take the resulting object poses and render the scene with the NVIDIA OptiX ray tracer.
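The drop-and-settle step can be illustrated with a minimal physics sketch. The released pipeline uses the Chrono engine; the snippet below uses pybullet purely as an illustrative stand-in, and the box shapes, spawn heights, and rest threshold are arbitrary choices, not values from our generation code.

	import pybullet as p
	import pybullet_data

	p.connect(p.DIRECT)                       # headless physics simulation
	p.setAdditionalSearchPath(pybullet_data.getDataPath())
	p.setGravity(0, 0, -9.81)
	p.loadURDF("plane.urdf")                  # stands in for the room floor

	# Spawn a few boxes near "ceiling height" and let them fall.
	box = p.createCollisionShape(p.GEOM_BOX, halfExtents=[0.2, 0.2, 0.2])
	bodies = [p.createMultiBody(baseMass=1.0,
	                            baseCollisionShapeIndex=box,
	                            basePosition=[0.3 * i, 0.0, 2.5])
	          for i in range(4)]

	# Step until every body has (approximately) come to rest; the settled
	# poses would then be handed to the ray tracer for rendering.
	for _ in range(2000):
	    p.stepSimulation()
	    if all(max(map(abs, p.getBaseVelocity(b)[0])) < 1e-3 for b in bodies):
	        break

	stable_poses = [p.getBasePositionAndOrientation(b) for b in bodies]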

Dataset

Training Set [263GB]    Training Set Protobuf [323MB]

Validation Set [15GB]     Validation Set Protobuf [31MB]

Caveat: untarring can take some time since the archives contain a large number of subdirectories.


The training dataset is also split into 17 tarballs (a small extraction sketch follows the list):

train_0 [16GB]
train_1 [16GB]
train_2 [16GB]
train_3 [16GB]
train_4 [16GB]
train_5 [16GB]
train_6 [16GB]
train_7 [16GB]
train_8 [16GB]
train_9 [16GB]
train_10 [16GB]
train_11 [16GB]
train_12 [16GB]
train_13 [16GB]
train_14 [16GB]
train_15 [16GB]
train_16 [16GB]
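If you download the split archives, a standard-library loop like the one below will unpack them in sequence. The archive filenames and extraction path are assumptions for illustration; adjust them to match the files you actually downloaded.

	import tarfile

	# Hypothetical filenames: substitute the names of the downloaded tarballs.
	for i in range(17):
	    with tarfile.open('train_{}.tar.gz'.format(i)) as tar:
	        # Many small files and subdirectories, so this can take a while.
	        tar.extractall(path='SceneNetRGBD/train')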



Code to parse the dataset

https://github.com/jmccormac/pySceneNetRGBD
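The per-trajectory ground truth (render paths and per-frame camera information) is distributed as the protobuf files linked above, and pySceneNetRGBD ships the generated scenenet_pb2 module for reading them. The sketch below assumes the message and field names from that repo's scenenet.proto (Trajectories, render_path, views, frame_num) and uses a placeholder protobuf filename; check the repo for the exact schema and paths.

	import scenenet_pb2 as sn  # generated module from pySceneNetRGBD

	trajectories = sn.Trajectories()
	with open('scenenet_rgbd_train.pb', 'rb') as f:  # placeholder filename
	    trajectories.ParseFromString(f.read())

	for traj in trajectories.trajectories:
	    # render_path locates this trajectory's rendered frames on disk
	    print('Trajectory:', traj.render_path)
	    for view in traj.views:
	        # frame_num indexes the per-frame photo/depth/instance images
	        print('  frame', view.frame_num)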


SUN RGB-D (Alternative links that provide cleaned up dataset)

https://github.com/ankurhanda/sunrgbd-meta-data


NYUv2 (Alternative links that provide cleaned up dataset)

https://github.com/ankurhanda/nyuv2-meta-data


Floor plans from SceneNet

https://drive.google.com/open?id=0B_CLZMBI0zcuRmM4cDIzdUtSdUU


Citation

		@inproceedings{McCormac:etal:ICCV2017,
		  author    = {John McCormac and
		               Ankur Handa and
		               Stefan Leutenegger and
		               Andrew J. Davison},
		  title     = {SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?},
		  booktitle = {ICCV},
		  year      = {2017}
		}

		@article{McCormac:etal:arXiv2016,
		  author  = {John McCormac and
		             Ankur Handa and
		             Stefan Leutenegger and
		             Andrew J. Davison},
		  title   = {SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth},
		  journal = {arXiv preprint},
		  year    = {2016}
		}
		

Code for Generating Scenes and Rendering




Download code for Rendering and Generation


Acknowledgements

Research presented in this paper has been supported by Dyson Technology Ltd. We would also like to thank Patrick Bardow for providing optical flow code, and Phillip Isola for the neat website template used for pix2pix, which we modified for this page.



License

GPL.