SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth

John McCormac
Ankur Handa
Stefan Leutenegger
Andrew J. Davison

Dyson Robotics Lab, Department of Computing, Imperial College London

Example renders sampled from the dataset. We are able to render a large variety of such scenes, with objects sampled from ShapeNet and layouts from SceneNet.


We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large-scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited to pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which has previously been limited by the relatively small labelled datasets NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as a proxy for a SLAM system.




Training Set [263GB]    Training Set Protobuf [323MB]

Validation Set [15GB]     Validation Set Protobuf [31MB]

Caveat: extracting the tarballs can take some time, since they contain a large number of subdirectories.

The training dataset is also split into 17 tarballs:

train_0 [16GB]
train_1 [16GB]
train_2 [16GB]
train_3 [16GB]
train_4 [16GB]
train_5 [16GB]
train_6 [16GB]
train_7 [16GB]
train_8 [16GB]
train_9 [16GB]
train_10 [16GB]
train_11 [16GB]
train_12 [16GB]
train_13 [16GB]
train_14 [16GB]
train_15 [16GB]
train_16 [16GB]
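As a rough illustration of handling the split download, the sketch below extracts every training tarball found in the current directory. The `train_N.tar.gz` filenames are an assumption based on the split names above; adjust the pattern to match the files you actually downloaded.

```python
import glob
import tarfile

def extract_training_tarballs(pattern="train_*.tar.gz", dest="."):
    """Extract all training split tarballs matching `pattern` into `dest`.

    The filename pattern is hypothetical -- check the names of the
    downloaded archives. Extraction can be slow because each archive
    contains many small files in nested subdirectories.
    """
    extracted = []
    for path in sorted(glob.glob(pattern)):
        with tarfile.open(path, "r:gz") as tf:
            tf.extractall(dest)
        extracted.append(path)
    return extracted
```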

Code to parse the dataset
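The linked repository contains the authoritative parsing code. Purely as a hedged sketch, a single trajectory directory could be walked as below, assuming per-frame `photo/` (RGB) and `depth/` subdirectories with matching numeric frame names; the layout and extensions here are assumptions, so consult the parsing code above for the real structure.

```python
import os

def list_frame_pairs(traj_dir):
    """Pair RGB and depth frames within one trajectory directory.

    Assumed layout (hypothetical): traj_dir/photo/N.jpg and
    traj_dir/depth/N.png, one file per rendered frame. Returns
    (photo_path, depth_path) tuples sorted by frame number.
    """
    photo_dir = os.path.join(traj_dir, "photo")
    depth_dir = os.path.join(traj_dir, "depth")
    pairs = []
    names = sorted(os.listdir(photo_dir),
                   key=lambda n: int(os.path.splitext(n)[0]))
    for name in names:
        frame = os.path.splitext(name)[0]
        depth_path = os.path.join(depth_dir, frame + ".png")
        if os.path.exists(depth_path):  # skip frames missing a depth render
            pairs.append((os.path.join(photo_dir, name), depth_path))
    return pairs
```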

SUN RGB-D (alternative links providing a cleaned-up version of the dataset)

NYUv2 (alternative links providing a cleaned-up version of the dataset)

Floor plans from SceneNet


Research presented in this paper has been supported by Dyson Technology Ltd. We would also like to thank Patrick Bardow for providing optical flow code, and Phillip Isola for the neat website template that hosts pix2pix, which we modified for this page.