3D scene understanding is important because it reflects real-world scenarios.
The goal of our work is to complete the 3D semantic scene from an RGB-D image. State-of-the-art methods have poor accuracy on complex scenes. In addition, other existing
3D reconstruction methods use depth as the sole input, which creates a performance bottleneck.
To solve this problem, we introduce a two-stream approach that uses RGB and depth as input channels to a novel
GAN architecture. Our method demonstrates excellent performance on
both the synthetic SUNCG and the real NYU datasets. Compared with SSCNet, the latest method, we
achieve gains of 4.3% in Scene Completion (SC) and 2.5% in Semantic Scene Completion
(SSC) on the NYU dataset.