Single Image Depth Prediction Made Better:
A Multivariate Gaussian Take
CVL ETH Zürich^{1}, UESTC China^{2}, University of Würzburg^{3}, KU Leuven^{4}

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.


Abstract
Neural-network-based single image depth prediction (SIDP) is a challenging task where the goal is to predict the scene's per-pixel depth at test time. Since the problem, by definition, is ill-posed, the fundamental goal is to come up with an approach that can reliably model the scene depth from a set of training examples. In the pursuit of perfect depth estimation, most existing state-of-the-art learning techniques predict a single scalar depth value per pixel. Yet, it is well known that the trained model has accuracy limits and can predict imprecise depth. Therefore, an SIDP approach must be mindful of the expected depth variations in the model's prediction at test time. Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution. To this end, we model per-pixel scene depth using a multivariate Gaussian distribution.
Moreover, contrary to existing uncertainty modeling methods in the same spirit, where per-pixel depth is assumed to be independent, we introduce per-pixel covariance modeling that encodes each pixel's depth dependency w.r.t. all the scene points. Unfortunately, per-pixel depth covariance modeling leads to a computationally expensive continuous loss function, which we solve efficiently using a learned low-rank approximation of the overall covariance matrix. Notably, when tested on benchmark datasets such as KITTI, NYU Depth V2, and SUN RGB-D, the SIDP model obtained by optimizing our loss function shows state-of-the-art results. Our method (named MG) ranks among the top approaches on the KITTI depth-prediction benchmark leaderboard.
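To illustrate why a low-rank approximation makes the multivariate Gaussian loss tractable, here is a minimal NumPy sketch (not the authors' released code). It assumes the covariance is parameterized as Sigma = diag(d) + U Uᵀ with rank k much smaller than the pixel count n, so the Gaussian negative log-likelihood can be evaluated in O(n k²) via the Woodbury identity and the matrix determinant lemma, instead of the O(n³) cost of working with the full covariance:

```python
import numpy as np

def lowrank_gaussian_nll(depth_gt, mean, diag, U):
    """NLL of a multivariate Gaussian with low-rank-plus-diagonal
    covariance Sigma = diag(d) + U @ U.T, where d has shape (n,) and
    U has shape (n, k) with k << n.

    Uses the Woodbury identity for Sigma^-1 and the matrix determinant
    lemma for log|Sigma|, so the cost is O(n k^2), never forming Sigma.
    """
    r = depth_gt - mean            # residual, shape (n,)
    d_inv = 1.0 / diag             # inverse of the diagonal part
    # Capacitance matrix C = I_k + U.T @ diag(d)^-1 @ U  (k x k)
    C = np.eye(U.shape[1]) + U.T @ (d_inv[:, None] * U)
    # Woodbury: Sigma^-1 r = D^-1 r - D^-1 U C^-1 U.T D^-1 r
    dr = d_inv * r
    quad = r @ dr - (dr @ U) @ np.linalg.solve(C, U.T @ dr)
    # Determinant lemma: log|Sigma| = log|C| + sum(log d)
    logdet = np.linalg.slogdet(C)[1] + np.sum(np.log(diag))
    n = r.size
    return 0.5 * (quad + logdet + n * np.log(2.0 * np.pi))
```

The function names and the exact diag-plus-low-rank parameterization are illustrative assumptions for this sketch; the point is only that a learned low-rank factor U lets the covariance term in the loss be computed without ever materializing or inverting the full n-by-n matrix.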
Paper

Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc Van Gool.
CVPR 2023, Vancouver, Canada.

Poster
Qualitative Results
The model is trained on NYU Depth V2 and evaluated on SUN RGB-D without fine-tuning.
Authors
Ce Liu

Suryansh Kumar

Shuhang Gu

Radu Timofte

Luc Van Gool

Acknowledgements
This work was partly supported by the ETH General Fund (OK), the Chinese Scholarship Council (CSC), and the Alexander von Humboldt Foundation.
