Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset

Title
Deep monocular depth estimation leveraging a large-scale outdoor stereo dataset
Authors
Cho, Jaehoon; Min, Dongbo; Kim, Youngjung; Sohn, Kwanghoon
Ewha Authors
Min, Dongbo (민동보)
Issue Date
2021
Journal Title
EXPERT SYSTEMS WITH APPLICATIONS
ISSN
0957-4174; 1873-6793
Citation
EXPERT SYSTEMS WITH APPLICATIONS vol. 178
Keywords
Monocular depth estimation; Convolutional neural network; Student-teacher strategy; Outdoor stereo dataset; Stereo confidence maps
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Indexed
SCIE; SCOPUS
Document Type
Article
Abstract
Current self-supervised methods for monocular depth estimation are largely based on deeply nested convolutional networks that leverage stereo image pairs or monocular sequences during the training phase. However, they often exhibit inaccurate results around occluded regions and depth boundaries. In this paper, we present a simple yet effective approach for monocular depth estimation using stereo image pairs. We propose a student-teacher strategy in which a shallow student network is trained with auxiliary information obtained from a deeper and more accurate teacher network. Specifically, we first train the stereo teacher network by fully utilizing the binocular perception of 3-D geometry, and then use the depth predictions of the teacher network to train the student network for monocular depth inference. This enables us to exploit all available depth data from massive unlabeled stereo pairs. We also propose a data ensemble strategy that merges multiple depth predictions of the teacher network, improving the training samples by collecting nontrivial knowledge beyond a single prediction. To account for the inaccurate depth estimates used when training the student network, we further propose a stereo confidence-guided regression loss that handles unreliable pseudo depth values in occluded regions, textureless regions, and repetitive patterns. To complement the existing dataset comprising outdoor driving scenes, we built a novel large-scale dataset consisting of one million outdoor stereo images taken using hand-held stereo cameras. Finally, we demonstrate that the monocular depth estimation network provides feature representations that are suitable for high-level vision tasks. Experimental results on various outdoor scenarios demonstrate the effectiveness and flexibility of our approach, which outperforms state-of-the-art approaches.
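The abstract does not give the exact form of the data ensemble or of the confidence-guided loss, so the following is only a rough sketch of the general idea under stated assumptions: teacher predictions are merged by a per-pixel mean, and the student is penalized with an L1 error restricted to pixels whose stereo confidence exceeds a threshold. The function names, the averaging rule, the L1 penalty, and the threshold value are all hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np


def ensemble_pseudo_depth(predictions):
    """Merge several teacher depth maps (each H x W) into one pseudo label.

    Assumption: a plain per-pixel mean stands in for the data ensemble
    described in the abstract.
    """
    stacked = np.stack(predictions, axis=0)  # shape (N, H, W)
    return stacked.mean(axis=0)


def confidence_guided_l1(student_depth, pseudo_depth, confidence, threshold=0.5):
    """Mean L1 error over pixels whose confidence exceeds `threshold`.

    Assumption: hard thresholding and an L1 penalty are illustrative;
    the abstract only states that unreliable pseudo depths are handled.
    """
    mask = confidence > threshold
    if not mask.any():
        return 0.0  # no reliable pixel, so no supervision signal
    return float(np.abs(student_depth - pseudo_depth)[mask].mean())


# Toy 2 x 2 example with two teacher predictions.
teacher_a = np.array([[2.0, 4.0], [6.0, 8.0]])
teacher_b = np.array([[4.0, 4.0], [6.0, 10.0]])
pseudo = ensemble_pseudo_depth([teacher_a, teacher_b])

# The bottom-right pixel has low confidence and is excluded from the loss.
confidence = np.array([[0.9, 0.8], [0.7, 0.1]])
student = np.array([[3.0, 5.0], [6.0, 0.0]])
loss = confidence_guided_l1(student, pseudo, confidence)
```

In this toy case the large error at the low-confidence pixel does not contribute to the loss, which is the behavior the abstract attributes to confidence guidance in occluded, textureless, or repetitive regions.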
DOI
10.1016/j.eswa.2021.114877
Appears in Collections:
College of Artificial Intelligence > Department of Computer Science and Engineering > Journal papers