Extending Monocular Visual Odometry to Stereo Camera Systems by Scale Optimization




This paper proposes a novel approach for extending monocular visual odometry to a stereo camera system. Rather than triangulating 3D points through stereo matching, the proposed method uses the additional camera to accurately estimate and optimize the scale of the monocular visual odometry. Specifically, the 3D points generated by the monocular visual odometry are projected onto the other camera of the stereo pair, and the scale is recovered and optimized by directly minimizing the photometric error. The approach is computationally efficient, adding minimal overhead to the stereo vision system compared to straightforward stereo matching, and is robust to repetitive textures. Additionally, direct scale optimization enables stereo visual odometry to be based purely on the direct method. Extensive evaluation on public datasets (e.g., KITTI) and in outdoor environments (both terrestrial and underwater) demonstrates the accuracy and efficiency of stereo visual odometry extended by scale optimization, as well as its robustness in environments with challenging textures.
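The core idea, projecting up-to-scale 3D points into the second camera and minimizing photometric error over a single scale parameter, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the intrinsics, baseline, and function names are made up, and a smooth analytic intensity field stands in for bilinear sampling of the actual right image.

```python
import numpy as np

# Illustrative pinhole intrinsics and stereo baseline (not from the paper).
FX, FY, CX, CY = 400.0, 400.0, 320.0, 240.0
BASELINE = 0.5  # meters; right camera offset along the x-axis

def project(pts):
    """Pinhole projection of Nx3 camera-frame points to pixel coordinates."""
    return np.stack([FX * pts[:, 0] / pts[:, 2] + CX,
                     FY * pts[:, 1] / pts[:, 2] + CY], axis=1)

def right_intensity(px):
    """Stand-in for sampling the right image: a smooth intensity field."""
    return 0.5 * px[:, 0] + 0.3 * px[:, 1]

def residuals(scale, pts, ref):
    """Photometric residuals after projecting scaled points into the right camera."""
    shifted = scale * pts - np.array([BASELINE, 0.0, 0.0])
    return right_intensity(project(shifted)) - ref

def optimize_scale(pts, ref, s0=1.0, iters=20, eps=1e-6):
    """1-D Gauss-Newton on the photometric error with a numeric Jacobian."""
    s = s0
    for _ in range(iters):
        r = residuals(s, pts, ref)
        j = (residuals(s + eps, pts, ref) - r) / eps
        step = -j.dot(r) / (j.dot(j) + 1e-12)
        s += step
        if abs(step) < 1e-10:
            break
    return s

# Synthetic check: points carrying an unknown true scale of 2.3.
rng = np.random.default_rng(0)
pts = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))
ref = right_intensity(project(2.3 * pts - np.array([BASELINE, 0.0, 0.0])))
s_est = optimize_scale(pts, ref)  # converges to ~2.3
```

Because the only unknown is a single scalar, each iteration costs one projection pass over the active points, which is why this adds so little overhead compared to per-pixel stereo matching.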






Experimental Evaluations



Effect of scale optimization on KITTI Seq. 00. Trajectories of ground truth (GT), Stereo DSO, SO-DSO, and monocular DSO are shown



Error and run-time comparisons on the KITTI dataset. For each sequence, the upper line is the result of SO-DSO and the lower line is that of Stereo DSO. t_rel is the translational RMSE (%); r_rel is the rotational RMSE (degrees per 100 m). Results are averaged over intervals of 100 m to 800 m. S.O. is the run-time of scale optimization; S.M. is the run-time of stereo matching; BA is the bundle-adjustment run-time; TPF is the time per frame (not just per keyframe); Pts is the number of 3D points in the bundle adjustment.
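The translational metric above follows the KITTI convention of averaging relative errors over fixed-length subsequences. A position-only sketch of that evaluation is given below; it is a simplification (the official KITTI devkit compares full SE(3) relative poses and also derives the rotational error), and the function names are illustrative.

```python
import numpy as np

def arc_length(positions):
    """Cumulative distance traveled along an Nx3 trajectory."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])

def translational_error(gt, est, lengths=tuple(range(100, 900, 100)), stride=10):
    """Average translational drift (%) over subsequences of the given lengths.

    Position-only simplification of the KITTI metric: for each start pose
    and each interval length L (meters of ground-truth travel), compare the
    estimated displacement with the ground-truth displacement.
    """
    dist = arc_length(gt)
    errors = []
    for L in lengths:
        for i in range(0, len(gt), stride):
            j = int(np.searchsorted(dist, dist[i] + L))
            if j >= len(gt):
                break  # remaining starts cannot span a full interval
            drift = np.linalg.norm((est[j] - est[i]) - (gt[j] - gt[i]))
            errors.append(drift / L * 100.0)
    return float(np.mean(errors))

# Sanity check: a uniform 1% scale error in the estimate yields
# ~1% translational drift at every interval length.
gt = np.stack([np.arange(1001.0), np.zeros(1001), np.zeros(1001)], axis=1)
err = translational_error(gt, 1.01 * gt)
```

Averaging over many interval lengths (100 m to 800 m here, 10 m to 80 m for the shorter EuRoC sequences) keeps the metric from being dominated by either short-range jitter or long-range drift alone.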




A demonstration of stereo VO using the proposed method running on the MH01 sequence of the EuRoC dataset. The left image shows the trajectory and 3D points; the right image compares the estimated trajectory against the ground truth.



Error and run-time comparison on the EuRoC dataset. The notation is the same as in the previous table, except that results are averaged over intervals of 10 m to 80 m.


ZED camera dataset


Robot trajectories (in meters) in the pool, as estimated by the three algorithms.


Pool dataset


Evaluating VO on an AUV in a pool environment. The combined width of the two swimming lanes is about 3.6 m.