IMU-Assisted Learning of Single-View Rolling Shutter Correction | Minnesota Interactive Robotics and Vision Laboratory

Abstract

Rolling shutter distortion is highly undesirable for photography and computer vision algorithms (e.g., visual SLAM) because pixels can be captured at different times and from different poses. In this paper, we propose a deep neural network that predicts depth and row-wise pose from a single image for rolling shutter correction. Our contribution is to incorporate inertial measurement unit (IMU) data into the pose refinement process, which greatly enhances pose prediction compared to the state of the art. The improved accuracy and robustness make it possible for numerous vision algorithms to use imagery captured by rolling shutter cameras and produce highly accurate results. We also extend a dataset to include real rolling shutter images, IMU data, depth maps, camera poses, and corresponding global shutter images for rolling shutter correction training. We demonstrate the efficacy of the proposed method by evaluating the performance of the Direct Sparse Odometry (DSO) algorithm on rolling shutter imagery corrected using the proposed approach. Results show marked improvements in DSO performance over uncorrected imagery, validating the proposed approach.

Methodology

An overview of the image processing pipeline. [Red: inputs for the neural network; Blue: ground truth for the neural network.]

An overview of the proposed system. Given an RS image, the pixel-wise depth is generated by RsDepthNet; the row-wise poses are predicted by RsPoseNet using the RS image and IMU data. The pose estimates and depth maps are subsequently used for geometric projection to recover the corresponding GS image.
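The geometric projection step can be illustrated for a single pixel. The following is a minimal sketch, not the released implementation: it assumes a pinhole camera, and the function name `rs_pixel_to_gs`, the intrinsics tuple, and the pose convention (each row's pose maps points from that row's camera frame into the GS camera frame) are hypothetical choices made only for this example.

```python
def rs_pixel_to_gs(u, v, depth, intrinsics, row_poses):
    """Map RS pixel (u, v) with known depth to its location in the GS image.

    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera.
    row_poses:  row_poses[v] = (R, t), an assumed convention in which the
                3x3 rotation R and translation t take points from the camera
                frame at the time row v was exposed into the GS camera frame.
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the camera frame of row v.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Move the point into the GS frame using that row's pose.
    R, t = row_poses[v]
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        raise ValueError("point falls behind the GS camera")
    # Project the point back onto the image plane.
    return fx * x / z + cx, fy * y / z + cy


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Toy example: the camera drifts sideways while the rows are read out.
poses = [(IDENTITY, (0.01 * row, 0.0, 0.0)) for row in range(480)]
gs_u, gs_v = rs_pixel_to_gs(60.0, 10, 2.0, (100.0, 100.0, 50.0, 40.0), poses)
```

In this toy case the pixel on row 10 shifts horizontally by about 5 pixels and keeps its row. Producing a full GS image would additionally require resampling and handling of holes and occlusions, which this sketch leaves out.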

The architecture of RsPoseNet: a feature extraction network (ResNet-34) is followed by PoseConv blocks that learn row-wise poses, which are then refined with IMU data using an LSTM network.

Results

Data Generation Verification

Top-to-bottom, left-to-right: Scenes reconstructed by DSO on RS images (RS), GS1 images (GS1 gt), GS1 images with constant velocity assumption (GS1 gt_cv), and GS1 images corrected by our network (GS1 pred).

Network Evaluation

The ratio of images whose RS distortion is reduced and the end-point error (EPE, in pixels) on the test data.
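For readers unfamiliar with these two metrics, the sketch below shows one common way to compute them. It assumes EPE is the mean Euclidean distance between predicted and ground-truth per-pixel displacements, and that an image counts as "reduced" when its EPE after correction is lower than before. Both definitions, and the function names `end_point_error` and `reduced_ratio`, are assumptions for illustration and may differ from the evaluation code used for the paper.

```python
import math


def end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance (in pixels) between two lists of (du, dv) vectors."""
    if not pred_flow or len(pred_flow) != len(gt_flow):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow)
    )
    return total / len(pred_flow)


def reduced_ratio(epe_before, epe_after):
    """Fraction of images whose EPE is strictly lower after correction (assumed definition)."""
    pairs = list(zip(epe_before, epe_after))
    if not pairs:
        raise ValueError("need at least one image")
    return sum(1 for before, after in pairs if after < before) / len(pairs)


# Two pixels: one is off by a (3, 4) displacement, the other is exact.
epe = end_point_error([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
# Four images: three improve after correction, one gets worse.
ratio = reduced_ratio([2.0, 1.0, 3.0, 0.5], [1.0, 1.5, 0.5, 0.4])
```

Here `epe` evaluates to 2.5 pixels and `ratio` to 0.75.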

A few samples of predicted GS images. From top to bottom: input RS images, ground-truth GS1 images, GS images corrected by our network, and GS images corrected by TwoView. The last row (TwoView) contains new, undesired distortion.

Links

Paper (CoRL 2021)

Code