Enhancing Underwater Imagery using Generative Adversarial Networks


Project Lead: Cameron Fabbri

Bib entry:

@inproceedings{Fabbri2018ICRA, author = {Cameron Fabbri and Md Jahidul Islam and Junaed Sattar},
title = {{Enhancing Underwater Imagery using Generative Adversarial Networks}},
booktitle = {Proceedings of the {IEEE International Conference on Robotics and Automation (ICRA)}, to appear},
year = {2018},
address = {Brisbane, Queensland, Australia},
month = {May}}


*Dataset can be downloaded here*

*A pretrained model can be downloaded here*


Autonomous underwater vehicles (AUVs) rely on a variety of sensors – acoustic, inertial and visual – for intelligent decision making. Due to its non-intrusive, passive nature and high information content, vision is an attractive sensing modality, particularly at shallower depths. However, factors such as light refraction and absorption, suspended particles in the water, and color distortion affect the quality of visual data, resulting in noisy and distorted images. AUVs that rely on visual sensing thus face difficult challenges, and consequently exhibit poor performance on vision-driven tasks. This paper proposes a method to improve the quality of visual underwater scenes using Generative Adversarial Networks (GANs), with the goal of improving input to vision-driven behaviors further down the autonomy pipeline. Furthermore, we show how recently proposed methods are able to generate a dataset for the purpose of such underwater image restoration. For any visually-guided underwater robot, this improvement can result in increased safety and reliability through robust visual perception. To that effect, we present quantitative and qualitative data demonstrating that the proposed approach produces more visually appealing images and also provides increased accuracy for a diver tracking algorithm.


Issue of Ground Truth

For the problem of automatically colorizing grayscale images, paired training data is readily available because any color image can be converted to black and white. However, underwater images distorted by either color or some other phenomenon lack ground truth, which is a major hindrance to adopting a similar approach.
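
To make the contrast concrete, here is a minimal sketch of how a paired sample for colorization can be produced from a single color image. The function names, the nested-list image representation, and the choice of ITU-R BT.601 luminance weights are assumptions made for illustration; they are not taken from this project.

```python
# Sketch: build a (grayscale input, color target) pair from one color image.
# Images are nested lists of (R, G, B) tuples with values in 0..255.
# Function names and the BT.601 weights are illustrative assumptions.

def to_grayscale(image):
    """Convert an RGB image to single-channel luminance (BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

def make_colorization_pair(color_image):
    """Return (input, target): the grayscale version and the original image."""
    return to_grayscale(color_image), color_image

color = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray, target = make_colorization_pair(color)
print(gray)
```

No comparable one-line conversion exists for underwater distortion, which is why the pairs have to be generated another way.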


Dataset Construction

To solve the issue of acquiring training data, we use the recently proposed CycleGAN [1] in
order to generate a dataset of image pairs. CycleGAN learns a mapping G: X → Y such that the
images sampled from G(X) are indistinguishable from the images sampled from Y, as well as a
mapping F: Y → X such that the images sampled from F(Y) are indistinguishable from the
images sampled from X.
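
In addition to the two adversarial terms, CycleGAN [1] ties the mappings together with a cycle-consistency loss, so that translating an image to the other domain and back approximately recovers it. As given in [1], with p_data denoting the data distribution of each domain:

```latex
\mathcal{L}_{cyc}(G, F) =
    \mathbb{E}_{x \sim p_{data}(x)} \left[ \lVert F(G(x)) - x \rVert_1 \right]
  + \mathbb{E}_{y \sim p_{data}(y)} \left[ \lVert G(F(y)) - y \rVert_1 \right]
```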

We construct two domains, X and Y, where X consists of distorted underwater images taken from
ImageNet [6], and Y consists of non-distorted images drawn from selected subsets of ImageNet [6]. CycleGAN is then used to generate a set of image pairs in which the generated images appear to be underwater, as shown below. The top row shows original images, and the bottom row shows the images generated by CycleGAN.
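
The pairing step can be sketched as follows. `toy_distort` is a hypothetical stand-in for the learned mapping F: Y → X; it is not CycleGAN, only a fixed per-channel attenuation used so that the example runs. The function names and the image representation are likewise assumptions.

```python
# Sketch: build (distorted, clean) pairs from non-distorted images.
# `toy_distort` is a hypothetical stand-in for the learned mapping F: Y -> X.
# It is NOT CycleGAN, just a fixed per-channel attenuation for illustration.

def toy_distort(image, gains_percent=(40, 90, 100)):
    """Scale the R, G, B channels by fixed integer percentages."""
    return [
        [tuple(c * g // 100 for c, g in zip(pixel, gains_percent)) for pixel in row]
        for row in image
    ]

def build_pairs(clean_images, distort):
    """Return one (distorted, clean) pair per non-distorted image."""
    return [(distort(img), img) for img in clean_images]

clean_images = [
    [[(200, 100, 50), (10, 20, 30)]],
    [[(255, 255, 255)]],
]
pairs = build_pairs(clean_images, toy_distort)
for distorted, clean in pairs:
    print(distorted, "<-", clean)
```

In practice the trained CycleGAN generator would take the place of `toy_distort`, and each non-distorted image would serve as the ground truth for its generated counterpart.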


Loss Function

We use Generative Adversarial Networks (GANs) [2] as the basis of our objective function. GANs represent a class of generative models based on game theory in which a generator network competes against an adversary. The generator G produces instances that actively try to fool the discriminator network D. The goal is for the discriminator network to distinguish between true instances coming from the dataset and instances generated by G. We adopt the Improved WGAN formulation (WGAN-GP) [8] for our problem to deal with practical issues of training GANs. Our objective using this formulation becomes
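
For reference, the WGAN-GP critic loss as given in [8] has the general form below. Here P_r is the distribution of real samples, P_g is the distribution of generator outputs, and x̂ is sampled uniformly along straight lines between pairs of points drawn from P_r and P_g. Reading P_r as the non-distorted images and P_g as G applied to distorted images is an assumption based on the restoration goal stated above, and the authors' exact notation may differ.

```latex
L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g} \left[ D(\tilde{x}) \right]
  - \mathbb{E}_{x \sim \mathbb{P}_r} \left[ D(x) \right]
  + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}
    \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right]
```

The last term is the gradient penalty, which softly enforces the 1-Lipschitz constraint on D in place of the weight clipping used in the original Wasserstein GAN [9].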




We compare statistics on local image patches as one source of evaluation, as the original image has no "ground truth" per se.
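
A minimal sketch of such a patch comparison follows. Which statistics are compared is not stated here, so the use of mean and standard deviation is an assumption, as are the function names and the single-channel nested-list image format.

```python
# Sketch: compare simple statistics of local image patches.
# Mean and standard deviation are assumed statistics; names are illustrative.
from statistics import mean, pstdev

def extract_patch(image, top, left, size):
    """Return the size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def patch_stats(patch):
    """Return (mean, population standard deviation) of the patch intensities."""
    values = [v for row in patch for v in row]
    return mean(values), pstdev(values)

# Single-channel toy image with intensities in 0..255.
image = [
    [10, 10, 50, 52],
    [10, 10, 48, 50],
    [90, 92, 20, 20],
    [88, 90, 20, 20],
]
flat = patch_stats(extract_patch(image, 0, 0, 2))
textured = patch_stats(extract_patch(image, 0, 2, 2))
print("flat patch:", flat)
print("textured patch:", textured)
```

Running the same function on corresponding patches of an original image and its corrected version gives numbers that can be compared without needing a ground-truth image.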






[1] Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." arXiv preprint arXiv:1703.10593 (2017).
[2] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
[3] Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." arXiv preprint arXiv:1511.05440 (2015).
[4] Pathak, Deepak, et al. "Context encoders: Feature learning by inpainting." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[5] Zhang, Richard, Phillip Isola, and Alexei A. Efros. "Colorful image colorization." European Conference on Computer Vision. Springer International Publishing, 2016.
[6] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
[7] Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." arXiv preprint arXiv:1611.07004 (2016).
[8] Gulrajani, Ishaan, et al. "Improved training of Wasserstein GANs." arXiv preprint arXiv:1704.00028 (2017).
[9] Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein GAN." arXiv preprint arXiv:1701.07875 (2017).