Given a grayscale image as input, we take an adversarial approach to colorizing it automatically. We are currently investigating several variants of adversarial networks, including the original GAN formulation, Energy-Based GANs (EBGANs), Wasserstein GANs (WGANs), and Least Squares GANs (LSGANs).
Our model is based on the Pix2Pix model. We use skip connections to preserve low-level features that are shared between the input and output images. This architecture is commonly known as a "U-Net", and works by concatenating the activations from layers in the encoder onto the corresponding layers in the decoder. We also add L1 and L2 distance terms to the adversarial loss, so the generator must not only fool the discriminator but also stay close to the ground truth.
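As a rough sketch of both ideas (the layer sizes, the loss weights `lam1`/`lam2`, and names like `TinyUNet` are illustrative assumptions, not our exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Minimal two-level U-Net sketch: encoder activations are
    concatenated onto the matching decoder layer (skip connection)."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(1, 64, 4, stride=2, padding=1)   # grayscale in
        self.enc2 = nn.Conv2d(64, 128, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        # input channels are doubled by the skip concatenation below
        self.dec2 = nn.ConvTranspose2d(64 + 64, 3, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = F.relu(self.enc1(x))
        e2 = F.relu(self.enc2(e1))
        d1 = F.relu(self.dec1(e2))
        d1 = torch.cat([d1, e1], dim=1)     # skip: preserve low-level features
        return torch.tanh(self.dec2(d1))    # 3-channel color output

def generator_loss(disc_out_on_fake, fake, real, lam1=100.0, lam2=100.0):
    """Adversarial term plus L1 and L2 distances to the ground truth."""
    adv = F.binary_cross_entropy_with_logits(
        disc_out_on_fake, torch.ones_like(disc_out_on_fake))
    return adv + lam1 * F.l1_loss(fake, real) + lam2 * F.mse_loss(fake, real)
```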
Below we briefly describe the different variants of GANs we are using. More information about them can be found in their respective papers.
DCGANs
Deep Convolutional GANs (DCGANs) provided a base framework for applying recent advances in deep learning, such as strided convolutions and batch normalization, to GANs. Their general architecture has since been adopted by LSGANs, WGANs, and EBGANs.
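For reference, a typical DCGAN-style building block looks roughly like this (a generic sketch of the published guidelines, with hypothetical layer parameters):

```python
import torch.nn as nn

def dcgan_disc_block(in_ch, out_ch):
    """One DCGAN-style discriminator block: strided convolution in
    place of pooling, batch normalization, and LeakyReLU activation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )
```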
LSGANs
LSGANs attempt to overcome the vanishing-gradient problem by using a least-squares loss function for the discriminator. The motivation is that even when a generated image fools the discriminator, i.e., lands on the correct side of the decision boundary, it may still be far from the true data distribution; the squared loss keeps penalizing such samples, whereas the sigmoid cross-entropy loss saturates.
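Concretely, the least-squares objectives look like the following (a minimal sketch using the common 0/1 target coding; `d_real` and `d_fake` are assumed to be the discriminator's raw outputs on real and generated batches):

```python
import torch
import torch.nn.functional as F

# Because the penalty is squared distance to the target rather than a
# saturating sigmoid loss, generated samples that are classified as
# "real" but lie far from the boundary still receive a gradient.
def lsgan_d_loss(d_real, d_fake):
    return 0.5 * (F.mse_loss(d_real, torch.ones_like(d_real)) +
                  F.mse_loss(d_fake, torch.zeros_like(d_fake)))

def lsgan_g_loss(d_fake):
    return 0.5 * F.mse_loss(d_fake, torch.ones_like(d_fake))
```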
WGANs
WGANs approximate the Earth Mover's (Wasserstein) distance between the real and generated distributions. This yields stable but slow training, and, unlike the other GAN variants, produces a loss that correlates with image quality.
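A minimal sketch of the WGAN objective (this assumes the critic outputs raw scores with no sigmoid; the `critic` name and the clipping constant follow the paper's conventions):

```python
import torch

def wgan_losses(critic, real, fake):
    """The critic maximizes E[f(real)] - E[f(fake)], which estimates
    the Wasserstein distance when f is 1-Lipschitz; we return the
    negatives so both losses can be minimized."""
    d_loss = -(critic(real).mean() - critic(fake.detach()).mean())
    g_loss = -critic(fake).mean()
    return d_loss, g_loss

def clip_weights(critic, c=0.01):
    # the original WGAN enforces the Lipschitz constraint by clipping
    # every weight into [-c, c] after each critic update
    for p in critic.parameters():
        p.data.clamp_(-c, c)
```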
EBGANs
EBGANs treat the discriminator as an energy function that assigns low energies to real samples and high energies to generated samples. The discriminator is modeled as an autoencoder, which is interesting because it can, in principle, learn the structure of the data distribution on its own.
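In loss terms, this looks roughly as follows (a sketch; the `autoencoder` name and `margin` value are illustrative assumptions):

```python
import torch.nn.functional as F

def ebgan_losses(autoencoder, real, fake, margin=10.0):
    """The discriminator is an autoencoder; its reconstruction error is
    the energy. Real samples are pushed to low energy, and a hinge
    pushes generated samples up to at least `margin`."""
    e_real = F.mse_loss(autoencoder(real), real)
    e_fake = F.mse_loss(autoencoder(fake.detach()), fake.detach())
    d_loss = e_real + F.relu(margin - e_fake)
    # the generator simply tries to produce low-energy samples
    g_loss = F.mse_loss(autoencoder(fake), fake)
    return d_loss, g_loss
```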
Results
We experimented with the CelebA dataset. Future work will train on ImageNet for a full comparison with other methods.
It is important to note that the images we tested on are not "true" black-and-white photos; they are color photos converted to grayscale. For a fair comparison, below we show our method on true black-and-white photos for which no color version exists.