Variational auto-encoder

The one with the intuitive name.

Getting a computer to understand what’s going on in an image is challenging. Consider an image made of 40 pixels by 40 pixels. The image has 1600 total pixels, so to the computer it’s essentially an array with 1600 dimensions. If each pixel can take an integer value between 1 and 255, then there are 255^1600 (roughly a 1 followed by 3,850 zeros) possible images that the computer has to worry about. Humans can recognize what’s going on in most of those images because many of them are only imperceptibly different. But a computer has to worry about each individual one.
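For the curious, that count comes from giving each of the 1600 pixels its own independent choice of 255 values:

$$255^{1600} = 10^{1600 \log_{10} 255} \approx 10^{3850}$$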

Thankfully, there are methods for approaching this kind of problem. We can use convolutions, a type of filter that lets the computer associate nearby pixels with each other. We can also use generative deep learning techniques that let the computer gradually build an understanding. A variational auto-encoder (VAE) combines these two approaches. Here is an example where a VAE is learning to organize handwritten digits.

Notice how in the third panel the different colors, representing different digits, are grouped together. That is what we want!
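If you want to see what that looks like in practice, here is a minimal sketch of a convolutional VAE in PyTorch. The layer sizes and the two-dimensional latent space are illustrative choices, not the exact setup behind the figure above; the 40-by-40 input just matches the thought experiment from earlier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    """Minimal convolutional VAE with a 2-D latent space (illustrative sizes)."""

    def __init__(self, latent_dim=2):
        super().__init__()
        # Encoder: convolutions associate nearby pixels with each other
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 40x40 -> 20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 20x20 -> 10x10
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 10 * 10, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(32 * 10 * 10, latent_dim)  # log-variance of q(z|x)
        # Decoder: map a latent point back to an image
        self.fc_dec = nn.Linear(latent_dim, 32 * 10 * 10)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),  # assumes pixel values normalized to [0, 1]
        )

    def encode(self, x):
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps so gradients flow through mu and logvar
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        h = self.fc_dec(z).view(-1, 32, 10, 10)
        return self.dec(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence from the standard-normal prior
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```

Training loops over batches of images, computes `vae_loss` on the output, and backpropagates; the two-dimensional `mu` values are what get plotted in panels like the third one above.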

In the language of statistics, a VAE is a variational Bayesian approach to inferring a numerically intractable posterior distribution. The intractable part is the true posterior over latent representations given the 1600-dimensional images discussed above; the VAE sidesteps it by fitting a simple approximate posterior, which here is the two-dimensional distribution shown in the third panel.
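Concretely, in the standard formulation (Kingma & Welling 2014), the encoder and decoder are trained together by maximizing the evidence lower bound (ELBO):

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)$$

The first term rewards faithful reconstructions and the second keeps the approximate posterior close to a simple prior; the `vae_loss` function in the sketch above is the negative of this bound.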

Handwritten digits are kind of cool, but I’m an astronomer *checks to make sure*, so I want to apply this technique to something from outer space. I study black holes, but we don’t have that many pictures of black holes, so that’s sort of challenging. What we do have a lot of are pictures of galaxies whose supermassive black holes are currently growing. So the question is: if we apply this technique to those galaxies, as well as to some where the black holes are not growing, will the VAE find any kind of difference between them, like it did for the 1s and the 0s above? If so, what is causing the differences?

The movie below shows a VAE training on a sample of galaxies. The image in the top left is one of the galaxies, and the bottom left is the reconstruction made by the VAE. The 18 images on the right show the different dimensions of the latent space. You can think of the reconstruction as some sort of combination of these 18 images, weighted by the color of the borders (black means that dimension is more important).

As the VAE trains (as the epoch number increases), the reconstruction gets better and the dimensions get sharper. So that’s cool, but what do we learn from it?
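If you want to poke at those dimensions yourself, here is a hypothetical helper that decodes one latent dimension at a time; it assumes a trained model with a `decode()` method like the sketch above, and an 18-dimensional latent space laid out in a 3-by-6 grid.

```python
import matplotlib.pyplot as plt
import torch

def show_latent_dimensions(model, latent_dim=18, scale=2.0):
    """Decode one latent dimension at a time to see what each one encodes.

    Hypothetical helper: assumes `model` has a decode() method like the
    ConvVAE sketch above, here with an 18-D latent space.
    """
    model.eval()
    fig, axes = plt.subplots(3, 6, figsize=(12, 6))
    with torch.no_grad():
        for i, ax in enumerate(axes.flat):
            z = torch.zeros(1, latent_dim)
            z[0, i] = scale  # step along dimension i only
            img = model.decode(z).squeeze().cpu()
            ax.imshow(img, cmap="gray")
            ax.set_title(f"dim {i}")
            ax.axis("off")
    plt.tight_layout()
    plt.show()
```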

Basically, we search the latent space for a region that is dominated by the galaxies we are interested in (QSOs, orange borders). This is shown below on the right. On the left is a corresponding region of uninteresting galaxies (SIM-QSO, red borders).

The relevant questions are a) are there a bunch of orange borders on the right (yes), and b) is there a difference in structure between the left and the right (yes). This means that galaxies with growing black holes tend to have the structures seen on the right. What those structures are, or how they connect to the black holes, is still unknown, but it’s kind of neat!
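For completeness, here is a hypothetical sketch of that kind of region search, assuming the galaxies have already been encoded into a two-dimensional projection of the latent space (the real search would use all 18 dimensions, or an embedding of them); the bin count and the 80% threshold are illustrative.

```python
import numpy as np

def find_label_dominated_cells(latents, labels, target="QSO", bins=10, threshold=0.8):
    """Grid a 2-D latent projection and flag cells dominated by one class.

    Hypothetical helper: `latents` is an (N, 2) array of encoded positions
    and `labels` is an array of N class names (e.g. "QSO", "SIM-QSO").
    """
    edges_x = np.linspace(latents[:, 0].min(), latents[:, 0].max(), bins + 1)
    edges_y = np.linspace(latents[:, 1].min(), latents[:, 1].max(), bins + 1)
    # Histogram of all points, and of the target class alone
    total, _, _ = np.histogram2d(latents[:, 0], latents[:, 1], bins=[edges_x, edges_y])
    mask = labels == target
    hits, _, _ = np.histogram2d(latents[mask, 0], latents[mask, 1], bins=[edges_x, edges_y])
    # Fraction of the target class per cell (empty cells count as 0)
    with np.errstate(invalid="ignore", divide="ignore"):
        frac = np.where(total > 0, hits / total, 0.0)
    # Return indices of cells where the target class dominates
    return np.argwhere(frac >= threshold)
```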