Fractal City

My Experiment in AI-Assisted Art Generation (VQGAN+CLIP)


1994 gave humanity lots of cool things, but without a doubt, one of my highlights is Greg Egan's book "Permutation City."

Permutation City (Subjective Cosmology #2)
The story of a man with a vision - immortality: for those who can afford it is found in cyberspace. Permutation city is the tale of a ma...

The book is especially relevant now that we have started talking about multiverses once again. Once everything becomes digital, how would one distinguish between computer simulations of people and the real people themselves? And when our consciousness becomes a series of bits, editable from the outside, how would it feel to have one's entire past and present modified and re-created with or without one's consent?

I won't spoil the book further. I just wanted to mention it because it served as my inspiration while generating the series of images (hence the name Fractal City) below:

One sees multiple variations of the same primary input turned into pieces of AI art by an algorithm called VQGAN+CLIP. The abbreviation has recently become a hot topic among artists and scientists alike. In very simple terms, it uses a variation of the Generative Adversarial Network (GAN) idea that became synonymous with deep-fakes and projects like This Person Does Not Exist. A GAN consists of two networks: one tries to "fool" the other by producing better and better representations of an original target.

For example, when talking about faces, the generator will start with a series of randomly colored pixels, iteratively refining the output until its counterpart (the discriminator) says, "It's a face!"

How a GAN works | Source: https://github.com/microsoft/GenStudio
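To make that game a bit more concrete, here is a minimal, hypothetical training loop in PyTorch. The toy fully-connected networks are stand-ins for a real face generator, not the VQGAN behind the images above; the point is only to show the two-player dynamic: the discriminator learns to tell real images from generated ones, while the generator learns to produce outputs the discriminator accepts.

```python
# Minimal GAN training sketch (PyTorch, toy networks) -- illustrates the
# generator/discriminator game described above, not the actual VQGAN.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 784  # e.g. 28x28 grayscale faces, flattened

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),          # outputs a fake image
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                             # logit: real vs. fake
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):
    batch = real_images.size(0)
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise)

    # 1) Discriminator: label real images 1, generated images 0
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(fake_images.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator: try to make the discriminator say "it's a face!" (label 1)
    g_loss = bce(discriminator(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage: random tensors stand in for a real face dataset
print(train_step(torch.rand(16, image_dim) * 2 - 1))
```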

Until now, GANs were limited in what they could generate by the training data they were fed. A GAN trained to generate faces, for example, won't be able to imagine a car. That is where CLIP enters the stage.

CLIP: Connecting Text and Images
We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision.

Developed by OpenAI, CLIP is a neural network model able to infer (I think the word "imagine" fits better here) image features from human-written text, without any limitations on what the text prompt can be.

There you have it: feeding CLIP your imaginary text prompt will make it "train" your GAN toward the image features that match the text. The rest is the GAN doing its best.
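For a sense of how CLIP does the steering, here is a minimal sketch using OpenAI's clip package: CLIP embeds both the prompt and the current image, and the similarity between the two embeddings is back-propagated into whatever is being generated. In the real VQGAN+CLIP notebook the gradients flow into a VQGAN latent code that gets decoded into an image; to keep this sketch self-contained, a raw pixel tensor stands in for that latent, and the prompt and hyperparameters are just placeholders.

```python
# Sketch of CLIP-guided image optimization (PyTorch + OpenAI's clip package).
# A raw pixel tensor stands in for the VQGAN latent used in the real notebook.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep everything in fp32 so gradients flow cleanly

prompt = "a fractal city at sunset"  # placeholder prompt
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize([prompt]).to(device))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Stand-in for the VQGAN latent: an image tensor we optimize directly.
# (The real pipeline also normalizes inputs with CLIP's mean/std; skipped here.)
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    image_features = model.encode_image(image.clamp(0, 1))
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

    # Maximize cosine similarity between CLIP's image and text embeddings
    loss = -(image_features * text_features).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % 50 == 0:
        print(f"step {step}: similarity = {-loss.item():.3f}")
```

The only real difference in the full pipeline is what receives the gradient: instead of pixels, it is the VQGAN's latent code, which the decoder turns back into a coherent image at every step.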

If you want to try it, you can check out this Google Colab notebook.

Similar Reads

The Illustrated VQGAN
VQGAN allows us to generate high-resolution images from text, and has now taken art Twitter by storm. Let me talk about how it works on a conceptual level in this blog post.
Generative Adversarial Networks - The Story So Far
Generative adversarial networks (GANs) have been the go-to state-of-the-art algorithm for image generation in the last few years. In this article, you will learn about the most significant breakthroughs in this field, including BigGAN, StyleGAN, and many more.