At the end of this article, you will find the link to a Real or Fake game. Will you be able to distinguish between human creations and AI generations?
This classroom does not exist. Yes, you read that correctly: that classroom does not exist. It isn’t a drawing either, nor is it a screenshot from some video game. It was generated in under 30 seconds, after I asked an AI to show me a “realistic photograph of a High School classroom, with school desks and chairs. A whiteboard in the back.”
Earlier this year, OpenAI released the second iteration of their scarily good image-generating AI: DALL·E 2. DALL·E 2 is a system that takes a text prompt (such as “a marble statue of an apple falling from a tree”) and creates an image based on it. The key word here is create. The AI does not act like a search engine, fetching existing images of the user’s prompt from the internet. DALL·E 2 builds the image from scratch, meaning the user receives something that has never existed before.
This younger yet smarter sibling of the original DALL·E (released in 2021) doesn’t just output higher-resolution images: it can also replace objects within an image you upload, or even expand the image beyond its original borders. DALL·E 2 is also better at understanding prompts and combining multiple ideas in a single image.
Now, you might be wondering: how does it all even work? Officially, it is a complicated matter of contrastive language-image pre-training, paired with a diffusion model, to arrive at the final product. Unless you are an engineer at NASA (or OpenAI), the meaning of “contrastive language-image pre-training” probably doesn’t immediately come to mind. So, I’ll try to explain the process in a more approachable way.
Let’s say you’re a computer scientist who wants to teach a computer to recognize fruits in different images. Before you can show the computer an image and ask it which fruit it contains, it needs to learn, or train its neural network, to know what each fruit looks like. Traditionally, you would gather a bunch of images of apples and put them all in an “Apples” category. The computer then looks at all of these images and tries to find patterns between them. It might realize, for example, that most of the “Apples” are red, or notice their similar shape. You would do the same thing with bananas: the computer would look at all the images you classified as “Bananas” and notice they are all long and yellow.
After training on these images, the computer is ready to categorize a new one. You submit an image of an apple and ask the computer which category it belongs to. Although the computer has never seen this particular image before, it recognizes that it is red, and that its shape is more consistent with the images in “Apples” than those in “Bananas.” It therefore decides that the best category for this image is “Apples.” The best part? The computer can now train itself on this new image! It can keep improving at telling “Apples” from “Bananas,” without the need for any human input.
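Here is a minimal Python sketch of that traditional workflow, including the self-training step at the end. The fruit “features” and the nearest-neighbor model are my own stand-ins for illustration, not anything DALL·E actually uses:

```python
# A toy version of the traditional approach: humans label images,
# the computer finds patterns, then it trains itself on its own predictions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Pretend each image is boiled down to two features: redness and elongation.
apples  = np.array([[0.90, 0.20], [0.80, 0.30], [0.85, 0.25]])  # red, round-ish
bananas = np.array([[0.10, 0.90], [0.20, 0.95], [0.15, 0.85]])  # yellow, long

X = np.vstack([apples, bananas])
y = ["Apples"] * 3 + ["Bananas"] * 3  # human-made labels

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)  # training: learn which patterns go with which label

# A never-before-seen image of an apple: red and round.
new_image = np.array([[0.88, 0.22]])
prediction = model.predict(new_image)[0]
print(prediction)  # -> "Apples"

# Self-training: fold the new prediction back into the training data,
# so the model keeps improving without any further human input.
X = np.vstack([X, new_image])
y.append(prediction)
model.fit(X, y)
```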
This is where the cleverness of the DALL·E 2 system really shines. While the computer was being trained, researchers showed it millions of random images from the internet. The catch? They didn’t classify them with generic categories like “tree” or “keyboard.” Instead, they showed the computer the original captions for those images. So, if an image came from a gardening blog, they didn’t intercept it and label it “Tree.” They simply let it through with its original caption from the blog, which may have been “oak tree in autumn.” The same goes for other images: say another one came from the Department of Agriculture, captioned “tall maple tree in the summer.”

The computer doesn’t really know what the words in a caption correspond to in the image, but after being shown many more pictures with the word “tree” in the caption, it may start to realize that most of them contain a big body of mass with leafy arms. It is essentially building its own description of what a “tree” looks like. Similarly, when the computer is shown an image captioned “beautiful autumn landscape” from Instagram, it has no idea what “beautiful” or “landscape” mean (yet), but it remembers that “oak tree in autumn” image it saw earlier. It might notice that the trees in both images have a red tone, which leads it to assume “autumn” means red trees.
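For the curious, here is a toy sketch of that “contrastive” training signal, in the spirit of CLIP: matching image-caption pairs are pushed together, mismatched pairs apart. The random features below are placeholders of my own; OpenAI’s real system uses large image and text networks:

```python
# A toy CLIP-style contrastive loss in PyTorch (illustration only).
import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, temperature=0.07):
    # Normalize, so similarity becomes a simple dot product (cosine similarity).
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Similarity of every image in the batch with every caption in the batch.
    logits = image_features @ text_features.t() / temperature

    # The i-th image's true caption is the i-th caption: reward matching
    # pairs (the diagonal) and penalize every mismatched pairing.
    targets = torch.arange(len(logits))
    loss_images = F.cross_entropy(logits, targets)     # image -> caption
    loss_texts = F.cross_entropy(logits.t(), targets)  # caption -> image
    return (loss_images + loss_texts) / 2

# Fake batch: 4 images and their 4 captions, each encoded into 512 numbers.
imgs = torch.randn(4, 512)
caps = torch.randn(4, 512)
print(clip_loss(imgs, caps))
```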
But how do we get from all this to a brand-new image? Now that the AI can associate words with parts of an image, it starts from pure noise and makes small, random modifications to it. After each iteration, it checks whether anything in the image can be associated with the words in the prompt. Whatever matches, it keeps sharpening, step by step, until there is no noise left.
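A heavily simplified sketch of that loop is below. The real “denoiser” is a large neural network guided by the text prompt; here, a made-up target image stands in for “what the prompt describes”:

```python
# Illustration only: iterative refinement from pure noise toward an image.
import torch

target = torch.rand(3, 64, 64)   # stand-in for "the image the prompt describes"
image = torch.randn(3, 64, 64)   # start from pure random noise

for step in range(50):
    # Each iteration removes a little of the remaining "noise": the gap
    # between the current image and what matches the prompt.
    image = image + (target - image) * 0.1

print(float((image - target).abs().mean()))  # near 0: no noise left
```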
The advantages of this system over the traditional one are clear and varied. Firstly, we don’t need humans slowly classifying images to teach the computer: it can teach itself. Secondly, there can be effectively infinite categories, as they are no longer defined by humans. Applied to DALL·E 2, both of these mean more detailed images, a better understanding of the prompt, and a system that can keep learning on its own.
Oh, remember how I said you can also expand images? That’s called Outpainting. Here is that same classroom from the thumbnail.
Notice how the center of the image is the same, while more has been added to the left and right edges. But why does all this matter? Sure, it’s pretty cool tech, but it’s an even better tool. Anyone, including you, can head over to the DALL·E 2 website, sign up, and start generating images immediately (up to 15 generations per month, with 50 as a sign-up bonus). All for free. You legally own the images you generate, too, so you are free to use them in your own works and websites, and even commercially!
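If you would rather script it than click through the website, OpenAI also offers an API. The snippet below is a sketch based on the 2022-era openai Python library; the placeholder key is mine, and the exact function names may have changed since, so check OpenAI’s current documentation:

```python
# Hypothetical sketch: generating a DALL·E 2 image via OpenAI's API,
# as exposed by the 2022-era `openai` Python library.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder: use your own key

response = openai.Image.create(
    prompt=(
        "realistic photograph of a High School classroom, "
        "with school desks and chairs. A whiteboard in the back."
    ),
    n=1,                # how many images to generate
    size="1024x1024",   # DALL·E 2 sizes: 256x256, 512x512, or 1024x1024
)

print(response["data"][0]["url"])  # temporary URL of the generated image
```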
Speaking of using them in your own works: a few weeks ago, my Catalan class at ASB was given an assignment to write a story. After my group had drafted a few paragraphs, we began to play around with the idea of illustrations. They were not required, but I felt they would add to the atmosphere, help the reader’s imagination, and be a fun addition. Given my group’s lack of artistic abilities, outsourcing would be required. So, I turned to a little AI I had recently been granted access to. I wasn’t very hopeful that it could both capture the details I wanted and make the result look like a drawing rather than a photo, but I still gave it a shot. Without much hope, I asked it to generate “A Ford Ranger F100 driving on a desert road with 3 passengers in the pickup, one driver, digital art.” The results, delivered in a few seconds, were glorious.
The AI had not only captured the super-specific car model, a Ford Ranger F100, but it had also nailed the digital art aesthetic. From the gentle gradient in the sky to the photorealistic shadows cast by the car, the “drawing” appeared virtually perfect. This image, which would have taken my friends and me (or even a professional digital artist) hours to draw, was now mine to use however I pleased.
So, here we are, in an era where AIs can draw with almost the same fidelity as humans and conjure photorealistic images from nothing. I’ll let you know if my next article is written by a computer.
BONUS CHALLENGE: Can you tell real/human-made images from DALL·E 2 generations? Submit your answers in this Form! Winners will be announced in the next edition!