The use of Artificial Intelligence in daily life has reached a point of normalisation: we use it constantly, almost without thinking twice. Many of us check AI-assisted weather forecasts, unlock our phones with Face ID, and talk to voice assistants such as Siri (with varying degrees of success). These are all everyday examples of AI at work.
One recent and extraordinary advancement in the world of AI, however, is a machine capable of generating images from a text prompt. The best-known technology in this field is DALL·E 2, its name a portmanteau of WALL-E and the Spanish artist Salvador Dalí. Created by the research company OpenAI, it uses machine learning models to generate digital imagery from natural-language descriptions.
To understand how far image generation AI has come, we must first look back at its history. The earliest significant use of AI for image generation came in the 1970s, with a machine known as “AARON”. Naturally, the AI of this era was rudimentary and quite limited: it could paint specific objects, which its creator, Harold Cohen, would then finish off by hand. Over time, AI image generation grew gradually more complex. In the 1980s, AI could situate people or objects in 3D space, and by the 1990s it could paint in colour.
An untitled AARON drawing, c. 1980
More recently, Nvidia created an Artificial Intelligence capable of generating the faces of people who have never existed. This was achieved using a Generative Adversarial Network (GAN), a type of deep learning model in which two networks are trained against each other: a generator produces candidate images, while a discriminator learns to tell them apart from real photographs, each improving in response to the other. You can try this yourself at www.thispersondoesnotexist.com. What was once cutting-edge technology can now be run endlessly from a mobile device.
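Nvidia's face generator is, of course, a very large model, but the adversarial idea itself can be shown at toy scale. The sketch below (an illustrative assumption, not Nvidia's actual code) trains a one-parameter-pair "generator" to mimic a simple 1-D distribution, with a logistic "discriminator" as its opponent; real GANs replace these affine maps with deep convolutional networks.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_toy_gan(steps=3000, batch=128, lr=0.05, seed=0):
    """Toy adversarial training on 1-D data.

    "Real" samples come from N(4, 1). The generator is an affine map
    g(z) = a*z + b of noise z ~ N(0, 1); the discriminator is a logistic
    classifier D(x) = sigmoid(w*x + c). Each step, the discriminator
    ascends its log-likelihood of separating real from fake, and the
    generator ascends log D(fake) (the non-saturating objective).
    """
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.0, 0.0   # discriminator parameters

    for _ in range(steps):
        real = rng.normal(4.0, 1.0, batch)
        z = rng.normal(0.0, 1.0, batch)
        fake = a * z + b

        # Discriminator update: push D(real) up and D(fake) down.
        s_real = sigmoid(w * real + c)
        s_fake = sigmoid(w * fake + c)
        w += lr * (np.mean((1 - s_real) * real) - np.mean(s_fake * fake))
        c += lr * (np.mean(1 - s_real) - np.mean(s_fake))

        # Generator update: move fakes toward regions D calls "real".
        s_fake = sigmoid(w * fake + c)
        a += lr * np.mean((1 - s_fake) * w * z)
        b += lr * np.mean((1 - s_fake) * w)

    return a, b

if __name__ == "__main__":
    a, b = train_toy_gan()
    print(f"generator mean after training: {b:.2f} (real data mean: 4.0)")
```

After training, the generator's offset `b` has drifted toward the real data's mean of 4, purely because fooling the discriminator required it: neither network ever sees the target distribution's parameters directly.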
With AI able to generate human faces and expressions, the next step up the ladder is the subject of this article. DALL·E 2 is the second AI in the “DALL·E” series, capable of generating images at four times the resolution of its predecessor and with better caption accuracy. Having just been released to the public in October this year, the AI, with its 3.5-billion-parameter neural network, is one to be marvelled at. It can generate almost anything: photorealistic imagery of buildings, oil paintings of whatever you like, or even impossible concepts such as a “snail made of cucumber”.
“A snail made of cucumber”. DALL·E 2, 2022.
During my own experimentation with this AI, I was curious to see what it would generate from the prompts “Eton”, “Eton College” and “Eton College chapel”. What it produced surprised me. It is important to keep in mind that DALL·E 2 does get things wrong, as the AI is far from perfect, but the images are eerily similar to the actual buildings we walk past every day.
As powerful as this technology may be, there are ethical implications in releasing it to the general public. The AI’s main source of criticism is that it suffers from algorithmic bias owing to its reliance on public datasets. Examples of this bias include generating images of men more often than women, even for prompts that do not mention gender. In an attempt to prevent the generation of violent or sexual images, training data with such themes was filtered out and removed.
Another cause for concern is the technology’s potential role in deepfaking: superimposing one person’s face onto an image or video of another’s body, normally without consent. As a step to prevent this, OpenAI blocks all content that potentially includes a real human face.
The future impact of Artificial Intelligence such as DALL·E 2 will be immeasurable. This current technology is merely a stepping-stone to a future where AI can generate hyper-realistic imagery, and with the current speed of AI development, this future may come sooner than we think.