On January 5th, 2021, OpenAI announced its new neural network Dall-E, a landmark network that could generate images from text. The examples shared in the post – cartoons of baby radishes, armchairs in the shape of avocados, paintings of capybaras – look low-tech to our 2025 eyes, but they were a huge achievement at the time.
Reading the post deep in pandemic lockdown, reality melted as I looked at a “photo” of a bust of Homer that didn’t exist, a sensation that was only heightened the next day, as the US Capitol was attacked and pictures of events that seemed unimaginable filled my feed.
Over the last four years, OpenAI has continued to develop Dall-E, eventually integrating it into ChatGPT. ChatGPT's ability to generate images, however, was more often than not eclipsed by rival generative image platforms such as Midjourney and Stable Diffusion. That all changed last week.
On March 25th, OpenAI released an update to ChatGPT that enables paying users to generate images at a much higher level of fidelity than previously. The results are impressive – the post announcing the update is replete with images of dreamy gaming concept art, Vogue editorial-style fashion photography, photorealistic mirror selfies of young women, and a candid snap of Karl Marx at a shopping mall.
OpenAI describes the results as “image generation that is not only beautiful, but useful”. The preoccupation with “beautiful” images is not that interesting to me – as Marcel Proust said, “Let us leave pretty women to men with no imagination” – but the concept of how image generation can be “useful” is, because one of the biggest use cases immediately after the release was enlisting ChatGPT to generate images in the style of Studio Ghibli.
Studio Ghibli is a Japanese animation studio responsible for some of the most beloved films of the last few decades – My Neighbor Totoro, Spirited Away, Princess Mononoke. Ghibli films are by turns full of joy and grief, life-affirming and bittersweet, all mediated through subtly rendered, painstakingly executed animation.
Studio Ghibli is focused on craft, and the values of its founder, Hayao Miyazaki, are often positioned as antithetical to AI. In a 2016 interview, Miyazaki famously remarked that using AI in the creation of a grotesque animation was “an insult to life itself”.
One can only imagine Miyazaki’s horror as, within a few hours of OpenAI releasing the update, users on social media began to focus primarily on “Ghibli-fying” pre-existing images.
These images fell broadly into three categories. The first were innocuous – family photos rendered as sweet anime-style images, presented side by side with the originals. Grinning couples, adorable kids. The second group were scenes from films, TV shows and memes. Stills from The Wolf of Wall Street, The Office, Breaking Bad, Disaster Girl, all hard edges rounded into intertextual cuteness.
The third group were the “shitposts”. The JFK assassination, Hitler, Volodymyr Zelenskiy’s dressing-down in the Oval Office. A body falling from the Twin Towers on 9/11 sketched in delicate watercolour. The official X account of the White House posted a Ghibli-fied image of an illegal immigrant and convicted fentanyl trafficker weeping as she was arrested.
Zach Warunek, an employee of X, generated a version of the Columbine school shooting, posting "Didn't think it would actually do it lmao" over an AI-generated image depicting an event in which 12 students and a teacher were murdered.
What, in OpenAI’s words, is “useful” about this? I would guess, for Warunek, the use lies in the fact that the image was viewed 10.7 million times, attracting 48,000 likes.
In the launch video for the update, streamed live on March 25th, OpenAI chief executive Sam Altman talks with a number of OpenAI researchers, including Gabriel Goh, head of multimodal research. Goh describes how over the last nine months, as he worked on the model, he realised that he was surrounded by hundreds of images over the course of a day, images created “to persuade, to inform, to educate”. Goh describes these as the “workhorse images” that “comprise our everyday life”, noting that he’s excited to give “this power to create workhorse images to everyone in the world”.
What is the “work” that these “workhorse images” perform? For the vast majority, it is ensuring clicks, virality, clout. These workhorse images power the engines of social media. They’re not images intended to be presented as art. They’re images designed to be posted, with the hope of being reposted, reacted to, iterated on.
I’d argue that because of this, they stop functioning as images (something that can be copyrighted) and become something closer to speech (which must be free). Back in 2019, social media theorist Nathan Jurgenson observed this in his excellent book The Social Photo, noting how “as images become easier, quicker, more abundant, their status as objects becomes more secondary to their role as ... a form of in-the-moment communication”. This is only more true since the advent of generative AI.
Whether you find this fascinating, horrifying or a combination of both is irrelevant. Generative AI means we need to learn to read images in a new way.
This is crucial, because trying to argue for the rights of the artist, attempting to discuss the concept of copyright, is completely futile in the face of the logic of the workhorse image, in the face of the logic of an image that is a form of communication – in the face of what generative AI has done to images, to art.
Art’s primary function in this context becomes clear. Art provides the raw materials for the production of images, which in turn power social media, which enriches the most powerful companies in the world, which, seemingly as a side effect, establish the various realities we live within.
Jennifer Walshe is professor of composition at Oxford University