Yesterday morning my wife told me that the grey squirrel that frequently visits our kitchen had dared to venture as far as the living room and jumped onto a shelf full of books 🐿 📚.
I replied that if he did it again, it would be cool to take a picture of him reading a book…
A few minutes later, after completing my daily virtual commute (on a smart bike), I went to my (bedroom) office, opened my iMac and typed the following description into DALL-E’s prompt field: “Grey squirrel wearing glasses, reading an open book” by Johannes Vermeer.
I was blown away by the output.
High quality image generation is the latest evolution in the creative power of AI neural networks.
OpenAI, the company originally co-founded by Elon Musk, has just launched DALL-E 2, which I used to generate the series of 4 squirrels above (the name DALL-E is a portmanteau combining Dalí, the Spanish surrealist painter, and WALL-E, the Pixar animated movie).
Previously, OpenAI unveiled their state-of-the-art GPT-3 language model, with 175 billion parameters. The commercial GPT-3 API is now leveraged by dozens of text generation apps. Among the most popular you’ll find Jasper, WriteSonic, Scalenut, Rytr and Longshot (I’ve compiled a list of 35+ companies on KPI Crunch).
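For context, this is roughly what those apps do under the hood: they send a prompt to OpenAI's completions endpoint and display the generated text. The following is a minimal sketch using only the Python standard library; the endpoint and model name reflect OpenAI's 2022-era completions API, and an `OPENAI_API_KEY` environment variable is assumed.

```python
import json
import os
import urllib.request

# OpenAI's text completion endpoint (as documented in 2022)
API_URL = "https://api.openai.com/v1/completions"

def build_completion_request(prompt, model="text-davinci-002", max_tokens=64):
    """Assemble the HTTP request for a GPT-3 completion (no network call here)."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + os.environ.get("OPENAI_API_KEY", ""),
    }
    return urllib.request.Request(API_URL, data=payload, headers=headers)

# Sending this request with urllib.request.urlopen(...) returns a JSON response
# whose "choices" list contains the generated text.
```

The text generation apps listed above essentially wrap this same call with their own built-in prompts and a friendlier user interface.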
The advent of those powerful tools raises a lot of practical and ethical questions.
Some people praise the almost magical outputs generated by image and text generation models. Others point out the threat they represent for the creative class: copywriters, illustrators, photographers, movie makers, etc., who could lose their jobs if AI were able to replace them at a fraction of the cost.
My goal in this piece is to debunk the doomsday theories while elaborating on the opportunity of a human-AI partnership which, in my opinion, will spark an unprecedented explosion of creativity.
It’s not the first time machines have been considered a threat to humanity
In the early days of the Industrial Revolution, the Luddite movement emerged in Great Britain as a violent reaction against the mechanization of the textile industry, which was putting low-paid weavers out of work.
Unable to envision a future where work would (have to) be different, the Luddites destroyed the machines that posed an immediate threat to their livelihood.
It is understandable: most people crave stability, if not the status quo. When you ask workers to reconsider their deeply rooted certainties and adopt a new behaviour, you inevitably face an outcry of complaints, usually driven by irrational fears.
First of all, because uncertainty is daunting. Most of us prefer a clear, safe roadmap to a leap into the unknown. If you’ve been doing the same job for 10+ years, making a decent living out of it, and are now told that you have to acquire new skills to keep up with the game of innovation, you’re either excited or reluctant to join the race.
Innovation is what moves us forward as a civilization. Compounded innovations are the building blocks of Progress.
You can of course adopt a contrarian standpoint and decide to build your value on tradition. You could say: “I’ll never use AI in any shape or form to assist me in my creative process. I don’t want to leverage Grammarly, Ludwig or Hemingway to help me craft a better essay. I don’t want to get any inspiration from DALL-E, Midjourney or NightCafé to illustrate my pieces.”
Fair enough, there will always be room for purists. You might even command a premium price if you can show that your approach delivers outstanding results.
But I would argue that for the vast majority of mainstream use cases, AI should be integrated into your toolbox. It’s not a replacement for human creativity, it’s an enhancer. AI augments the bandwidth of your imagination.
Are there any concrete examples of AI used in mainstream media?
In June 2022, The Economist used AI to generate the cover of an edition titled “AI’s New Frontier”.
On August 9, 2022, a journalist at The Atlantic used Midjourney to illustrate a piece about Alex Jones.
Interestingly, that AI-illustrated piece sparked a viral tweetstorm which forced the journalist to apologize a few days later.
This was how the journalist, Charlie Warzel, introduced the controversy in his apology:
“A day later, an artist–slash–art director saw the post and noticed that I credited “AI art by Midjourney” in the captions. In a series of tweets, they wrote that they were frustrated and a little shocked that a national magazine like The Atlantic was using a computer program to illustrate stories instead of paying an artist to do that work. They were also concerned about this particular use case potentially giving other publications an excuse, or at least an idea, to cut corners on an art budget.”
The Guardian commented on the incident by saying that “it was painful for him (the journalist, Ed.), but maybe also a salutary warning that publishers who give work to machines rather than creative artists deserve everything they get.”
Why so much revulsion towards AI image generation?
The main argument is that DALL-E, Midjourney and other models have been trained on the creative work of human artists who are not being compensated for the inspiration.
Let’s pause for just a minute here.
Take a few famous artists, like Van Gogh, Monet, Magritte, Picasso or Dalí. How did they develop their artistry? Was it in some pure vacuum, unaware of their cultural surroundings? Or were they inspired by their predecessors and contemporaries? Did they pay any royalties to their sources of inspiration? You could say: well, yes, they were inspired by other artists, but their brains had to invest time and labour to transform those inspirations into a spark of genius.
So we’re comparing the biological brain of a living artist with the digital brain of an AI agent. And what the opponents of AI image generation argue is that the output of a biological brain trained on real-life inspirations has more value than the output of a digital brain trained on gigabytes of digital data (which is, in fact, a recording of those same real-life inspirations). It doesn’t make sense. It’s an overly anthropocentric view of intelligence.
A paper titled “Human- versus Artificial Intelligence” perfectly articulates the arguments at stake. Here’s what the authors declare about the false premise of the debate:
“Implicit in our aspiration of constructing AGI systems possessing humanoid intelligence is the premise that human (general) intelligence is the “real” form of intelligence. This is even already implicitly articulated in the term “Artificial Intelligence”, as if it were not entirely real, i.e., real like non-artificial (biological) intelligence.
The idea that A(G)I should be human-like seems unwarranted.
We propose a (non-anthropocentric) definition of “intelligence” as: “the capacity to realize complex goals.”
We should not dismiss AI because it’s non-human. We should embrace the complementarity between the unprecedented computing capacity of digital brains and our creative cognitive abilities, facilitated by our biological envelope.
What is the value of art when deprived of conscious emotion?
That’s an interesting question. We all know that Art expresses the deep emotions of the artist. That’s why tormented creative souls usually produce more interesting art than the happy fellows leading a trouble-free existence. So how can art generated by emotionless machines have any value?
We could present two counter-arguments.
One being that AI art is inspired by generations of emotion-driven creations (as are all artists in the flesh and blood realm).
The other one being that in most cases the machine is still reacting to a human-generated prompt, which is the expression of “real” emotions, to the extent that we’re not living in a gigantic simulation.
Will AI image generation replace stock photography?
At the time of writing, AI is better at generating illustrations than photorealistic scenes (with the exception of This Person Does Not Exist, which is specifically trained to generate portrait photos).
If you prompt DALL-E to generate a photorealistic image, you might get very strange results.
See for instance what I got when prompting the engine with “Two young passionate lovers kissing on a bench, on the side of the River Thames in London. Photo-realistic.” Look at the distorted faces of those youngsters.
Nevertheless, we already come across a lot of press articles and forum posts announcing the end of Shutterstock, Getty and the like.
Every day brings its share of sensational headlines.
By the way, the Reddit comment above stresses the importance of prompt engineering, an emerging skill which I already addressed in a previous piece.
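To give a concrete sense of what prompt engineering means in practice: a well-formed image prompt typically states the subject first, then layers on modifiers such as medium and artistic style. Here is a tiny illustrative sketch; the function name and the subject/medium/style structure are my own choices, not an official convention of any tool.

```python
def build_image_prompt(subject, medium="", style=""):
    """Compose a structured image prompt: subject first, then optional modifiers."""
    parts = [subject]
    if medium:
        parts.append(medium)
    if style:
        parts.append("in the style of " + style)
    return ", ".join(parts)

# Example: the squirrel prompt from the introduction, rebuilt from parts.
print(build_image_prompt(
    "Grey squirrel wearing glasses, reading an open book",
    style="Johannes Vermeer",
))
# → Grey squirrel wearing glasses, reading an open book, in the style of Johannes Vermeer
```

The point is less the code than the habit: treating a prompt as a set of composable ingredients rather than a one-off sentence is what separates consistent results from lucky ones.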
There will still be a need for real photos, such as the ones you can find on Unsplash, recently purchased by Getty. AI will contribute to the discovery (visual recognition, tagging, collection curation), the editing and the integration of those assets, as part of a universal corpus of visual data.
I would say that illustrations, which are a representation of a mental picture of an object, being or concept, are probably the best candidates to be streamlined in the near future by an intense use of AI. This will open new forms of creative endeavours for solo writers. In another creative sector, I also expect sound design to benefit from AI, offering on-demand cinematic soundscapes to a new generation of storytellers.
We won’t be less creative with AI.
It will unleash new heights of creativity.
Let’s give GPT-3 the opportunity to provide a conclusion to this article.
Should we be afraid of AI image generation?
There is no need to be afraid of AI image generation, as long as you understand how it works and how to use it responsibly.
Which AI tools can you use today (2022)?
AI Image Generators
- OpenAI DALL-E (waiting list): https://openai.com/dall-e-2/
- Midjourney (powered by a Discord server): midjourney.com
- NightCafé: https://nightcafe.studio/
- DreamStudio: https://beta.dreamstudio.ai/
- Pixray: https://replicate.com/pixray
- Stable Diffusion: https://replicate.com/stability-ai/stable-diffusion
AI Text Generation (Natural Language Generation)
I invite you to prompt the native UI of GPT-3 (most of the other tools listed earlier in my article are simply abstractions of the core engine, with a user experience determined by their own built-in prompts). Check out OpenAI GPT-3 on https://beta.openai.com/
Footnote for the Jamstack geeks: this article was originally written in Notion, then exported as a markdown file and imported into Publii, my static website builder. In order to keep a dynamic reference to the images, they were embedded into Notion from URLs (not uploaded from my computer).
Featured image: “AI Will Soon Replace Jobs”, created with Dreamstudio.