Text to Image – Artificially Intelligent Art or Synthesized Collages?
There have been a lot of newspaper column inches dedicated to Artificial Intelligence lately, specifically the huge leaps forward in AI-assisted creative tools. Most of you will have heard of the conversational ChatGPT tool and the written content it can churn out in rapid fashion. I’m sure you have all seen endless articles and videos about its potential uses (and misuses), but a good deal of the commentary feels slightly overblown to me. I have played around with it and got some interesting results, as well as some truly useless ones. For the moment I don’t see it putting me out of a job as a writer, and I will likely use it as yet another tool in my belt for inspiration. However, this post is not about ChatGPT (perhaps another day) but rather my recent creative journey with a handful of text-to-image AI tools. I was curious to see what the fuss was all about and, more importantly, what they could do.
Any sufficiently advanced technology is indistinguishable from magic
My first port of call was the DALL-E 2 tool (developed by OpenAI, the same folks behind ChatGPT). I decided to test it by trying to generate an image of one of the people I admire the most (Werner Herzog) alongside one I admire the least (Conor McGregor). The results were mixed, to say the least.
In some instances I added a few qualifiers to the prompt, such as an interview setting or a jungle location, to see if I could generate more interesting compositions. The outputs were relatively poor and at times reminded me of the collages I made as a kid, when I clipped photos out of magazines and mashed them together on a big piece of paper. Some of them looked almost grotesque and had echoes of the Spitting Image puppets. On another level, I had to check myself and remember that the mere fact they could be created at all was remarkable. It was kind of mind-blowing.
I then turned to NightCafe, which required me to log in and gave me a handful of free credits. I asked it to create an image of Elvis Presley in his iconic white jumpsuit, seated with his back to the camera, staring out at a field with a solitary cow. The results were pretty average (he wasn’t seated, for starters). Or perhaps it’s more accurate to say they didn’t match what I had in my mind.
So I promptly turned back to DALL-E 2 and tried the same prompt, but this time I asked for it to be rendered in the style of the Romantic landscape painter Caspar David Friedrich (funnily enough, I learned about him via Werner Herzog, so it seems my brain was making connections deep down). The results were cool and somewhat more photorealistic (as opposed to painterly).
With this new knowledge about creating images in a particular painter’s style, I decided to do a series of images in the style of my favourite painter of all time, Edward Hopper. Most of you will be familiar with his famous Nighthawks painting. His beautiful use of light and sparse, melancholy imagery is unparalleled, so I wanted to see how the tools could handle one of the great American masters. I chose decidedly modern figures to be painted in this style, starting with Nicolas Cage, initially alongside Elvis and then on his own. The results were actually pretty cool. Not distinctively like Hopper, but cool nonetheless.
With a few tweaks to my prompt, I was able to make them look more like an Edward Hopper painting in terms of style, although the figures themselves were sometimes too cartoonish (bordering on caricature). Still, these results were starting to look really good. One thing I was starting to notice, though, was how much trouble the AI tools seemed to have with fingers and hands. Look at the image on the left below. What is going on with his hands there?
I then tried out portraits of Taylor Swift in the style of Edward Hopper, and they were much more successful. They felt almost authentic. Perhaps this has something to do with Taylor Swift’s rather timeless look.
It was time to go full circle and try out a portrait of Werner Herzog in the style of Edward Hopper. The result was pretty good in fairness. It was getting closer to matching what I had in my mind.
Time for another tool, so I tested out the hugely popular Midjourney. This required me to set up a Discord account, as all of the images are generated within the Discord interface for some reason. There was also a limit on the number of images I could generate. The results were positively mind-blowing. Midjourney was leaps and bounds better than any of the other tools I had tried (there were several others I have not included here, including the built-in image generation in the design tool Canva). I tried a tricky prompt that required Midjourney to create an Edward Hopper portrait of Werner Herzog as an ageing Superman in a bar. And what a result! I was bowled over. The lighting, the detail, the composition. All terrific.
Midjourney also offered a nice feature where it spits out four images, and you can select any one of them to generate another set of four variations on that selection. Equally, you can select one to upscale it to a higher-resolution image. I was so impressed that I seriously considered getting a subscription. However, there seem to be a lot of unresolved ethical concerns around the copyright of the source data and images used to generate these AI artworks. I would never want to infringe on another artist’s intellectual property (and certainly not without paying them for the privilege), so I will step away for now while the legalities of the tool get untangled. But in terms of pure output, Midjourney was on another level compared to anything else I tried. Truly fascinating.
Overall, it was an intriguing experience to play around with these tools, especially for a creative person like me. There is still a long way to go in terms of getting the best out of them, especially when it comes to phrasing your prompts for the best possible output, but I imagine that will improve over time. In fact, there are websites dedicated to helping you compose aptly worded prompts for these tools, and you can even ask the aforementioned ChatGPT to compose a prompt for you.
There is much to unpack here, and these tools (and others) will be the subject of a future post, but for now I see them as very sophisticated synthesisers. Back in the day you could program different sounds into a synth to correspond to each key on the keyboard, and these tools are not dissimilar in many ways, although the keyboards are different. While these AI synths have millions of notes compressed into them, at the end of the day you still need time-honoured things like the composer’s imagination to elicit the best creative outputs. For now, we can rest easy and play with these digital toys, because true creative work still requires an original mind at the keyboard.