Prompting AI Imagery for Production
Artificial intelligence has evolved quickly over the past few years, and recently there have been a number of breakthroughs in computer-assisted image generation, in both capability and consumer adoption.
Three big players in the AI sandbox are Google's DeepDream, DALL-E, and Stable Diffusion. Stable Diffusion (SD), in particular, makes it easy to start playing immediately without any technical installation.
Any AI that renders images does so by drawing on a little bit of everything, from hundreds of specialized images to billions of random images. It is important to the process that the images carry metadata, especially descriptions, either written by humans or generated by image-recognition systems.
DeepDream’s evolution demonstrates how the training process shapes the output. Because its underlying network was trained largely on images of dogs and other animals, most of its early imagery looked like strange, warped, trippy animal hybrids. As the technology has matured over the last few years, people have trained it on a wider range of images with better outcomes.
SD’s claim to fame is different. It has gained a lot of attention for its behind-the-scenes hustle, which resulted in seed capital (first-round investment) totaling $101 million, widely reported as the largest seed round any company has ever received. Plenty of other companies have raised more money in their second or third rounds, but as a first-round take, this was unprecedented.
Over on the public-facing side, SD lets the user type in a description, and it creates an image of what the text describes. These text inputs are called prompts. A prompt can describe an object or a place, like a car or a house, but it can also describe a style, like watercolor or oil painting, or even a particular artist, like Vincent Van Gogh. SD can also take a prompt in the form of a drawing that guides the image’s composition: for instance, if you draw a cube at a three-quarter angle along with a description of a house, SD will create an image of a house at the same angle.
You can even specify the camera lens, long or wide-angle, in the drawing or in the description. The AI’s output is also shaped by which of its training images were judged pleasing to the eye.
AI renderings are a combination of:
- the model it was trained on;
- the prompt it was given; and
- any starting image or inpainting mask it was fed.
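Those three inputs can be sketched as a simple data structure. Everything below — the class name, fields, and `build_prompt` helper — is illustrative and hypothetical, not part of any real SD toolkit’s API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRequest:
    """One text-to-image request: a model, a prompt, and an optional guide image."""
    model: str                        # which trained model to draw on
    subject: str                      # what to render, e.g. "a house"
    style: Optional[str] = None       # e.g. "watercolor" or "oil painting"
    artist: Optional[str] = None      # e.g. "Vincent Van Gogh"
    lens: Optional[str] = None        # e.g. "wide-angle lens"
    init_image: Optional[str] = None  # path to a sketch guiding composition

    def build_prompt(self) -> str:
        """Join the descriptive pieces into one comma-separated prompt string."""
        parts = [self.subject, self.style, self.lens]
        if self.artist:
            parts.append(f"in the style of {self.artist}")
        return ", ".join(p for p in parts if p)

req = GenerationRequest(
    model="stable-diffusion-v1",
    subject="a house at a three-quarter angle",
    style="watercolor",
    artist="Vincent Van Gogh",
    init_image="cube_sketch.png",
)
print(req.build_prompt())
# → a house at a three-quarter angle, watercolor, in the style of Vincent Van Gogh
```

In a real tool, the `model` would select the trained weights, `build_prompt()` would become the text input, and `init_image` would seed the composition, the same three ingredients listed above.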
Whether DeepDream, DALL-E, or Stable Diffusion produces actual art is the topic of my next article.