Image prompting is the technique of using words to tell an artificial intelligence model what to produce, usually in the absence of visual examples. If one wants a model to generate an image but has nothing to show it as an example, the remaining option is to describe the desired result as precisely as possible. This is what is called image prompting [1].
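To make "as precisely as possible" concrete, here is a minimal sketch that assembles a prompt from separate descriptive parts. The `PromptSpec` fields and the comma-separated layout are illustrative assumptions, not a format required by any particular model.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the parts of an image description."""
    subject: str
    style: str = ""
    lighting: str = ""
    composition: str = ""
    details: list[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty parts into one comma-separated prompt string."""
    parts = [spec.subject, spec.style, spec.lighting, spec.composition, *spec.details]
    return ", ".join(p.strip() for p in parts if p and p.strip())


# A vague prompt leaves almost every visual decision to the model.
vague = build_prompt(PromptSpec(subject="a lighthouse"))

# A precise prompt pins down style, lighting, framing, and extra details.
precise = build_prompt(PromptSpec(
    subject="a red-and-white striped lighthouse on a rocky cliff",
    style="watercolor illustration",
    lighting="golden-hour light",
    composition="wide shot from sea level",
    details=["stormy sky", "crashing waves"],
))

print(vague)
print(precise)
```

Keeping the parts separate makes it easy to change one aspect, such as the style, while holding the rest of the description fixed.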
Several tools support image prompting, including:
- Stable Diffusion, a text-to-image latent diffusion model that generates high-quality and diverse images from natural language prompts [2].
- Midjourney, a proprietary text-to-image service whose architecture has not been publicly documented; it generates realistic and coherent images from natural language prompts [3].
- DALL-E, a family of text-to-image models from OpenAI that generates diverse and creative images from natural language prompts; the original version used an autoregressive transformer, while later versions rely on diffusion [4].
- IP-Adapter, a lightweight adapter that adds image prompt capability to pretrained text-to-image diffusion models, allowing users to supply an image alongside a text prompt to shape the resulting image's composition, style, color palette, or even faces [5].
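As a rough illustration of how two of these tools are typically driven from code, the sketch below uses the Hugging Face `diffusers` library to run Stable Diffusion, optionally with IP-Adapter weights loaded. The checkpoint identifier passed as `model_id`, the `h94/IP-Adapter` repository and weight file name, the adapter scale, and the default step and guidance values are all assumptions chosen for illustration; the helper names are hypothetical.

```python
def load_pipeline(model_id: str, use_ip_adapter: bool = False, device: str = "cuda"):
    """Load a Stable Diffusion pipeline, optionally with IP-Adapter weights."""
    # Imported lazily so generation_kwargs() works without the GPU stack installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    if use_ip_adapter:
        # Assumed repository, subfolder, and weight file for SD 1.5-style checkpoints.
        pipe.load_ip_adapter(
            "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
        )
        # Assumed scale: higher values follow the reference image more closely.
        pipe.set_ip_adapter_scale(0.6)
    return pipe.to(device)


def generation_kwargs(prompt, reference_image=None, steps=30, guidance_scale=7.5):
    """Build the keyword arguments for a pipeline call.

    The image prompt is only included when a reference image is supplied.
    """
    kwargs = {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    if reference_image is not None:
        kwargs["ip_adapter_image"] = reference_image
    return kwargs


def generate(pipe, prompt, reference_image=None):
    """Run the pipeline and return the first generated image."""
    return pipe(**generation_kwargs(prompt, reference_image)).images[0]


# Example usage (requires a GPU, model weights, and a checkpoint id of your choice):
# pipe = load_pipeline(model_id, use_ip_adapter=True)
# image = generate(pipe, "a lighthouse at dusk", reference_image=some_pil_image)
# image.save("lighthouse.png")
```

When the adapter is loaded, a reference image should be passed on every call; for text-only generation, load the pipeline without it.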
Images generated from prompts can also reflect biases, such as:
- Social biases, which appear in how occupations, personality traits, and everyday situations are depicted across (perceived) gender, age, race, and geographical location in the generated images [6].
- Occupational biases, in which certain groups of people are excluded from the results for neutral prompts related to professions or roles [6].
- Geographical biases, in which images generated for location-neutral prompts more closely resemble images generated for certain countries or regions [6].
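One way to probe the geographical bias described above is to compare image embeddings for a location-neutral prompt with embeddings for location-specific versions of the same prompt. The sketch below assumes the embeddings have already been computed by some image encoder; the function names, region labels, and toy vectors are placeholders rather than real measurements.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_locations(neutral_embeddings, located_embeddings):
    """Rank locations by how similar their images are to the neutral-prompt images."""
    neutral_mean = mean_vector(neutral_embeddings)
    scores = {
        location: cosine_similarity(neutral_mean, mean_vector(vectors))
        for location, vectors in located_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors standing in for real image embeddings.
neutral = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
located = {
    "region_a": [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]],
    "region_b": [[0.0, 1.0, 0.0], [0.0, 1.0, 0.2]],
}

ranking = rank_locations(neutral, located)
for location, score in ranking:
    print(f"{location}: {score:.3f}")
```

If the neutral prompt consistently ranks closest to the same few regions across many prompts, that is a signal of the kind of geographical skew described above; a real audit would need many prompts and a properly validated encoder.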
Image prompting can also raise intellectual property (IP) and related issues, such as:
- Infringement of intellectual property rights, which occurs when generated images use or copy elements from existing images that are protected by copyrights, trademarks, patents, or trade secrets [7].
- Lack of attribution or consent, which occurs when generated images use or copy elements from existing images without crediting the original creators or owners or obtaining their permission [7].
- Ethical and moral dilemmas, which arise when generated images use or copy elements from existing images that are sensitive, controversial, or harmful to someone physically, emotionally, or financially [7].