Google has introduced Whisk, an AI tool that lets users generate new visuals from image-based prompts. As an alternative to traditional text prompts, it lets users supply existing images to convey the subject, scene, and style they want.
Whisk accepts multiple input images as guidance for what the AI should create, with separate images defining the subject, the scene, and the style. Users who do not have images on hand can click a dice icon to have AI-generated images filled in as prompts.
After the input process, Whisk generates images accompanied by corresponding text prompts. Users can favorite or download any image they like. To refine a result further, they can edit the text prompt or click an image to generate additional content.
In a blog post, Google emphasized that Whisk is intended for rapid visual exploration rather than pixel-perfect editing. The company acknowledged that the outputs may sometimes “miss the mark,” which is why it lets users edit the prompts to improve results.
During initial trials with Whisk, users found the interface engaging and entertaining. Image generation could take several seconds, an inconvenience for users in a hurry, but the overall experience of generating and iterating on images proved enjoyable and showcased the potential for creative freedom.
Whisk uses Google’s latest Imagen 3 image generation model, which promises improved performance in generating imaginative visuals. Google has also rolled out the Veo 2 video generation model, noted for its understanding of cinematic language and enhanced accuracy. Veo 2 is expected to debut in Google’s VideoFX and expand to YouTube Shorts and other platforms next year.