Google’s new AI tool Whisk uses images as prompts

0 Comments

Google has yet another AI tool to add to the pile. Whisk is a Google Labs image generator that lets you use an existing image as your prompt. But its output only captures your starter image’s “essence” rather than recreating it with new details. So, it’s better for brainstorming and rapid-fire visualizations than edits of the source image.

The company describes Whisk as “a new type of creative tool.” The input screen starts with a bare-bones interface with inputs for style and subject. This simple introductory interface only lets you choose from three predefined styles: sticker, enamel pin and plushie. I suspect Google found those three allowed for the kind of rough-outline outputs the experimental tool is most ideal for in its current form.

As you can see in the image above, it produced a solid image of a Wilford Brimley plushie. (Google’s terms forbid pictures of celebrities, but Wilford slipped through the gates, Quaker Oats in tow, without alerting the guards.)

Whisk also includes a more advanced editor (found by clicking “Start from scratch” from the main screen). In this mode, you can use text or a source image in three categories: subject, scene and style. There’s also an input bar to add more text for finishing touches. However, in its current form, the advanced controls didn’t produce results that looked anything like my queries.

For example, check out my attempt to generate the late Mr. Brimley in a lightbox scene in the style of a walrus plushie image I found online:

Screenshot of an AI generation tool producing images a man who looks a bit like Wilford Brimley.
Google / Screenshot by Will Shanklin for Engadget

Whisk spit out what looks like a vaguely Wilford Brimley-esque actor eating oatmeal inside a lightbox frame. As far as I can tell, that dude is not a plushie. So, it’s clear why Google recommends using the tool more for “rapid visual exploration” and less for production-ready content.

Google acknowledges that Whisk will only draw from “a few key characteristics” of your source image. “For example, the generated subject might have a different height, weight, hairstyle or skin tone,” the company warns.

To understand why, look no further than Google’s description of how Whisk works under the hood. It uses the Gemini language model to write a detailed caption of the source image you upload. It then feeds that description into the Imagen 3 image generator. So, the result is an image based on Gemini’s words about your image — not the source image itself.

Whisk is only available in the US, at least for now. You can try it at the project’s Google Labs site.

This article originally appeared on Engadget at https://www.engadget.com/ai/googles-new-ai-tool-whisk-uses-images-as-prompts-210105371.html?src=rss
Source: engadget
Google’s new AI tool Whisk uses images as prompts