Friday, January 5, 2024

Can you identify these AI pictures? Google's General Release of Imagen 2

4 min read
(Updated: Friday, January 5, 2024)

Google has been sharing a lot of news lately.
First, it launched its impressive Gemini AI less than a month ago, and the demos shown at the press event wowed just about everyone. And that's not all: it has made the first version of Gemini Pro available through the Gemini API, introduced Imagen 2, and launched MedLM, a family of models designed for healthcare. It has been a busy and exciting time for Google in AI.

Google AI News (pandaron.com)

Among these updates, the text-to-image tool Imagen 2 is getting a lot of attention. Google had previously released the first version, Imagen, a text-to-image diffusion model whose images were already quite fascinating. Now, with the arrival of Imagen 2, the company said on its official X account: "Imagen 2 is our most advanced text-to-image diffusion technology, with high-quality, realistic output and greater consistency with user prompts."

Developers and cloud customers can use Imagen 2 through the Imagen API in Google Cloud Vertex AI.
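
For developers, a minimal sketch of calling Imagen 2 from the Vertex AI Python SDK might look like the example below. The project ID, region, and the model version string `imagegeneration@005` are assumptions for illustration; check the current Vertex AI documentation for the identifiers available to your project.

```python
# Minimal sketch, assuming the Vertex AI Python SDK (google-cloud-aiplatform)
# and that "imagegeneration@005" is the Imagen 2 version enabled for your
# project -- verify both against the current Vertex AI docs.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

# Replace with your own project and region.
vertexai.init(project="my-gcp-project", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagegeneration@005")

images = model.generate_images(
    prompt=(
        "Oil painting, an orange on a chopping board. The light passes "
        "through the orange section, casting an orange glow on the cutting "
        "board. There is a blue and white cloth in the background."
    ),
    number_of_images=1,
)

images[0].save("orange_oil_painting.png")
```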

Prompt: oil painting, an orange on a chopping board. The light passes through the orange section, casting an orange glow on the cutting board. There is a blue and white cloth in the background. Caustics, reflected light, expressive brushstrokes.

!["Prompt: Oil painting, an orange on a chopping board... (pandaron.com)"] (https://static.pandaron.com/sized/prompting_oil_painting_pandaron_1000x1000.jpeg "Prompt: Oil painting, an orange on a chopping board... (pandaron.com)")

To create high-quality images that are also more consistent with user prompts, Google DeepMind made changes to the Imagen 2 training dataset: more detailed descriptions were added to the image captions, so that Imagen 2 can learn from varied descriptions and generalize them to better understand user prompts. These enriched image-caption pairs help Imagen 2 better understand the relationship between images and text, improving its grasp of context and nuance.
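
Google hasn't published its captioning data, but to make the idea concrete, here is a purely hypothetical sketch of what enriching a terse caption into a detailed one might look like; the file name and wording are ours, not from the Imagen 2 dataset.

```python
# Hypothetical example of caption enrichment: the same training image paired
# first with a terse caption, then with the kind of detailed caption that
# helps a model connect fine-grained language to visual detail.
original_pair = {
    "image": "robin_on_wall.jpeg",  # illustrative file name
    "caption": "a bird on a wall",
}

enriched_pair = {
    "image": "robin_on_wall.jpeg",
    "caption": (
        "a small robin with an orange breast perched on top of a weathered "
        "stone wall covered in swaying ivy, beak open mid-song, soft morning light"
    ),
}
```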

Check this out. Prompt: The robin flies from the swaying ivy to the top of the wall, opens its beak, and sings a loud, lovely trill, just to show off. There's nothing cuter in the world than a robin when it's showing off. - They almost always do. ("The Secret Garden" by Frances Hodgson Burnett)

Prompt example (pandaron.com)

The team at Google has trained a specialized image aesthetics model based on human preferences for lighting, framing, exposure, clarity, and other qualities. Each image is given an aesthetic score, which helps tune Imagen 2 to give more weight to images in the training dataset that match human preferences. This improves Imagen 2's ability to produce high-quality images. Imagen 2 can even render text within images.

Text within images (pandaron.com)
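
Google has not described the aesthetics model in detail, but the scoring step above amounts to re-weighting the training data. A purely illustrative sketch of that idea, with made-up captions and scores rather than anything from Google's pipeline, might look like this:

```python
import random

# Purely illustrative: each training example carries an aesthetic score in
# [0, 1] assigned by a separate preference model (not shown here).
training_examples = [
    {"caption": "a robin on an ivy-covered wall", "aesthetic_score": 0.92},
    {"caption": "a blurry photo of a wall", "aesthetic_score": 0.31},
    {"caption": "an orange on a chopping board", "aesthetic_score": 0.78},
]

# Higher-scoring images are sampled more often, so the generator sees
# well-lit, well-framed examples more frequently during training.
weights = [example["aesthetic_score"] for example in training_examples]
batch = random.choices(training_examples, weights=weights, k=2)

for example in batch:
    print(example["caption"], example["aesthetic_score"])
```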

It can also design logos for various businesses, brands or products.

Logo examples for brands (pandaron.com)

There are many other features; here are just a few:

Imagen 2 supports image-editing features such as inpainting and outpainting. By providing a reference image and an image mask, users can apply inpainting to generate new content directly inside the original image, or outpainting to extend the original image beyond its borders. Google Cloud's Vertex AI plans to adopt the technology in the new year.

Masked editing / inpainting (pandaron.com)
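
As a rough sketch of how masked editing is surfaced to developers, the Vertex AI preview SDK exposes an image-editing call along the lines below. The `edit_image` method name, its parameters, and the file names are assumptions based on the preview SDK at the time of writing and may change, so consult the current documentation before relying on them.

```python
# Sketch of masked inpainting, assuming the Vertex AI preview SDK exposes
# ImageGenerationModel.edit_image (parameter names may differ in the current
# release -- verify against the official documentation).
import vertexai
from vertexai.preview.vision_models import Image, ImageGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagegeneration@005")

base_image = Image.load_from_file("living_room.png")  # the original picture
mask = Image.load_from_file("sofa_mask.png")          # white pixels mark the region to regenerate

edited = model.edit_image(
    prompt="a green velvet sofa",  # what to paint inside the masked region
    base_image=base_image,
    mask=mask,
    number_of_images=1,
)

edited[0].save("living_room_new_sofa.png")
```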

Imagen 2 integrates with SynthID, a cutting-edge toolkit for watermarking and identifying AI-generated content, allowing Google Cloud customers to add imperceptible digital watermarks directly into image pixels without compromising image quality. SynthID can still detect these watermarks after modifications such as filters, cropping, or lossy compression.

On the safety and user-protection front, the research team behind the product runs extensive safety checks before releasing any new features. Technical guardrails are also in place to prevent the creation of inappropriate content such as violence, offensive material, or anything explicit. This includes careful monitoring of the training data, the prompts given to the system, and the output it produces. For instance, detailed safety filters are used to prevent the generation of sensitive content, such as images of specific individuals.
