On Wednesday, Google unveiled Gemini 2.0, the subsequent era of its AI-model household, beginning with an experimental launch known as Gemini 2.0 Flash. The mannequin household can generate textual content, photos, and speech whereas processing a number of kinds of enter together with textual content, photos, audio, and video. It’s just like multimodal AI fashions like GPT-4o, which powers OpenAI’s ChatGPT.
“Gemini 2.0 Flash builds on the success of 1.5 Flash, our hottest mannequin but for builders, with enhanced efficiency at equally quick response instances,” mentioned Google in an announcement. “Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the pace.”
Gemini 2.0 Flash—which is the smallest mannequin of the two.0 household by way of parameter depend—launches in the present day via Google’s developer platforms like Gemini API, AI Studio, and Vertex AI. However, its picture era and text-to-speech options stay restricted to early entry companions till January 2025. Google plans to combine the tech into merchandise like Android Studio, Chrome DevTools, and Firebase.
The firm addressed potential misuse of generated content material by implementing SynthID watermarking know-how on all audio and pictures created by Gemini 2.0 Flash. This watermark seems in supported Google merchandise to establish AI-generated content material.
Google’s latest bulletins lean closely into the idea of agentic AI programs that may take motion for you. “Over the final yr, we’ve been investing in growing extra agentic fashions, which means they’ll perceive extra concerning the world round you, suppose a number of steps forward, and take motion in your behalf, along with your supervision,” mentioned Google CEO Sundar Pichai in an announcement. “Today we’re excited to launch our subsequent period of fashions constructed for this new agentic period.”