Unsurprisingly, AI dominated the announcements at Google I/O 2025: updates to the Gemini 2.5 models, the Veo 3 and Imagen 4 models, AI Mode in Search, and more. Google is placing AI, and Gemini in particular, at the core of its products.
The Gemini 2.5 Family
Gemini 2.5 Pro, launched last March, now features an enhanced reasoning mode called Deep Think. This mode lets the model spend more compute on complex tasks, particularly in mathematics and programming, and consider multiple hypotheses before formulating a response.
Gemini 2.5 Flash, unveiled last April, is a hybrid reasoning model: developers can turn its thinking on or off, and it is designed to offer the best balance of cost, performance, and latency. Google announced improvements in reasoning, coding, multimodal processing, and long-context understanding, and says the model now uses 20 to 30% fewer tokens, according to its internal evaluations.
Both models also gain new features: native audio output through the API for more natural conversations, advanced security safeguards, and the computer-use capabilities of Project Mariner, Google's AI agent.
Deployment of AI Mode
AI Overviews, introduced at I/O 2024, recently received an upgrade, and Google is now rolling out AI Mode in the United States for queries that call for in-depth exploration, comparisons, and nuanced reasoning.
Powered by an optimized version of Gemini 2.5 with access to real-time sources and information, AI Mode relies on a "query fan-out" technique: the AI launches multiple searches at once on different subtopics related to the question, across various data sources, then compiles the results into a structured response with links to the cited web pages. Users can then refine their search with follow-up questions.
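Google has not published how AI Mode is implemented, so the following is only a minimal sketch of the general fan-out pattern described above, written under loud assumptions: `search`, the source names, and the `.example` URLs are hypothetical stand-ins for real search backends, and the subtopics are supplied by hand rather than generated by a model. It runs every subtopic/source search concurrently, then merges the results into one structured answer grouped by subtopic, with duplicate links removed.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    subtopic: str
    source: str
    url: str


async def search(source: str, subtopic: str) -> list[Result]:
    """Hypothetical search backend; a real system would query an index or API here."""
    await asyncio.sleep(0)  # stand-in for network latency
    slug = subtopic.replace(" ", "-")
    return [Result(subtopic, source, f"https://{source}.example/{slug}")]


async def query_fan_out(subtopics: list[str], sources: list[str]) -> dict[str, list[str]]:
    """Fan out: launch one search per (subtopic, source) pair, all at once."""
    tasks = [search(src, topic) for topic in subtopics for src in sources]
    batches = await asyncio.gather(*tasks)  # results come back in task order

    # Fan in: group cited URLs by subtopic, dropping duplicates, keeping order.
    compiled: dict[str, list[str]] = {}
    for batch in batches:
        for r in batch:
            urls = compiled.setdefault(r.subtopic, [])
            if r.url not in urls:
                urls.append(r.url)
    return compiled


def render(compiled: dict[str, list[str]]) -> str:
    """Format the merged results as a structured answer with links."""
    lines = []
    for topic, urls in compiled.items():
        lines.append(f"{topic}:")
        lines.extend(f"  - {u}" for u in urls)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical question split by hand into two subtopics.
    subtopics = ["e-bike battery range", "e-bike price"]
    print(render(asyncio.run(query_fan_out(subtopics, ["web", "news"]))))
```

Using `asyncio.gather` keeps total latency close to that of the slowest single search rather than the sum of all of them, which is the main reason to fan out in parallel instead of querying sources one by one.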
Project Starline Evolves into Google Beam
Formerly known as Project Starline, Google Beam combines light field displays, volumetric capture with six cameras, spatial audio, and real-time AI processing to create a 3D representation of the person on the other end of the call. The result is more realistic conversations, rendered at 60 frames per second with more precise head tracking, without requiring headsets, glasses, or other wearable devices. The technology was initially confined to bulky prototypes, but it has since been miniaturized to fit into more compact systems, developed in partnership with HP.
Beam also offers real-time speech translation, a feature Google is bringing to Meet as well, enabling fluid multilingual conversations that preserve the speaker's voice, tone, and expressions.
