In recent years, Google has been caught off guard by the new advances in AI from companies like Microsoft and OpenAI. Feeling the need to make a statement, it has turned its annual I/O keynote into a show focused exclusively on artificial intelligence.
Many new developments were announced, centered on the use of AI in everyday life and at work. Here are the most important ones:
Gemini Flash and the new version of Gemini 1.5 Pro.
Let's start with the updated Gemini 1.5 Pro model. It is now even more precise in its analysis, and by the end of the year its context window will grow to 2 million tokens, up from the current one million.
Gemini Flash, on the other hand, is Google's answer to Phi-3 Mini (which we discussed here): a lighter and cheaper model. It is built through a distillation process, meaning its essential knowledge is transferred from the larger Gemini 1.5 Pro.
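For readers curious about what "distillation" means in practice: the general idea is to train the smaller model to imitate the output distribution of the larger one, rather than learning everything from scratch. Below is a minimal, generic sketch of that idea in PyTorch; the tiny stand-in models, the temperature, and the loss weighting are illustrative assumptions and say nothing about Google's actual training setup.

```python
# Minimal sketch of knowledge distillation: a small "student" model learns to
# imitate the output distribution of a larger "teacher" model.
# Model stand-ins, temperature, and loss weighting are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 10)   # stand-in for a large model (think Gemini 1.5 Pro)
student = nn.Linear(128, 10)   # stand-in for a lighter model (think Gemini Flash)

x = torch.randn(32, 128)                 # a batch of inputs
labels = torch.randint(0, 10, (32,))     # ground-truth labels
T, alpha = 2.0, 0.5                      # softening temperature and mix weight

with torch.no_grad():
    teacher_logits = teacher(x)          # the teacher's predictions are fixed
student_logits = student(x)

# Distillation loss: match the teacher's softened distribution (KL divergence),
# plus the usual cross-entropy against the true labels.
distill = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)
hard = F.cross_entropy(student_logits, labels)
loss = alpha * distill + (1 - alpha) * hard
loss.backward()                          # gradients update only the student
```

The temperature-softened probabilities let the student learn not just the teacher's top answer but how it weighs the alternatives, which is why a distilled model can keep much of the larger model's behavior at a fraction of the cost.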
AI for multimedia content.
Sora (OpenAI) and Stable Video Diffusion (Stability AI) now have a new competitor. Google has announced VEO: a model that can produce HD videos of about one minute from a text prompt. VEO simulates scenes with more realistic physics and supports specific cinematic shots, such as timelapses and panning.
Imagen 3, meanwhile, will compete with Stable Diffusion (Stability AI), Midjourney, and DALL-E (OpenAI). Google promises photorealistic images with fewer artifacts than its competitors and a much more precise understanding of even complex text prompts.
Music is not left out either: Google has introduced Music AI Sandbox, which lets users generate music from text prompts.
Project Astra and Gemini Live.
Project Astra is one of the most interesting announcements. The demo showed Gemini applied to the smartphone's camera feed. Essentially, Gemini will be able to analyze and track what appears in live video while simultaneously talking to the user. This means it will be possible to interact with a voice assistant that sees what we see in real time, with everything that implies.
Gemini Live, on the other hand, will be the voice interface to Gemini, offering a more natural, human-like way of interacting that will likely find its place on smartphones.
Android, the world's most widely used mobile operating system, will also gain deep integration with Gemini and AI. Android smartphones will be able to use Gemini's latest features to assist users even further, for example by listening in on a call from an unknown number and warning us of a potential scam or phishing attempt in progress.
These are just some of the many announcements. In the coming months we will see whether Google's AI truly helps Android users, at what cost in terms of privacy, and how Gemini's tools position themselves in the ongoing battle among the tech giants' AIs. Most importantly, we will see whether Google fulfills the mission it set out during this presentation: to make AI more accessible and useful for everyone.