Google Gemini was unveiled yesterday, December 6, by Alphabet CEO Sundar Pichai and Demis Hassabis, CEO of the company's AI research division, Google DeepMind. Surpassing PaLM 2, it is now the largest large language model the company has released to date. With that scale come new capabilities. As a multimodal AI model, its most capable variant, Gemini Ultra, can respond with text, images, video, and audio, pushing the boundaries of what a general-purpose foundation model can do. So, if you have been wondering about the features and use cases of Gemini AI, check them out below.
After announcing its new AI model, Google posted a YouTube video showcasing Google Gemini's capabilities. The video notes, "We've been capturing footage to test it on a wide range of challenges, showing it a series of images, and asking it to reason about what it sees." Throughout, the video highlights some of Gemini's more advanced features and use cases.
In the video, Gemini has access to a camera feed and can see whatever the user is doing. The AI model is put through several tests in which it has to analyze what is happening in front of the camera.
In the first segment, the user draws on a piece of paper and asks Gemini to guess what it sees. The AI model keeps updating its guesses as the user adds more detail to the drawing. At each step, Gemini offers a reasonable analysis of the drawing and provides additional information about the object. It also recognizes objects and offers information about what they might be made of.
In the second segment, the user asks the AI to tell him how to pronounce a word in a different language. Not only does the AI show
Read more on tech.hindustantimes.com