Elon Musk's AI venture, xAI, has recently introduced Grok 1.5 Vision, an upgraded version of its Grok 1.5 model. The new model adds computer vision capabilities, allowing it to interpret visual content and answer questions about images. The release comes shortly after OpenAI presented its GPT-4 model, which also offers computer vision features.
xAI announced the upgrade through its official X account (formerly Twitter) and described the model's capabilities in a blog post. The core features of Grok 1.5 carry over to the updated version, while the added vision capabilities promise to open new possibilities for AI interaction with the real world.
https://t.co/A12vgTpnTb
xAI published benchmark results comparing Grok 1.5 Vision with competing models on several tests, including the company's proprietary RealWorldQA benchmark, which evaluates a model's "real-world spatial understanding." The model was also assessed on other tests such as MMMU and ChartQA. On RealWorldQA, Grok surpassed OpenAI's GPT-4 with Vision and Google's Gemini 1.5 Pro, although it lagged behind in other tests.
Computer vision is a field of computer science focused on enabling computers, including AI models, to recognize and interpret real-world objects in images and videos. Essentially, it aims to give machines human-like vision capabilities.
Several leading tech companies are investing heavily in developing vision-centric AI models. Google's Gemini 1.5 Pro and OpenAI's GPT-4 with Vision are notable competitors in this space.
The potential applications for computer vision are vast and transformative. For instance, Healthify, an Indian platform for calorie tracking and nutrition, recently integrated a feature