As xAI prepared to unveil its first large language model (LLM), Grok, Elon Musk boldly declared that the generative AI model was, "in some important respects," the "best that currently exists." Now, we finally have independent data against which to test that claim.
Kieran Paster, a researcher at the University of Toronto, recently put a number of AI models through their paces by testing them on a held-out math exam. Bear in mind that held-out questions, in data-analytics parlance, are ones excluded from the dataset used to train an AI model, so a given LLM has to rely on its prior training and general problem-solving ability to answer them. Paster then hand-graded each model's responses.
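To make the idea concrete, here is a minimal Python sketch of held-out evaluation; the `model.answer()` interface and exact-match grading are illustrative assumptions, since Paster graded responses by hand.

```python
# A minimal sketch of held-out evaluation: the exam questions are kept out of
# training, so the model must generalize rather than recall memorized answers.
# The model.answer() interface and exact-match grading are hypothetical.

def grade(response: str, reference: str) -> int:
    """Return 1 if the model's answer matches the reference answer, else 0."""
    return int(response.strip() == reference.strip())

def evaluate_held_out(model, exam: list[tuple[str, str]]) -> float:
    """Score a model on (question, answer) pairs excluded from its training data."""
    correct = sum(grade(model.answer(question), reference) for question, reference in exam)
    return correct / len(exam)  # fraction correct, e.g. 0.59 for Grok
```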
As Paster's results show, Grok outperformed every other LLM, including Anthropic's Claude 2, with the exception of OpenAI's GPT-4: Grok earned a total score of 59 percent versus 68 percent for GPT-4.
Next, Paster drew on xAI's own testing of various LLMs on GSM8k, a benchmark of grade-school math word problems, to plot each model's performance on the held-out math exam against its performance on GSM8k.
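A sketch of that comparison plot might look like the following, with placeholder scores standing in for Paster's hand-graded exam results and xAI's published GSM8k figures.

```python
# Scatter plot of GSM8k accuracy vs. held-out exam accuracy per model.
# The model names and score pairs below are illustrative placeholders.

import matplotlib.pyplot as plt

scores = {  # hypothetical (GSM8k accuracy, held-out accuracy) pairs
    "Model A": (0.62, 0.55),
    "Model B": (0.78, 0.28),  # strong on GSM8k, weak on unseen problems
}

fig, ax = plt.subplots()
for name, (gsm8k, held_out) in scores.items():
    ax.scatter(gsm8k, held_out)
    ax.annotate(name, (gsm8k, held_out))
ax.set_xlabel("GSM8k accuracy")
ax.set_ylabel("Held-out math exam accuracy")
plt.show()
```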
Interestingly, while OpenAI's GPT-3.5 scores higher than Grok on GSM8k, it manages only about half of Grok's score on the held-out math exam. Paster takes this gap as evidence that GPT-3.5's GSM8k outperformance is simply a result of overfitting, which occurs when an LLM produces accurate results for data resembling its training set but not for genuinely new data. For instance, an AI model trained on a specific set of practice problems may ace those exact problems while stumbling on ones it has never seen.
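A rough sketch of that diagnosis, under the assumption that a large gap between benchmark and held-out accuracy signals memorization; the scores and gap threshold below are illustrative, not Paster's actual figures.

```python
# Overfitting check: a model whose benchmark score far exceeds its held-out
# score has likely memorized benchmark-style problems rather than learned to
# solve new ones. The threshold is an illustrative assumption.

def flags_overfitting(benchmark_score: float, held_out_score: float,
                      gap_threshold: float = 0.25) -> bool:
    """Flag a suspiciously large gap between benchmark and held-out accuracy."""
    return (benchmark_score - held_out_score) > gap_threshold

print(flags_overfitting(0.80, 0.30))  # True: likely overfit to the benchmark
print(flags_overfitting(0.65, 0.59))  # False: generalizes to unseen problems
```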