A new math benchmark just dropped and leading AI models can solve 'less than 2%' of its problems... oh dear

pcgamer.com

Sometimes I forget there's a whole other world out there where AI models aren't just used for basic tasks such as simple research and quick content summaries.

Out in the land of bigwigs, they're instead being used to help with everything from financial analysis to scientific research.

That's why their mathematical capabilities are so important. Mathematical skill is also a general marker of reasoning ability, which is why mathematical benchmarks exist.

One such benchmark is FrontierMath, which its maker, Epoch AI, has just dropped and which is putting LLMs through their paces with "hundreds of original, expert-crafted mathematics problems designed to evaluate advanced reasoning capabilities in AI systems" (via Ars Technica).

While today's AI models don't tend to struggle with other mathematical benchmarks such as GSM-8k and MATH, according to Epoch AI, "they solve less than 2% of FrontierMath problems, revealing a substantial gap between current AI capabilities and the collective prowess of the mathematics community".
