
Epoch AI Launches FrontierMath AI Benchmark to Test Capabilities of AI Models

Epoch AI, a California-based research institute, launched a new artificial intelligence (AI) benchmark last week. Dubbed FrontierMath, the benchmark tests large language models (LLMs) on their reasoning and mathematical problem-solving capabilities. The firm claims that existing math benchmarks are not very useful due to factors such as data contamination and AI models achieving very high scores on them. Epoch AI claims that even the leading LLMs have scored less than two percent on the new benchmark.

Epoch AI Launches FrontierMath Benchmark

In a post on X (formerly known as Twitter), the firm explained that it collaborated with more than 60 mathematicians to create hundreds of original, unpublished math problems. Epoch AI claims that these questions would take even mathematicians hours to solve. The motivation for creating the new benchmark was the limitations of existing benchmarks such as GSM8K and MATH, on which AI models often score highly.

The firm claimed that the high scores achieved by LLMs are largely due to data contamination. This means the questions had somehow already been fed into the AI models, allowing them to solve the questions easily.

FrontierMath addresses this problem by including new problems that are unique and have not been published anywhere, mitigating the risks associated with data contamination. Further, the benchmark comprises a wide range of questions, including computationally intensive problems in number theory, real analysis, and algebraic geometry, as well as topics such as Zermelo–Fraenkel set theory. The firm says all the questions are "guessproof", meaning they cannot be solved by chance without strong reasoning.

Epoch AI highlighted that to measure AI's aptitude, benchmarks should be built around creative problem-solving, where the AI has to sustain reasoning over multiple steps. Notably, many industry veterans believe that the current benchmarks are not sufficient to correctly measure how advanced an AI model is.

Responding to the new benchmark in a post, Noam Brown, the OpenAI researcher behind the company's o1 model, welcomed it and said, "I love seeing a new eval with such low pass rates for frontier models."
