frontiermath news - Search News

Beyond Simple Math, AI Hits a Wall—FrontierMath Shows Where It’s Stuck

A new benchmark called FrontierMath is exposing how artificial intelligence still has a long way to go when it comes to ...

A new math benchmark just dropped and leading AI models can solve 'less than 2%' of its problems... oh dear

While today's AI models don't tend to struggle with other mathematical benchmarks such as GSM-8k and MATH, according to Epoch ...

New secret math benchmark stumps AI models and PhDs alike

FrontierMath's performance results, revealed in a preprint research paper, paint a stark picture of current AI model ...

FrontierMath's difficult questions remain unpublished so that AI companies can't train against it. ...

Testing AI systems on hard math problems shows they still perform very poorly

A team of AI researchers and mathematicians affiliated with several institutions in the U.S. and the U.K. has developed a ...

Epoch AI Launches FrontierMath AI Benchmark to Test Capabilities of AI Models

Epoch AI highlighted that to measure AI's aptitude, benchmarks should be created on creative problem-solving where the AI has ...

Sam Altman claims AGI is coming in 2025 and machines will be able to 'think like humans' when it happens

AGI is a form of AI that is as capable as, if not more capable than, all humans across almost all areas of intelligence. It has been the ‘holy grail’ for every major AI lab, and many predicted it ...

AI’s math problem: FrontierMath benchmark shows how far technology still has to go

FrontierMath, a new benchmark from Epoch AI, challenges advanced AI systems with complex math problems, revealing how far AI still has to go before achieving true human-level reasoning.

AI groups rush to redesign model testing and create new benchmarks

Companies conduct “evaluations” of AI models by teams of staff and outside researchers. These are standardised tests, known as benchmarks, that assess models’ abilities and the performance of ...

Analytics India Magazine · 3d

OpenAI o1 Can’t Do Maths, But Excels at Making Excuses

It’s not just OpenAI’s o1—no LLM in the world is anywhere close to cracking the toughest problems in mathematics (yet).

marktechpost · 18d

Meta AI Silently Releases NotebookLlama: An Open Version of Google’s NotebookLM

Meta has recently released NotebookLlama, an open version of Google’s NotebookLM that empowers researchers and developers with accessible, scalable solutions for interactive data analysis and ...

gadgets360 · 20d

Oppo Pad 3 Pro With 144Hz Display, Snapdragon 8 Gen 3 Chipset Launched: Specifications, Price

Oppo Pad 3 Pro is powered by Snapdragon 8 Gen 3 chipset. The flagship tablet is available for purchase in China in two colourways. It has a single 13-megapixel rear camera and an 8-megapixel front ...
