While today's AI models don't tend to struggle with other mathematical benchmarks such as GSM-8k and MATH, according to Epoch ...