First Proof Challenge Responses
The First Proof Challenge is a benchmark created by a group of mathematicians to test the frontier of AI-assisted mathematical reasoning. It posed a series of research-caliber problems spanning number theory, combinatorics, analysis, and algebra, each intended to be solvable via model prompting within roughly a week.
- We produced the first known comprehensive comparison of frontier model responses across the full problem set, systematically evaluating how different large language models approach, formalize, and attempt to solve research-level mathematics.
- Our analysis documents the strategies that succeed and the characteristic failure modes that emerge across model families, providing a structured account of where current AI reasoning stands relative to genuine mathematical research.
- The work was conducted in collaboration with Param Thakkar and is, to our knowledge, the first public, systematic comparison of LLM performance on the First Proof problem set.
A preprint of this work is available here.