Polya-Szego + Aristotle

From 2.8% to 100%: Automated Proof Verification with Aristotle

January 12, 2026 • Research Update • 8 min read

Key Result

Using Aristotle, an automated Lean proof assistant, we achieved 100% verification success on 80 submitted Polya-Szego formalizations. Aristotle automatically corrects formalization errors and synthesizes complete, machine-checkable proofs.

1. Recap: The Formalization Gap

In our initial benchmark, we found that frontier LLMs achieve 95%+ accuracy on informal mathematical reasoning but only 2.8% complete Lean proofs. We called this the "30x gap"—models understand mathematics but cannot produce formally verified proofs.

95%

Informal Reasoning

2.8%

LLM Proofs (Before)

100%

Aristotle Verified

Problems Verified

2. Enter Aristotle

Aristotle is an automated theorem prover that takes informal mathematical statements (or rough Lean formalizations) and produces complete, verified Lean 4 proofs. Unlike traditional proof assistants that require expert guidance, Aristotle operates autonomously.

2.1 Our Pipeline

LLM Formalization: DeepSeek V3.2 generates initial Lean 4 statements from Polya-Szego problems
Aristotle Submission: Each formalization is submitted with problem context
Autonomous Proving: Aristotle corrects errors and synthesizes proofs
Verification: Output is machine-checkable Lean 4 code

2.2 Results

Metric	Value
Problems submitted	80
Successfully verified	80 (100%)
Remaining to submit	238
Total formalizations	318

100% success rate on all 80 submitted problems. Aristotle corrected formalization errors (identifier names, type mismatches) and produced complete proofs.

3. Worked Example: Problem 112

Let's trace a complete example from original problem to verified proof.

3.1 Original Problem (Polya-Szego)

Problem 112 (Part One, Chapter 3):
Prove that $\text{Re}(\bar{a} \cdot e^{i\varphi}) = |a| \cos(\varphi - \arg a)$ for any complex number $a$ and real $\varphi$.

3.2 LLM Formalization (DeepSeek V3.2)

The initial formalization had identifier errors (Complex.abs instead of norm, conj instead of star):

theorem problem_112_part_one :
    forall (a : C) (phi : R),
    Complex.re (conj a * Complex.exp (Complex.I * phi)) =
    Complex.abs a * Real.cos (phi - Complex.arg a) := by
  sorry

3.3 Aristotle Output

Aristotle automatically corrected the identifiers and produced a complete proof:

/-
This file was generated by Aristotle.
Lean version: leanprover/lean4:v4.24.0
Mathlib version: f897ebcf72cd16f89ab4577d0c826cd14afaafc7
-/

import Mathlib

/-
Re(a* * exp(i*phi)) = |a| * cos(phi - arg(a))
-/
theorem problem_112_part_one (a : C) (phi : R) :
    (star a * Complex.exp (Complex.I * phi)).re =
    ||a|| * Real.cos (phi - Complex.arg a) := by
  norm_num [Complex.exp_re, Complex.exp_im, Real.cos_sub]
  rw [<- Complex.norm_mul_cos_arg, <- Complex.norm_mul_sin_arg]
  ring

3.4 What Aristotle Fixed

Before (LLM)

conj a → wrong identifier
Complex.abs a → wrong API
sorry → no proof

After (Aristotle)

star a → correct Mathlib
||a|| → norm notation
Complete 3-line proof

4. More Complex Example: Problem 10

Problem 10 concerns velocity and acceleration in polar coordinates—a multi-part theorem requiring chain rule, product rule, and careful bookkeeping.

Problem 10 (Part Three, Chapter 1):
Let $z = re^{i\theta}$ where $r, \theta$ are functions of time $t$. Compute the radial and tangential components of velocity and acceleration.

Aristotle produced a 110-line verified proof establishing:

Velocity: $v_r = \dot{r}$, $v_\theta = r\dot{\theta}$
Acceleration: $a_r = \ddot{r} - r\dot{\theta}^2$, $a_\theta = r\ddot{\theta} + 2\dot{r}\dot{\theta}$

The proof uses ContDiffAt hypotheses, chain rule for complex exponentials, and careful manipulation of derivatives—exactly the kind of intricate reasoning that LLMs struggle to formalize correctly.

5. Key Insights

5.1 Error Correction is Crucial

Most LLM formalizations have small errors: wrong identifier names, type mismatches, missing imports. Aristotle's ability to automatically correct these errors is what enables 100% success.

5.2 The Bottleneck Has Shifted

Stage	Before Aristotle	After Aristotle
Informal reasoning	95% (solved)	95% (solved)
Formalization	2.8% (bottleneck)	~99% (with Aristotle)
New bottleneck	-	Submission throughput

5.3 Implications for Mathematical AI

LLM + Verifier pipelines work: Imperfect LLM output can be refined by specialized tools
Formal verification is tractable: Classical analysis problems are within reach
The "30x gap" can be bridged: Not by better LLMs alone, but by better tooling

6. Data Release

We are releasing all Aristotle-verified proofs as a resource for the formal mathematics community.

Repository: github.com/igorrivin/polya-szego-lean
Contains: 80 verified Lean 4 proofs, original problem statements, LLM formalizations, and Aristotle outputs.

7. Next Steps

Complete verification: Submit remaining 238 problems to Aristotle
Mathlib contributions: Identify lemmas that should be upstreamed
Broader benchmarks: Apply pipeline to other mathematical texts

Acknowledgments

Thanks to the Aristotle team for their remarkable tool, and to the Mathlib community for building the foundation that makes this work possible.

Report updated January 12, 2026. Verification ongoing—check the repository for latest results.