← Back to Home

Blog

Notes on AI, mathematics, and formal verification

ArXiv Math Semantic Search

January 12, 2026

AI-powered semantic search and Q&A over 700,000+ arXiv mathematics papers. Ask questions, search by concept, explore connections. Free MVP live now.

Live MVP RAG pgvector

Polya-Szego + Aristotle: From 2.8% to 100% Verified Proofs

January 12, 2026

Using Aristotle to automatically verify LLM-generated Lean proofs from Polya-Szego. 80/80 submissions verified successfully (100% success rate). The "30x gap" between informal reasoning and formal verification can be bridged with the right tools.

Aristotle Lean4 Formal Verification

Polya-Szego Evaluation: LLM Reasoning vs Formal Verification

January 2, 2026

Evaluating frontier LLMs on 320 problems from Polya-Szego's classic text. Key finding: 95.4% informal reasoning accuracy but only 2.8% complete Lean4 proofs. The bottleneck in AI mathematics is formalization, not understanding.

DeepSeek-Prover Mathematical AI Benchmarking

Math OCR Benchmark: Why Gemini Flash Beats Mathpix

January 1, 2026

Benchmarking Mathpix against LLM-based OCR for mathematical PDFs. Gemini 3 Flash is 6x cheaper and more accurate for extracting equations from academic papers and textbooks.

OCR Gemini LaTeX