ArXiv Math Semantic Search
AI-powered semantic search and Q&A over 700,000+ arXiv mathematics papers. Ask natural language questions, search by concept, explore author connections. Free MVP available at igor.ngrok.pro.
Technical reports on AI, mathematics, and formal verification
AI-powered semantic search and Q&A over 700,000+ arXiv mathematics papers. Ask natural language questions, search by concept, explore author connections. Free MVP available at igor.ngrok.pro.
Follow-up to our initial benchmark: using Aristotle to automatically verify LLM-generated Lean proofs. 80/80 submissions verified successfully (100% success rate). Aristotle corrects formalization errors and synthesizes complete, machine-checkable proofs. Data release: github.com/igorrivin/polya-szego-lean
Comprehensive evaluation of frontier LLMs on 320 problems from Polya-Szego's "Problems and Theorems in Analysis I" (1925). Key finding: 95.4% informal reasoning accuracy but only 2.8% complete Lean4 proofs. The 30x gap reveals that formal verification, not understanding, is the bottleneck in AI mathematics.