ArXiv Math Semantic Search
AI-powered semantic search and Q&A over 700,000+ arXiv mathematics papers. Ask questions, search by concept, explore connections. Free MVP live now.
Notes on AI, mathematics, and formal verification
AI-powered semantic search and Q&A over 700,000+ arXiv mathematics papers. Ask questions, search by concept, explore connections. Free MVP live now.
Using Aristotle to automatically verify LLM-generated Lean proofs from Polya-Szego. 80/80 submissions verified successfully (100% success rate). The "30x gap" between informal reasoning and formal verification can be bridged with the right tools.
Evaluating frontier LLMs on 320 problems from Polya-Szego's classic text. Key finding: 95.4% informal reasoning accuracy but only 2.8% complete Lean4 proofs. The bottleneck in AI mathematics is formalization, not understanding.
Benchmarking Mathpix against LLM-based OCR for mathematical PDFs. Gemini 3 Flash is 6x cheaper and more accurate for extracting equations from academic papers and textbooks.