
ArXiv Math Semantic Search

AI-Powered Search and Q&A Over 700,000+ Mathematics Papers

January 12, 2026 • Tool Announcement • 5 min read

Try It Now

The MVP is live and free to use. Ask questions about mathematical research, search for papers by concept, or explore connections across the arXiv corpus.

Launch ArXiv Search →

MVP Notice: This is an early preview. The tool is currently free and requires no login, but that will change once we add user accounts and usage tracking. Feedback is welcome!

What Is It?

ArXiv Math Semantic Search is a semantic search engine over the arXiv mathematics corpus that matches on meaning, not just keywords. Ask natural-language questions like "What did Rivin prove about hyperbolic geometry?" or "What are recent advances in random matrix theory?" and get relevant papers along with an AI-synthesized answer.

  • Papers: 729K
  • Text chunks: 290M
  • LLM options: 4
  • Search time: <1s

How It Works

1. Paper Processing

We extract full LaTeX source from arXiv papers (not just abstracts), chunk them into meaningful segments, and index both the text and mathematical content. This means you can search for specific theorems, definitions, or proof techniques.
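
As a rough illustration of the chunking step, here is a minimal Python sketch. It assumes chunks are formed by pulling out theorem-like environments and splitting the remaining text on blank lines. The function names `chunk_latex` and `_split_prose`, the environment list, and the `max_chars` limit are all hypothetical and are not taken from the actual pipeline.

```python
import re

# Hypothetical list of environments worth indexing as standalone chunks.
ENVIRONMENTS = ("theorem", "lemma", "proposition", "corollary", "definition", "proof")

# Matches \begin{env} ... \end{env} for any environment listed above.
ENV_PATTERN = re.compile(
    r"\\begin\{(" + "|".join(ENVIRONMENTS) + r")\}(.*?)\\end\{\1\}",
    re.DOTALL,
)


def chunk_latex(source: str, max_chars: int = 1000) -> list[dict]:
    """Split LaTeX source into labelled chunks (illustrative only)."""
    chunks = []
    cursor = 0
    for match in ENV_PATTERN.finditer(source):
        # Text between environments becomes paragraph-level chunks.
        chunks.extend(_split_prose(source[cursor:match.start()], max_chars))
        chunks.append({"kind": match.group(1), "text": match.group(2).strip()})
        cursor = match.end()
    chunks.extend(_split_prose(source[cursor:], max_chars))
    return chunks


def _split_prose(text: str, max_chars: int) -> list[dict]:
    """Split prose on blank lines, slicing long paragraphs into fixed windows."""
    pieces = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        for start in range(0, len(paragraph), max_chars):
            pieces.append({"kind": "text", "text": paragraph[start:start + max_chars]})
    return pieces


sample = (
    "Intro paragraph.\n\n"
    "\\begin{theorem}\nEvery $x$ is nice.\n\\end{theorem}\n\n"
    "Closing remark."
)
for chunk in chunk_latex(sample):
    print(chunk["kind"], "|", chunk["text"])
```

Keeping the environment name on each chunk is one way a search for "theorems about X" could be narrowed to theorem statements only; whether the real index stores such a label is an assumption.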

2. Hybrid Search

The search pipeline combines two approaches:

  • Semantic search: BGE embeddings + pgvector find conceptually similar content
  • Full-text search: PostgreSQL GIN indexes for exact author/keyword matches

Author queries (e.g., "Tell me about Sarnak's work") automatically use hybrid mode: full-text to find the author's papers, then vector re-ranking for relevance.
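
The author-query flow can be sketched in a few lines of Python. This is a toy in-memory version under stated assumptions: the real system relies on PostgreSQL full-text indexes and pgvector, whereas `keyword_filter`, `cosine`, and `hybrid_search` below are hypothetical helpers, the author names are placeholders, and the embeddings are hand-made 3-dimensional vectors rather than BGE outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_filter(docs, term):
    """Stand-in for full-text search: case-insensitive match on the author field."""
    term = term.lower()
    return [d for d in docs if term in d["authors"].lower()]


def hybrid_search(docs, author, query_vec, top_k=5):
    """Filter by author first, then re-rank the survivors by vector similarity."""
    candidates = keyword_filter(docs, author)
    ranked = sorted(
        candidates,
        key=lambda d: cosine(d["embedding"], query_vec),
        reverse=True,
    )
    return ranked[:top_k]


# Toy corpus with placeholder authors and made-up embeddings.
docs = [
    {"id": "a", "authors": "A. Example", "embedding": [0.9, 0.1, 0.0]},
    {"id": "b", "authors": "A. Example", "embedding": [0.1, 0.9, 0.0]},
    {"id": "c", "authors": "B. Sample", "embedding": [1.0, 0.0, 0.0]},
]
results = hybrid_search(docs, "example", query_vec=[1.0, 0.0, 0.0])
print([d["id"] for d in results])
```

Document "c" is the closest vector match but is excluded because its author does not match, which is the point of running the exact filter before the semantic re-ranking.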

3. Multi-LLM Chat

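Retrieved chunks are passed to your choice of language model, which synthesizes an answer from them. Models are available through the Novita (DeepSeek), OpenAI, and Anthropic APIs.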
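
One way to picture the synthesis step is as prompt assembly: the retrieved chunks are joined into a context block, followed by the user's question. The sketch below is an assumption about the general shape, not the tool's actual prompt. `build_prompt`, the chunk fields, the placeholder arXiv IDs, and the character budget are hypothetical, and the call to the chosen model's API is deliberately left out.

```python
def build_prompt(question: str, chunks: list[dict], max_context_chars: int = 6000) -> str:
    """Assemble a retrieval-augmented prompt from ranked chunks (illustrative only)."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] arXiv:{chunk['arxiv_id']}\n{chunk['text']}"
        if used + len(entry) > max_context_chars:
            break  # stop before exceeding the context budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the excerpts below. "
        "Cite excerpts by their bracketed number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Placeholder chunks; the IDs and text are made up for the example.
retrieved = [
    {"arxiv_id": "0000.00001", "text": "Lemma text goes here."},
    {"arxiv_id": "0000.00002", "text": "Another excerpt."},
]
print(build_prompt("What is proved about the lemma?", retrieved))
```

Because the prompt is plain text, the same string could in principle be sent to any of the supported providers; only the API call around it would differ.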

Example Queries

  • "What are the main results in geometric group theory from 2024?"
  • "Explain the connection between random matrices and number theory"
  • "What did Tao prove about prime gaps?"
  • "Recent advances in machine learning for theorem proving"
  • "Compare different approaches to the Riemann hypothesis"

Technical Stack

  • Database: PostgreSQL + pgvector for vector similarity search
  • Embeddings: BGE-large-en-v1.5 (1024 dimensions)
  • Backend: FastAPI (Python)
  • LLM APIs: Novita (DeepSeek), OpenAI, Anthropic
  • Indexing: Parallel workers with incremental updates
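
To make "parallel workers with incremental updates" concrete, here is a minimal Python sketch under stated assumptions. It skips papers that are already indexed and fans the rest out to a thread pool. The names `index_paper` and `incremental_index`, the in-memory set of indexed IDs, and the worker count are hypothetical; the real indexer presumably tracks state in the database.

```python
from concurrent.futures import ThreadPoolExecutor


def index_paper(paper_id: str) -> str:
    """Placeholder for the real work: fetch source, chunk, embed, write to the database."""
    return paper_id


def incremental_index(all_ids, already_indexed, max_workers=4):
    """Index only papers not seen before and return the newly indexed IDs."""
    pending = [pid for pid in all_ids if pid not in already_indexed]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though work runs concurrently.
        done = list(pool.map(index_paper, pending))
    already_indexed.update(done)
    return done


seen = {"p1"}
print(incremental_index(["p1", "p2", "p3"], seen))  # only the unseen papers are processed
print(incremental_index(["p1", "p2", "p3"], seen))  # a second run has nothing left to do
```

Running the same batch twice is safe in this sketch, which is the property that lets search stay available while the remaining papers are indexed.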

Current Status

The system is still indexing papers, so coverage is partial for now. Search already works on the papers indexed so far, and full corpus coverage is expected within a few weeks.

What's Next


Try ArXiv Search →

Questions or feedback? Get in touch.