EurekaClaw: AI Research Agent for Theorem Discovery
EurekaClaw is a multi-agent AI research assistant designed for mathematical and theoretical research. Released by researchers from UCLA's Artificial General Intelligence Lab in March 2026, it transforms natural language questions into publishable research artifacts — crawling literature, generating and proving theorems, and writing LaTeX papers.
The repository accumulated 294 stars and 25 forks within days of release, with daily commits from two core contributors. The codebase ships with comprehensive documentation, a React-based browser UI, and a modular architecture that supports custom skills and domain plugins.
From Question to Publication
The research workflow in EurekaClaw follows a seven-stage pipeline. A user provides a conjecture or research topic. The system then:
- Crawls arXiv and Semantic Scholar for relevant literature
- Summarizes and cross-references papers for context
- Generates novel hypotheses by synthesizing patterns across the literature
- Builds proofs through a bottom-up theorem-proving pipeline
- Verifies correctness via Lean4 formalization or LLM peer review
- Runs numerical experiments to validate theoretical bounds (optional)
- Produces camera-ready LaTeX with theorem environments and citations
This pipeline operates autonomously once initiated, with optional human-in-the-loop checkpoints controlled via the GATE_MODE environment variable.
Multi-Agent Architecture
EurekaClaw implements a multi-agent system where specialized agents handle distinct stages:
| Agent | Function | Tools Used |
|---|---|---|
| Literature Crawler | Fetch and summarize papers | arXiv API, Semantic Scholar |
| Idea Generator | Synthesize hypotheses from patterns | Embedding models, LLM reasoning |
| Theorem Prover | Generate and verify proofs | Lean4, symbolic verification |
| Paper Writer | Draft LaTeX manuscripts | LaTeX compiler, citation management |
| Experiment Runner | Validate bounds numerically | WolframAlpha, Python execution |
Agents communicate through a KnowledgeBus that persists context across the pipeline. Each agent can access the outputs of previous stages while maintaining isolation for their specific tasks.
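The hand-off between agents can be pictured as a shared store with one namespace per agent. The sketch below is only an illustration of that idea: the class `ToyKnowledgeBus` and its `publish` and `read` methods are hypothetical names, not EurekaClaw's actual KnowledgeBus API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyKnowledgeBus:
    """Hypothetical stand-in for a shared context store (not the real KnowledgeBus)."""

    _entries: dict[str, dict[str, Any]] = field(default_factory=dict)

    def publish(self, agent: str, key: str, value: Any) -> None:
        # Each agent writes only into its own namespace.
        self._entries.setdefault(agent, {})[key] = value

    def read(self, agent: str, key: str, default: Any = None) -> Any:
        # Any agent may read what an earlier agent published.
        return self._entries.get(agent, {}).get(key, default)


bus = ToyKnowledgeBus()
bus.publish("literature_crawler", "summaries", ["paper A: key lemma", "paper B: lower bound"])
bus.publish("idea_generator", "hypothesis", "bound in paper A extends to the sparse case")

# A later stage assembles its context from earlier outputs.
context_for_prover = {
    "summaries": bus.read("literature_crawler", "summaries"),
    "hypothesis": bus.read("idea_generator", "hypothesis"),
}
print(context_for_prover["hypothesis"])
```

Keeping writes namespaced per agent is one simple way to get the isolation described above while still letting downstream stages read upstream results.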
The 7-Stage Theorem Pipeline
The core innovation in EurekaClaw is its structured approach to automated theorem proving:
Stage 1: Literature Synthesis — Agents read and summarize relevant papers, extracting key lemmas, definitions, and proof techniques.
Stage 2: Hypothesis Generation — Pattern matching across the literature identifies gaps and suggests novel conjectures.
Stage 3: Lemma Decomposition — Complex theorems are broken into sub-lemmas with explicit dependencies.
Stage 4: Proof Sketching — High-level proof strategies are selected based on the literature and problem structure.
Stage 5: Formal Verification — Proofs are translated to Lean4 for machine-checked correctness.
Stage 6: Peer Review Simulation — LLM-based reviewers critique proofs for soundness and clarity.
Stage 7: LaTeX Generation — Camera-ready papers with proper theorem environments, citations, and bibliography.
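Stage 3 produces sub-lemmas with explicit dependencies, and a bottom-up prover needs to visit them so that every lemma comes after its prerequisites. A minimal sketch of that ordering step, using the standard-library `graphlib` module, is shown here. The lemma names and the `proof_order` helper are invented for illustration; the source does not describe how EurekaClaw schedules lemmas internally.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each lemma maps to the lemmas it relies on.
dependencies: dict[str, set[str]] = {
    "main_theorem": {"lemma_concentration", "lemma_covering"},
    "lemma_concentration": {"lemma_subgaussian"},
    "lemma_covering": set(),
    "lemma_subgaussian": set(),
}


def proof_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every lemma appears after its prerequisites.

    TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = proof_order(dependencies)
print(order)  # leaf lemmas first, main_theorem last
```

A cycle in the graph means the decomposition itself is unsound, so failing loudly at this step is preferable to attempting a proof.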
Continual Learning via Skills
EurekaClaw implements a skill system that improves performance over time. After each research session, successful proof strategies are distilled into reusable skills stored in ~/.eurekaclaw/skills/. These skills are Markdown files containing:
- Trigger conditions (when to apply this strategy)
- Workflow descriptions (step-by-step process)
- Parameters and configuration options
- Examples of successful applications
The THEORY_PIPELINE configuration can switch between "default" and "memory_guided" modes. In memory_guided mode, previously learned skills influence hypothesis generation and proof strategy selection.
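One plausible reading of "trigger conditions" is keyword matching against the conjecture text. The sketch below assumes exactly that; the `Skill` dataclass, the `matching_skills` function, and the `concentration-bounds` skill are hypothetical, and the real selection logic in memory_guided mode may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical in-memory view of a skill file's metadata."""

    name: str
    triggers: tuple[str, ...]


def matching_skills(conjecture: str, skills: list[Skill]) -> list[str]:
    """Return names of skills with at least one trigger word in the conjecture."""
    text = conjecture.lower()
    return [s.name for s in skills if any(t.lower() in text for t in s.triggers)]


skills = [
    Skill("topological-proof", ("topological", "filtration", "homotopy")),
    Skill("concentration-bounds", ("sub-gaussian", "concentration")),
]

print(matching_skills("Bound the filtration depth of sparse attention", skills))
# ['topological-proof']
```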
Three Levels of Research Depth
EurekaClaw provides three entry points depending on research maturity:
| Command | Level | Use Case |
|---|---|---|
| eurekaclaw prove "conjecture" | 1 | Precise mathematical statement to prove |
| eurekaclaw from-papers <ids> | 2 | Extend or find gaps in specific papers |
| eurekaclaw explore "domain" | 3 | Broad research area without specific conjecture |
Level 1 assumes the user knows what they want to prove. Level 2 starts from existing work. Level 3 is exploratory, suitable for early-stage research ideation.
Scientist-Bench Evaluation
EurekaClaw includes a built-in evaluation framework called Scientist-Bench that scores research outputs across five dimensions:
| Dimension | Weight | Method |
|---|---|---|
| Formal correctness | 0.35 | Lean4 / LLM peer review |
| Novelty | 0.25 | Embedding distance from known results |
| Experimental alignment | 0.15 | Numerical validation |
| Proof depth | 0.15 | Lemma count and complexity |
| Citation coverage | 0.10 | Bibliography completeness |
This evaluation runs automatically at session completion via eurekaclaw eval-session <session_id>, providing quantitative feedback on research quality.
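The five weights sum to 1.0, so the overall score can be read as a weighted average. The sketch below assumes each dimension is scored on a 0 to 1 scale and combined linearly; the source lists the weights but not the combination rule, and the dictionary keys and example scores are made up for illustration.

```python
# Weights taken from the table above; the key names are illustrative.
WEIGHTS = {
    "formal_correctness": 0.35,
    "novelty": 0.25,
    "experimental_alignment": 0.15,
    "proof_depth": 0.15,
    "citation_coverage": 0.10,
}


def weighted_total(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed to lie in [0, 1]) into one number."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)


# Made-up scores for a single session.
example = {
    "formal_correctness": 0.9,
    "novelty": 0.6,
    "experimental_alignment": 0.5,
    "proof_depth": 0.7,
    "citation_coverage": 0.8,
}
print(round(weighted_total(example), 3))
```

With these example scores the total is 0.315 + 0.150 + 0.075 + 0.105 + 0.080 = 0.725, which shows how heavily formal correctness dominates the result.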
Local-First Architecture
EurekaClaw emphasizes privacy and local execution. The pipeline runs on the user's machine; the only remote dependencies are the LLM API and the external services it queries, such as arXiv, Semantic Scholar, and WolframAlpha. Data persists to local directories:
~/.eurekaclaw/
├── skills/ # Learned strategies
├── memory/ # Episodic and persistent storage
├── results/ # Generated papers and proofs
└── config.yaml # User preferences
The browser UI (React + TypeScript) provides visualization without compromising the local-first principle — it connects to the local backend rather than a remote service.
Configuration and Gate Modes
EurekaClaw implements configurable safety checkpoints via GATE_MODE:
- none: Fully autonomous execution
- auto: System requests human approval at critical decisions
- human: Human-in-the-loop for all major stages
This addresses the fundamental tension in AI research tools: full autonomy risks error propagation, while excessive oversight negates the productivity benefits. EurekaClaw lets users calibrate this trade-off.
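The three modes can be summarized as a small decision function. This is a sketch under stated assumptions: the `GateMode` enum and `needs_approval` function are hypothetical, how the system classifies a decision as "critical" is not described in the source, and treating human mode as a superset of auto mode is a guess.

```python
from enum import Enum


class GateMode(Enum):
    NONE = "none"
    AUTO = "auto"
    HUMAN = "human"


def needs_approval(mode: GateMode, stage_is_major: bool, decision_is_critical: bool) -> bool:
    """Decide whether to pause for a human, following the three modes listed above."""
    if mode is GateMode.NONE:
        return False  # fully autonomous
    if mode is GateMode.AUTO:
        return decision_is_critical  # pause only at critical decisions
    # GateMode.HUMAN: pause at every major stage (and, by assumption, critical decisions too)
    return stage_is_major or decision_is_critical


print(needs_approval(GateMode("auto"), stage_is_major=True, decision_is_critical=False))
# False
```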
Tool Ecosystem
EurekaClaw integrates with multiple external tools for verification:
- arXiv API: Paper retrieval and metadata
- Semantic Scholar: Citation networks and paper relationships
- Lean4: Formal proof verification
- WolframAlpha: Symbolic computation
- Python execution: Numerical experiments and plotting
Each tool is implemented as a modular component that can be extended or replaced. Custom tools subclass BaseTool and register with the agent system.
Comparison with DeerFlow
Both EurekaClaw and DeerFlow pursue multi-agent research automation but with different emphases:
| Aspect | EurekaClaw | DeerFlow |
|---|---|---|
| Primary domain | Mathematical / theoretical research | General agent tasks |
| Output format | LaTeX papers with proofs | Reports, slides, code |
| Verification | Lean4 formalization | Sandboxed execution |
| Entry points | 3 levels (prove/explore/from-papers) | Skills-based |
| Learning | Skill distillation after sessions | Long-term memory |
| Evaluation | Scientist-Bench (5 dimensions) | User judgment |
EurekaClaw targets the formal research workflow with specific emphasis on correctness verification. DeerFlow targets general agent execution with broader tooling.
Technical Assessment
Strengths:
- Structured theorem-proving pipeline with formal verification
- Continual learning through skill distillation
- Built-in evaluation framework (Scientist-Bench)
- Local-first privacy architecture
- Three entry points for different research maturity levels
- Comprehensive documentation and browser UI
Considerations:
- 294 stars vs DeerFlow's 38,000 — newer, less battle-tested
- Only 2 contributors vs DeerFlow's 147
- Requires Anthropic API (Claude) — no model flexibility yet
- Experiment Runner marked "under development"
- Windows support incomplete (WSL recommended)
Conclusion
EurekaClaw represents a focused approach to AI-assisted mathematical research. The 7-stage theorem pipeline, Lean4 verification integration, and Scientist-Bench evaluation demonstrate serious attention to research rigor. The skill-based continual learning system addresses the genuine problem of accumulating domain expertise across sessions.
Use EurekaClaw when you need formal theorem generation with verification, LaTeX paper production, and structured research workflows. Consider DeerFlow when you need general agent capabilities with broader tooling and multi-channel integration.
The tool is Apache 2.0 licensed and actively maintained. For researchers in theoretical computer science, machine learning theory, and applied mathematics, it provides a concrete foundation for AI-assisted research workflows.
Quick Facts
| Metric | Value |
|---|---|
| GitHub Stars | 294 |
| Forks | 25 |
| Contributors | 2 (UCLA AGI Lab) |
| License | Apache 2.0 |
| Primary Language | Python 54.8% |
| Release Date | March 2026 |
| Repository | github.com/EurekaClaw/EurekaClaw |
| Website | eurekaclaw.ai |
Core Features
| Feature | Description |
|---|---|
| Literature Crawler | Fetch/summarize arXiv and Semantic Scholar papers |
| Idea Generator | Synthesize hypotheses from literature patterns |
| Theorem Prover | 7-stage bottom-up proof pipeline |
| Paper Writer | LaTeX generation with theorem environments |
| Experiment Runner | Numerical validation (in development) |
| Continual Learning | Skill distillation after each session |
| Browser UI | React + TypeScript live tracking interface |
7-Stage Pipeline
| Stage | Function |
|---|---|
| 1 | Literature synthesis — read and summarize papers |
| 2 | Hypothesis generation — identify gaps |
| 3 | Lemma decomposition — break into sub-lemmas |
| 4 | Proof sketching — high-level strategy |
| 5 | Formal verification — Lean4 translation |
| 6 | Peer review simulation — LLM critique |
| 7 | LaTeX generation — camera-ready paper |
Three Research Levels
| Command | Level | Use Case |
|---|---|---|
| prove "conjecture" | 1 | Specific theorem to prove |
| from-papers <ids> | 2 | Extend existing papers |
| explore "domain" | 3 | Broad research exploration |
Gate Modes (Safety Checkpoints)
| Mode | Behavior |
|---|---|
| none | Fully autonomous |
| auto | Approvals at critical decisions |
| human | Human-in-the-loop throughout |
Scientist-Bench Scoring
| Dimension | Weight | Method |
|---|---|---|
| Formal correctness | 35% | Lean4 / LLM review |
| Novelty | 25% | Embedding distance |
| Experimental alignment | 15% | Numerical validation |
| Proof depth | 15% | Lemma count |
| Citation coverage | 10% | Bibliography |
Requirements
- Python 3.11+
- Anthropic API key (Claude)
- macOS / Linux (Windows via WSL)
Quick Start
curl -fsSL https://eurekaclaw.ai/install.sh | bash
eurekaclaw onboard # Interactive setup
eurekaclaw install-skills # Install proof skills
eurekaclaw prove "Your conjecture here"
eurekaclaw explore "ML theory"
eurekaclaw from-papers 1706.03762 2005.14165
EurekaClaw vs DeerFlow
| Aspect | EurekaClaw | DeerFlow |
|---|---|---|
| Domain | Math / theory | General tasks |
| Output | LaTeX papers | Reports/code |
| Verification | Lean4 | Sandbox |
| Model | Claude only | Multi-provider |
| Learning | Skill distillation | Long-term memory |
EurekaClaw for Builders
Implementation patterns for extending EurekaClaw and building research workflows.
Configuration Pattern
# ~/.eurekaclaw/config.yaml or .env
# LLM Backend
ANTHROPIC_API_KEY=sk-... # Required
EUREKACLAW_MODEL=claude-sonnet-4-6 # Main reasoning model
# Pipeline Control
GATE_MODE=auto # none | auto | human
THEORY_PIPELINE=default # default | memory_guided
THEORY_MAX_ITERATIONS=10
# Output Format
OUTPUT_FORMAT=latex # latex | markdown
EXPERIMENT_MODE=auto # auto | true | false
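Because several of these settings accept only a fixed set of values, a typo such as `memory-guided` is easy to miss. The sketch below parses `.env`-style text and flags values outside the options listed in the comments above. The function names are hypothetical, and whether EurekaClaw performs this validation itself is not stated in the source.

```python
# Allowed values copied from the inline comments in the configuration above.
ALLOWED_VALUES = {
    "GATE_MODE": {"none", "auto", "human"},
    "THEORY_PIPELINE": {"default", "memory_guided"},
    "OUTPUT_FORMAT": {"latex", "markdown"},
    "EXPERIMENT_MODE": {"auto", "true", "false"},
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments.

    Simplification: a '#' inside a value is treated as the start of a comment.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def invalid_settings(settings: dict[str, str]) -> list[str]:
    """Describe every setting whose value is not among the allowed options."""
    return [
        f"{key}={settings[key]!r} (expected one of {sorted(allowed)})"
        for key, allowed in ALLOWED_VALUES.items()
        if key in settings and settings[key] not in allowed
    ]


sample = """
# Pipeline Control
GATE_MODE=auto            # none | auto | human
THEORY_PIPELINE=memory-guided
THEORY_MAX_ITERATIONS=10
"""
settings = parse_env(sample)
print(invalid_settings(settings))
```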
Custom Skill Definition
# ~/.eurekaclaw/skills/my-strategy.md
---
name: topological-proof
version: 1.0.0
author: your-name
triggers:
- "topological"
- "filtration"
- "homotopy"
---
# Topological Proof Strategy
**Purpose:** Apply topological methods to continuous optimization problems.
## When to Use
- Problem involves continuous functions
- Can construct a filtration
- Topological invariants apply
## Workflow
1. Construct filtration from problem structure
2. Compute persistent homology
3. Relate topological features to optima
4. Prove bounds via topological properties
## Example Success
"O(n log n) via topological filtration" — sparse attention complexity
Domain Plugin Architecture
# eurekaclaw/domains/my_domain.py
from eurekaclaw.domains.base import DomainPlugin


class MyDomain(DomainPlugin):
    name = "my-domain"

    def relevant_papers(self, query: str) -> list:
        """Return arXiv categories or search terms."""
        return ["cs.LG", "cs.AI"]

    def validate_hypothesis(self, hypothesis: dict) -> bool:
        """Domain-specific validation logic."""
        # Helper not shown in the source; assumed to be defined on the plugin.
        return self.check_theoretical_sounds(hypothesis)

    def proof_templates(self) -> list:
        """Return common proof structures for this domain."""
        return ["contradiction", "induction", "probabilistic"]
Python API Usage
from eurekaclaw import EurekaSession
from eurekaclaw.memory import KnowledgeBus
# Initialize session
session = EurekaSession(
    model="claude-sonnet-4-6",
    gate_mode="auto",
    output_format="latex",
)
# Run research pipeline
result = session.prove(
    conjecture="Sample complexity is O(L·d·log(d)/ε²)",
    domain="ML theory",
)
# Access results
print(result.proof)
print(result.latex_paper)
print(result.evaluation_score) # Scientist-Bench
# Memory management
bus = KnowledgeBus()
bus.remember("learned-strategy", result.proof_techniques)
Tool Integration Pattern
import asyncio
import os

import requests
from eurekaclaw.tools.base import BaseTool


class MyCustomTool(BaseTool):
    name = "custom-verifier"
    description = "Verify claims using external service"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def _run(self, claim: str) -> str:
        """Execute tool logic."""
        response = requests.post(
            "https://api.verifier.com/check",
            json={"claim": claim},
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=30,
        )
        response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
        return response.json()["result"]

    async def _arun(self, claim: str) -> str:
        """Async version for concurrent execution."""
        # Run the blocking request in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._run, claim)


# Register with agent
from eurekaclaw.agents import ProverAgent

# VERIFIER_API_KEY is an illustrative variable name; the source leaves api_key undefined.
api_key = os.environ["VERIFIER_API_KEY"]
agent = ProverAgent(tools=[MyCustomTool(api_key)])
Session Evaluation
# Evaluate completed research
from eurekaclaw.evaluation import ScientistBench
bench = ScientistBench()
scores = bench.evaluate_session(
    session_id="sess-2026-03-23-001",
    dimensions=["correctness", "novelty", "depth"],
)
print(f"Overall: {scores.weighted_total}")
print(f"Correctness: {scores.correctness}")
print(f"Novelty: {scores.novelty}")
print(f"Citations: {scores.citation_coverage}")
Memory System Tiers
from eurekaclaw.memory import EpisodicMemory, PersistentMemory, KnowledgeGraph
# Episodic: Current session context
episodic = EpisodicMemory()
episodic.add_turn(query="...", response="...")
# Persistent: Cross-session facts
persistent = PersistentMemory()
persistent.remember("user-preference", "topological-methods")
# Knowledge Graph: Relationship structure
kg = KnowledgeGraph()
kg.add_entity("theorem-x", type="theorem")
kg.add_relation("theorem-x", "proves", "lemma-y")
Custom Pipeline Stage
from eurekaclaw.pipeline import PipelineStage


class ProofError(Exception):
    """Raised when verification fails (defined locally; the source does not show its import)."""


class CustomVerificationStage(PipelineStage):
    """Insert custom verification between existing stages."""
    stage_name = "custom-verify"
    depends_on = ["proof-sketch"]
    outputs = ["verified-proof"]

    def execute(self, context: dict) -> dict:
        proof = context.get("proof-sketch")
        # Custom verification logic; custom_checker is a user-supplied method not shown here
        verification = self.custom_checker(proof)
        if not verification.valid:
            raise ProofError(verification.issues)
        return {"verified-proof": proof, "checks": verification.checks}


# Register and run
from eurekaclaw.pipeline import ResearchPipeline

pipeline = ResearchPipeline()
pipeline.insert_after("proof-sketch", CustomVerificationStage())
result = pipeline.run(conjecture="...")
Browser UI Extension
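// frontend/src/components/CustomPanel.tsx
import React from 'react';
import { useEurekaSession } from '../hooks/useEureka';

export const CustomPanel: React.FC = () => {
  const { session, proofStatus } = useEurekaSession();
  return (
    <div>
      <h3>Proof Progress</h3>
      {/* The per-stage markup was lost from the source; this list item is a placeholder. */}
      <ul>
        {proofStatus.stages.map((stage, index) => (
          <li key={index}>{JSON.stringify(stage)}</li>
        ))}
      </ul>
    </div>
  );
};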
Deployment Checklist
- [ ] Install: curl -fsSL https://eurekaclaw.ai/install.sh | bash
- [ ] Configure: eurekaclaw onboard or manual .env
- [ ] Install skills: eurekaclaw install-skills
- [ ] Test pipeline: eurekaclaw prove "simple test"
- [ ] Build UI: make open
- [ ] Add custom skills to ~/.eurekaclaw/skills/
- [ ] Test domain plugin (if extending)
- [ ] Verify Lean4 installation for formal proofs
- [ ] Run evaluation: eurekaclaw eval-session <id>
Technical Appendix
Repository: github.com/EurekaClaw/EurekaClaw
Version analyzed: March 23, 2026 (commit 71b356c)
Stars: 294 | Forks: 25 | Contributors: 2
License: Apache 2.0
Key Files Referenced:
README.md, docs/configuration.md, docs/architecture.md
docs/agents.md, docs/skills.md, docs/memory.md
Website: eurekaclaw.ai
Documentation: eurekaclaw.github.io
Authors: Xuheng Li, Qiwei Di, Chenggong Zhang, Kaixuan Ji, Qingyue Zhao, Yifeng Liu, Shiyuan Zhang, Quanquan Gu (UCLA AGI Lab)