March 23, 2026 · 10 min read · AI Research Tools

EurekaClaw: AI Research Agent for Theorem Discovery

EurekaClaw is a multi-agent AI research assistant designed for mathematical and theoretical research. Released by researchers from UCLA's Artificial General Intelligence Lab in March 2026, it transforms natural language questions into publishable research artifacts — crawling literature, generating and proving theorems, and writing LaTeX papers.

The repository accumulated 294 stars and 25 forks within days of release, with daily commits from two core contributors. The codebase ships with comprehensive documentation, a React-based browser UI, and a modular architecture that supports custom skills and domain plugins.

From Question to Publication

The research workflow in EurekaClaw follows a seven-stage pipeline. A user provides a conjecture or research topic. The system then:

Crawls arXiv and Semantic Scholar for relevant literature
Summarizes and cross-references papers for context
Generates novel hypotheses by synthesizing patterns across the literature
Builds proofs through a bottom-up theorem-proving pipeline
Verifies correctness via Lean4 formalization or LLM peer review
Runs numerical experiments to validate theoretical bounds (optional)
Produces camera-ready LaTeX with theorem environments and citations

This pipeline operates autonomously once initiated, with optional human-in-the-loop checkpoints controlled via the GATE_MODE environment variable.
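As a rough illustration of the flow above, the sketch below runs a list of stages in order, passes a shared context forward, and lets an optional stage (such as numerical experiments) be skipped. The names `run_pipeline`, `STAGES`, and the stage functions are hypothetical and not taken from the EurekaClaw codebase.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
# (stage name, stage function, optional flag); all names are illustrative
Stage = Tuple[str, Callable[[Context], Context], bool]


def run_pipeline(stages: List[Stage], question: str, skip_optional: bool = False) -> Context:
    """Run stages in order; each stage sees everything produced before it."""
    context: Context = {"question": question, "completed": []}
    for name, func, optional in stages:
        if optional and skip_optional:
            continue  # e.g. numerical experiments turned off
        context.update(func(context))
        context["completed"].append(name)
    return context


# Stand-in stage functions that return canned values
STAGES: List[Stage] = [
    ("crawl", lambda ctx: {"papers": ["paper-a", "paper-b"]}, False),
    ("summarize", lambda ctx: {"summary": f"{len(ctx['papers'])} papers"}, False),
    ("experiments", lambda ctx: {"experiments": "ran"}, True),
    ("write", lambda ctx: {"latex": "\\documentclass{article}"}, False),
]

result = run_pipeline(STAGES, "Is X bounded?", skip_optional=True)
print(result["completed"])
```

Keeping the optional flag on the stage itself, rather than in the runner, is one simple way to make a stage skippable without changing the order of the rest.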

Multi-Agent Architecture

EurekaClaw implements a multi-agent system where specialized agents handle distinct stages:

Agent	Function	Tools Used
Literature Crawler	Fetch and summarize papers	arXiv API, Semantic Scholar
Idea Generator	Synthesize hypotheses from patterns	Embedding models, LLM reasoning
Theorem Prover	Generate and verify proofs	Lean4, symbolic verification
Paper Writer	Draft LaTeX manuscripts	LaTeX compiler, citation management
Experiment Runner	Validate bounds numerically	WolframAlpha, Python execution

Agents communicate through a KnowledgeBus that persists context across the pipeline. Each agent can read the outputs of earlier stages while keeping its own task state isolated.
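One way to picture this kind of shared-but-isolated context is a small in-memory store that hands out copies. The class below is a toy model, not EurekaClaw's KnowledgeBus; `ToyKnowledgeBus`, `publish`, and `read` are assumed names chosen for illustration.

```python
from copy import deepcopy
from typing import Any, Dict


class ToyKnowledgeBus:
    """Toy shared store: stages publish outputs, later stages read copies."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def publish(self, stage: str, output: Any) -> None:
        # Store a copy so the publisher cannot mutate it afterwards
        self._store[stage] = deepcopy(output)

    def read(self, stage: str) -> Any:
        # Return a copy so a reader's edits stay local to that reader
        return deepcopy(self._store[stage])


bus = ToyKnowledgeBus()
bus.publish("literature", {"lemmas": ["Lemma A"]})

view = bus.read("literature")
view["lemmas"].append("scratch note")  # local change only

print(bus.read("literature"))
```

Copy-on-read is the simplest isolation strategy; a real system would more likely persist entries to disk and track which stage wrote each one.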

The 7-Stage Theorem Pipeline

The core innovation in EurekaClaw is its structured approach to automated theorem proving:

Stage 1: Literature Synthesis — Agents read and summarize relevant papers, extracting key lemmas, definitions, and proof techniques.

Stage 2: Hypothesis Generation — Pattern matching across the literature identifies gaps and suggests novel conjectures.

Stage 3: Lemma Decomposition — Complex theorems are broken into sub-lemmas with explicit dependencies.

Stage 4: Proof Sketching — High-level proof strategies are selected based on the literature and problem structure.

Stage 5: Formal Verification — Proofs are translated to Lean4 for machine-checked correctness.

Stage 6: Peer Review Simulation — LLM-based reviewers critique proofs for soundness and clarity.

Stage 7: LaTeX Generation — Camera-ready papers with proper theorem environments, citations, and bibliography.
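Stage 3's "explicit dependencies" imply an ordering problem: a lemma can only be proved after the lemmas it relies on. A minimal sketch of that idea, using Python's standard `graphlib` module and made-up lemma names, might look like this. It is not a description of how EurekaClaw schedules proofs.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of lemmas it depends on (hypothetical names)
dependencies = {
    "main_theorem": {"lemma_bound", "lemma_concentration"},
    "lemma_bound": {"lemma_base"},
    "lemma_concentration": {"lemma_base"},
    "lemma_base": set(),
}

# static_order() yields every node after all of its dependencies
proof_order = list(TopologicalSorter(dependencies).static_order())
print(proof_order)
```

`TopologicalSorter` raises `graphlib.CycleError` if two lemmas depend on each other, which is a useful early signal that a decomposition is circular.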

Continual Learning via Skills

EurekaClaw implements a skill system that improves performance over time. After each research session, successful proof strategies are distilled into reusable skills stored in ~/.eurekaclaw/skills/. These skills are Markdown files containing:

Trigger conditions (when to apply this strategy)
Workflow descriptions (step-by-step process)
Parameters and configuration options
Examples of successful applications

The THEORY_PIPELINE configuration can switch between "default" and "memory_guided" modes. In memory_guided mode, previously learned skills influence hypothesis generation and proof strategy selection.
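To make the trigger idea concrete, here is a small sketch of how stored skills could be selected for a problem statement by keyword match. The matching rule (case-insensitive substring) and the function name `matching_skills` are assumptions; the source does not specify how triggers are evaluated.

```python
from typing import Dict, List


def matching_skills(problem: str, skills: Dict[str, List[str]]) -> List[str]:
    """Return names of skills with at least one trigger word in the problem text."""
    text = problem.lower()
    return sorted(
        name
        for name, triggers in skills.items()
        if any(trigger.lower() in text for trigger in triggers)
    )


# Skill name -> trigger keywords (illustrative data)
skills = {
    "topological-proof": ["topological", "filtration", "homotopy"],
    "induction-on-depth": ["induction", "recursive"],
}

selected = matching_skills("Bound the filtration depth by induction", skills)
print(selected)
```

Substring matching is crude and would fire on partial words; an embedding-based or rule-based matcher would be a natural refinement.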

Three Levels of Research Depth

EurekaClaw provides three entry points depending on research maturity:

Command	Level	Use Case
`eurekaclaw prove "conjecture"`	1	Precise mathematical statement to prove
`eurekaclaw from-papers <ids>`	2	Extend or find gaps in specific papers
`eurekaclaw explore "domain"`	3	Broad research area without specific conjecture

Level 1 assumes the user knows what they want to prove. Level 2 starts from existing work. Level 3 is exploratory, suitable for early-stage research ideation.

Scientist-Bench Evaluation

EurekaClaw includes a built-in evaluation framework called Scientist-Bench that scores research outputs across five dimensions:

Dimension	Weight	Method
Formal correctness	0.35	Lean4 / LLM peer review
Novelty	0.25	Embedding distance from known results
Experimental alignment	0.15	Numerical validation
Proof depth	0.15	Lemma count and complexity
Citation coverage	0.10	Bibliography completeness

This evaluation runs automatically at session completion via eurekaclaw eval-session <session_id>, providing quantitative feedback on research quality.
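The weights in the table sum to 1.0, so one plausible reading is that the overall score is a weighted average of per-dimension scores. The sketch below assumes each dimension is scored in [0, 1] and combined linearly; the source lists the weights but does not state the combination rule, and the key names are illustrative.

```python
from typing import Dict

# Weights from the table above; key names are illustrative
WEIGHTS: Dict[str, float] = {
    "formal_correctness": 0.35,
    "novelty": 0.25,
    "experimental_alignment": 0.15,
    "proof_depth": 0.15,
    "citation_coverage": 0.10,
}


def weighted_total(scores: Dict[str, float]) -> float:
    """Linear combination of per-dimension scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


example = {
    "formal_correctness": 1.0,
    "novelty": 0.6,
    "experimental_alignment": 0.5,
    "proof_depth": 0.8,
    "citation_coverage": 0.9,
}
total = weighted_total(example)
print(f"{total:.3f}")
```

For the example scores the arithmetic is 0.35·1.0 + 0.25·0.6 + 0.15·0.5 + 0.15·0.8 + 0.10·0.9 = 0.785.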

Local-First Architecture

EurekaClaw emphasizes privacy and local execution. The pipeline runs on the user's machine; the only network traffic is LLM API calls and queries to the external research tools it integrates (arXiv, Semantic Scholar, WolframAlpha). Data persists to local directories:

~/.eurekaclaw/
├── skills/           # Learned strategies
├── memory/           # Episodic and persistent storage
├── results/          # Generated papers and proofs
└── config.yaml       # User preferences

The browser UI (React + TypeScript) provides visualization without compromising the local-first principle — it connects to the local backend rather than a remote service.

Configuration and Gate Modes

EurekaClaw implements configurable safety checkpoints via GATE_MODE:

none: Fully autonomous execution
auto: System requests human approval at critical decisions
human: Human-in-the-loop for all major stages

This addresses the fundamental tension in AI research tools: full autonomy risks error propagation, while excessive oversight negates the productivity benefits. EurekaClaw lets users calibrate this trade-off.
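The three modes can be read as a simple decision rule applied at each checkpoint. The sketch below is one interpretation: the mode names come from the article, while `needs_approval`, the `is_critical` flag, and the `auto` fallback default are assumptions made for illustration.

```python
import os

VALID_GATE_MODES = ("none", "auto", "human")


def needs_approval(gate_mode: str, is_critical: bool) -> bool:
    """Decide whether a checkpoint should pause for a human."""
    if gate_mode not in VALID_GATE_MODES:
        raise ValueError(f"unknown GATE_MODE: {gate_mode!r}")
    if gate_mode == "none":
        return False        # fully autonomous
    if gate_mode == "human":
        return True         # pause at every checkpoint
    return is_critical      # "auto": pause only at critical decisions


# Read the mode the way a .env-driven tool might; the default is assumed
mode = os.environ.get("GATE_MODE", "auto")
print(mode)
```

Which decisions count as critical is the real design question; this sketch simply takes that judgment as an input.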

Tool Ecosystem

EurekaClaw integrates with multiple external tools for verification:

arXiv API: Paper retrieval and metadata
Semantic Scholar: Citation networks and paper relationships
Lean4: Formal proof verification
WolframAlpha: Symbolic computation
Python execution: Numerical experiments and plotting

Each tool is implemented as a modular component that can be extended or replaced. Custom tools subclass BaseTool and register with the agent system.

Comparison with DeerFlow

Both EurekaClaw and DeerFlow pursue multi-agent research automation but with different emphases:

Aspect	EurekaClaw	DeerFlow
Primary domain	Mathematical / theoretical research	General agent tasks
Output format	LaTeX papers with proofs	Reports, slides, code
Verification	Lean4 formalization	Sandboxed execution
Entry points	3 levels (prove/explore/from-papers)	Skills-based
Learning	Skill distillation after sessions	Long-term memory
Evaluation	Scientist-Bench (5 dimensions)	User judgment

EurekaClaw targets the formal research workflow with specific emphasis on correctness verification. DeerFlow targets general agent execution with broader tooling.

Technical Assessment

Strengths:

Structured theorem-proving pipeline with formal verification
Continual learning through skill distillation
Built-in evaluation framework (Scientist-Bench)
Local-first privacy architecture
Three entry points for different research maturity levels
Comprehensive documentation and browser UI

Considerations:

294 stars vs DeerFlow's 38,000 — newer, less battle-tested
Only 2 contributors vs DeerFlow's 147
Requires Anthropic API (Claude) — no model flexibility yet
Experiment Runner marked "under development"
Windows support incomplete (WSL recommended)

Conclusion

EurekaClaw represents a focused approach to AI-assisted mathematical research. The 7-stage theorem pipeline, Lean4 verification integration, and Scientist-Bench evaluation demonstrate serious attention to research rigor. The skill-based continual learning system addresses the genuine problem of accumulating domain expertise across sessions.

Use EurekaClaw when you need formal theorem generation with verification, LaTeX paper production, and structured research workflows. Consider DeerFlow when you need general agent capabilities with broader tooling and multi-channel integration.

The tool is Apache 2.0 licensed and actively maintained. For researchers in theoretical computer science, machine learning theory, and applied mathematics, it provides a concrete foundation for AI-assisted research workflows.

Quick Facts

Metric	Value
GitHub Stars	294
Forks	25
Contributors	2 (UCLA AGI Lab)
License	Apache 2.0
Primary Language	Python 54.8%
Release Date	March 2026
Repository	github.com/EurekaClaw/EurekaClaw
Website	eurekaclaw.ai

Core Features

Feature	Description
Literature Crawler	Fetch/summarize arXiv and Semantic Scholar papers
Idea Generator	Synthesize hypotheses from literature patterns
Theorem Prover	7-stage bottom-up proof pipeline
Paper Writer	LaTeX generation with theorem environments
Experiment Runner	Numerical validation (in development)
Continual Learning	Skill distillation after each session
Browser UI	React + TypeScript live tracking interface

7-Stage Pipeline

Stage	Function
1	Literature synthesis — read and summarize papers
2	Hypothesis generation — identify gaps
3	Lemma decomposition — break into sub-lemmas
4	Proof sketching — high-level strategy
5	Formal verification — Lean4 translation
6	Peer review simulation — LLM critique
7	LaTeX generation — camera-ready paper

Three Research Levels

Command	Level	Use Case
`prove "conjecture"`	1	Specific theorem to prove
`from-papers <ids>`	2	Extend existing papers
`explore "domain"`	3	Broad research exploration

Gate Modes (Safety Checkpoints)

Mode	Behavior
`none`	Fully autonomous
`auto`	Approvals at critical decisions
`human`	Human-in-the-loop throughout

Scientist-Bench Scoring

Dimension	Weight	Method
Formal correctness	35%	Lean4 / LLM review
Novelty	25%	Embedding distance
Experimental alignment	15%	Numerical validation
Proof depth	15%	Lemma count
Citation coverage	10%	Bibliography

Requirements

Python 3.11+
Anthropic API key (Claude)
macOS / Linux (Windows via WSL)

Quick Start

curl -fsSL https://eurekaclaw.ai/install.sh | bash
eurekaclaw onboard              # Interactive setup
eurekaclaw install-skills       # Install proof skills

eurekaclaw prove "Your conjecture here"
eurekaclaw explore "ML theory"
eurekaclaw from-papers 1706.03762 2005.14165

EurekaClaw vs DeerFlow

Aspect	EurekaClaw	DeerFlow
Domain	Math / theory	General tasks
Output	LaTeX papers	Reports/code
Verification	Lean4	Sandbox
Model	Claude only	Multi-provider
Learning	Skill distillation	Long-term memory

EurekaClaw for Builders

Implementation patterns for extending EurekaClaw and building research workflows.

Configuration Pattern

# ~/.eurekaclaw/config.yaml or .env

# LLM Backend
ANTHROPIC_API_KEY=sk-...           # Required
EUREKACLAW_MODEL=claude-sonnet-4-6  # Main reasoning model

# Pipeline Control
GATE_MODE=auto                     # none | auto | human
THEORY_PIPELINE=default            # default | memory_guided
THEORY_MAX_ITERATIONS=10

# Output Format
OUTPUT_FORMAT=latex                # latex | markdown
EXPERIMENT_MODE=auto               # auto | true | false
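Because several of these settings accept only a fixed set of values, it can help to validate a file before starting a long run. The following sketch parses KEY=VALUE lines and checks them against the options listed in the comments above. The parser and the `validate` helper are illustrative, not part of EurekaClaw, and the naive `#` handling would break on values that contain a `#`.

```python
from typing import Dict, List

# Allowed values taken from the comments in the configuration above
ALLOWED = {
    "GATE_MODE": {"none", "auto", "human"},
    "THEORY_PIPELINE": {"default", "memory_guided"},
    "OUTPUT_FORMAT": {"latex", "markdown"},
    "EXPERIMENT_MODE": {"auto", "true", "false"},
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, dropping blank lines and trailing # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def validate(settings: Dict[str, str]) -> List[str]:
    """Return one message per setting whose value is not in its allowed set."""
    return [
        f"{key}={settings[key]!r} is not one of {sorted(allowed)}"
        for key, allowed in ALLOWED.items()
        if key in settings and settings[key] not in allowed
    ]


sample = """
GATE_MODE=auto   # none | auto | human
THEORY_PIPELINE=memory-guided
THEORY_MAX_ITERATIONS=10
"""
settings = parse_env(sample)
problems = validate(settings)
print(problems)
```

Here the hyphenated `memory-guided` is flagged because the documented value uses an underscore, the kind of typo that is easy to miss by eye.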

Custom Skill Definition

# ~/.eurekaclaw/skills/my-strategy.md
---
name: topological-proof
version: 1.0.0
author: your-name
triggers:
  - "topological"
  - "filtration"
  - "homotopy"
---

# Topological Proof Strategy

**Purpose:** Apply topological methods to continuous optimization problems.

## When to Use

- Problem involves continuous functions
- Can construct a filtration
- Topological invariants apply

## Workflow

1. Construct filtration from problem structure
2. Compute persistent homology
3. Relate topological features to optima
4. Prove bounds via topological properties

## Example Success

"O(n log n) via topological filtration" — sparse attention complexity

Domain Plugin Architecture

# eurekaclaw/domains/my_domain.py
from eurekaclaw.domains.base import DomainPlugin

class MyDomain(DomainPlugin):
    name = "my-domain"
    
    def relevant_papers(self, query: str) -> list:
        """Return arXiv categories or search terms."""
        return ["cs.LG", "cs.AI"]
    
    def validate_hypothesis(self, hypothesis: dict) -> bool:
        """Domain-specific validation logic."""
        return self.check_theoretical_soundness(hypothesis)
    
    def proof_templates(self) -> list:
        """Return common proof structures for this domain."""
        return ["contradiction", "induction", "probabilistic"]

Python API Usage

from eurekaclaw import EurekaSession
from eurekaclaw.memory import KnowledgeBus

# Initialize session
session = EurekaSession(
    model="claude-sonnet-4-6",
    gate_mode="auto",
    output_format="latex"
)

# Run research pipeline
result = session.prove(
    conjecture="Sample complexity is O(L·d·log(d)/ε²)",
    domain="ML theory"
)

# Access results
print(result.proof)
print(result.latex_paper)
print(result.evaluation_score)  # Scientist-Bench

# Memory management
bus = KnowledgeBus()
bus.remember("learned-strategy", result.proof_techniques)

Tool Integration Pattern

import asyncio
import os

import requests
from eurekaclaw.tools.base import BaseTool

class MyCustomTool(BaseTool):
    name = "custom-verifier"
    description = "Verify claims using external service"
    
    def __init__(self, api_key: str):
        self.api_key = api_key
    
    def _run(self, claim: str) -> str:
        """Execute tool logic."""
        response = requests.post(
            "https://api.verifier.com/check",
            json={"claim": claim},
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=30,  # avoid stalling the pipeline on a slow service
        )
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        return response.json()["result"]
    
    async def _arun(self, claim: str) -> str:
        """Async version for concurrent execution."""
        # Run the blocking request in a worker thread so the event loop stays free
        return await asyncio.to_thread(self._run, claim)

# Register with agent
from eurekaclaw.agents import ProverAgent

api_key = os.environ["VERIFIER_API_KEY"]  # placeholder variable name for the service key
agent = ProverAgent(tools=[MyCustomTool(api_key)])

Session Evaluation

# Evaluate completed research
from eurekaclaw.evaluation import ScientistBench

bench = ScientistBench()
scores = bench.evaluate_session(
    session_id="sess-2026-03-23-001",
    dimensions=["correctness", "novelty", "depth"]
)

print(f"Overall: {scores.weighted_total}")
print(f"Correctness: {scores.correctness}")
print(f"Novelty: {scores.novelty}")
print(f"Citations: {scores.citation_coverage}")

Memory System Tiers

from eurekaclaw.memory import EpisodicMemory, PersistentMemory, KnowledgeGraph

# Episodic: Current session context
episodic = EpisodicMemory()
episodic.add_turn(query="...", response="...")

# Persistent: Cross-session facts
persistent = PersistentMemory()
persistent.remember("user-preference", "topological-methods")

# Knowledge Graph: Relationship structure
kg = KnowledgeGraph()
kg.add_entity("theorem-x", type="theorem")
kg.add_relation("theorem-x", "proves", "lemma-y")
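For intuition about the knowledge-graph tier, the sketch below stores typed entities and labeled relations in plain dictionaries and answers a simple lookup. `ToyKnowledgeGraph` and its `objects` method are hypothetical stand-ins, and the entity names are made up; EurekaClaw's own storage format is not described in the source.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ToyKnowledgeGraph:
    """Toy triple store: typed entities plus (subject, predicate) -> objects."""

    def __init__(self) -> None:
        self.entities: Dict[str, str] = {}
        self.relations: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_entity(self, name: str, type: str) -> None:
        self.entities[name] = type

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        self.relations[(subject, predicate)].add(obj)

    def objects(self, subject: str, predicate: str) -> List[str]:
        # .get avoids creating empty entries for unknown keys
        return sorted(self.relations.get((subject, predicate), set()))


graph = ToyKnowledgeGraph()
graph.add_entity("theorem-x", type="theorem")
graph.add_entity("lemma-y", type="lemma")
graph.add_entity("lemma-z", type="lemma")
graph.add_relation("theorem-x", "depends_on", "lemma-y")
graph.add_relation("theorem-x", "depends_on", "lemma-z")

print(graph.objects("theorem-x", "depends_on"))
```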

Custom Pipeline Stage

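from eurekaclaw.pipeline import PipelineStage

class ProofError(Exception):
    """Raised when custom verification rejects a proof (defined locally for this example)."""

class CustomVerificationStage(PipelineStage):
    """Insert custom verification between existing stages."""
    
    stage_name = "custom-verify"
    depends_on = ["proof-sketch"]
    outputs = ["verified-proof"]
    
    def execute(self, context: dict) -> dict:
        proof = context.get("proof-sketch")
        if proof is None:
            raise ProofError("no proof sketch found in context")
        
        # Custom verification logic: custom_checker is a method you implement on
        # this class; it should return an object with .valid, .issues, and .checks
        verification = self.custom_checker(proof)
        
        if not verification.valid:
            raise ProofError(verification.issues)
        
        return {"verified-proof": proof, "checks": verification.checks}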

# Register and run
from eurekaclaw.pipeline import ResearchPipeline

pipeline = ResearchPipeline()
pipeline.insert_after("proof-sketch", CustomVerificationStage())
result = pipeline.run(conjecture="...")
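The effect of an `insert_after` call can be pictured as a list operation on the ordered stage names. The sketch below is a stand-alone model of that behavior with two sanity checks; only `proof-sketch` and `custom-verify` appear in the example above, and the other stage identifiers are invented for illustration.

```python
from typing import List


def insert_after(stages: List[str], anchor: str, new_stage: str) -> List[str]:
    """Return a new stage order with new_stage placed right after anchor."""
    if anchor not in stages:
        raise ValueError(f"unknown anchor stage: {anchor!r}")
    if new_stage in stages:
        raise ValueError(f"stage already registered: {new_stage!r}")
    position = stages.index(anchor) + 1
    return stages[:position] + [new_stage] + stages[position:]


# Hypothetical identifiers for the seven stages
default_order = [
    "literature",
    "hypothesis",
    "lemma-decomposition",
    "proof-sketch",
    "formal-verification",
    "peer-review",
    "latex",
]

custom_order = insert_after(default_order, "proof-sketch", "custom-verify")
print(custom_order)
```

Returning a new list rather than mutating the default order keeps the stock pipeline reusable across sessions.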

Browser UI Extension

// frontend/src/components/CustomPanel.tsx
import React from 'react';
import { useEurekaSession } from '../hooks/useEureka';

export const CustomPanel: React.FC = () => {
  const { session, proofStatus } = useEurekaSession();
  
  return (
    
      Proof Progress
      
        {proofStatus.stages.map(stage => (
          
        ))}
      
      
      
        
      
    
  );
};

Deployment Checklist

[ ] Install: curl -fsSL https://eurekaclaw.ai/install.sh | bash
[ ] Configure: eurekaclaw onboard or manual .env
[ ] Install skills: eurekaclaw install-skills
[ ] Test pipeline: eurekaclaw prove "simple test"
[ ] Build UI: make open
[ ] Add custom skills to ~/.eurekaclaw/skills/
[ ] Test domain plugin (if extending)
[ ] Verify Lean4 installation for formal proofs
[ ] Run evaluation: eurekaclaw eval-session <id>

Technical Appendix
Repository: github.com/EurekaClaw/EurekaClaw
Version analyzed: March 23, 2026 (commit 71b356c)
Stars: 294 | Forks: 25 | Contributors: 2
License: Apache 2.0

Key Files Referenced:
README.md, docs/configuration.md, docs/architecture.md
docs/agents.md, docs/skills.md, docs/memory.md

Website: eurekaclaw.ai
Documentation: eurekaclaw.github.io
Authors: Xuheng Li, Qiwei Di, Chenggong Zhang, Kaixuan Ji, Qingyue Zhao, Yifeng Liu, Shiyuan Zhang, Quanquan Gu (UCLA AGI Lab)