LabNotes

EurekaClaw: AI Research Agent for Theorem Discovery

EurekaClaw is a multi-agent AI research assistant designed for mathematical and theoretical research. Released by researchers from UCLA's Artificial General Intelligence Lab in March 2026, it transforms natural language questions into publishable research artifacts — crawling literature, generating and proving theorems, and writing LaTeX papers.

The repository accumulated 294 stars and 25 forks within days of release, with daily commits from two core contributors. The codebase shows production-minded engineering: comprehensive documentation, a React-based browser UI, and a modular architecture that supports custom skills and domain plugins.

From Question to Publication

The research workflow in EurekaClaw follows a seven-stage pipeline. A user provides a conjecture or research topic. The system then:

  1. Crawls arXiv and Semantic Scholar for relevant literature
  2. Summarizes and cross-references papers for context
  3. Generates novel hypotheses by synthesizing patterns across the literature
  4. Builds proofs through a bottom-up theorem-proving pipeline
  5. Verifies correctness via Lean4 formalization or LLM peer review
  6. Runs numerical experiments to validate theoretical bounds (optional)
  7. Produces camera-ready LaTeX with theorem environments and citations

This pipeline operates autonomously once initiated, with optional human-in-the-loop checkpoints controlled via the GATE_MODE environment variable.

Multi-Agent Architecture

EurekaClaw implements a multi-agent system where specialized agents handle distinct stages:

| Agent | Function | Tools Used |
| --- | --- | --- |
| Literature Crawler | Fetch and summarize papers | arXiv API, Semantic Scholar |
| Idea Generator | Synthesize hypotheses from patterns | Embedding models, LLM reasoning |
| Theorem Prover | Generate and verify proofs | Lean4, symbolic verification |
| Paper Writer | Draft LaTeX manuscripts | LaTeX compiler, citation management |
| Experiment Runner | Validate bounds numerically | WolframAlpha, Python execution |

Agents communicate through a KnowledgeBus that persists context across the pipeline. Each agent can access the outputs of previous stages while maintaining isolation for their specific tasks.
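
The shared-context idea can be illustrated with a minimal sketch, assuming a simple in-memory key-value design. The class name `SketchBus` and its `publish` and `read` methods are hypothetical and are not EurekaClaw's actual KnowledgeBus API; the sketch only shows how later stages can read earlier outputs without being able to alter them.

```python
from copy import deepcopy


class SketchBus:
    """Toy stand-in for a shared context store (not the real KnowledgeBus)."""

    def __init__(self):
        self._entries = {}  # stage name -> published output

    def publish(self, stage: str, output):
        # Store a private copy so later mutation by the producer is not visible.
        self._entries[stage] = deepcopy(output)

    def read(self, stage: str):
        # Hand out a copy so a consumer cannot alter another agent's output.
        return deepcopy(self._entries[stage])


bus = SketchBus()
bus.publish("literature", {"papers": ["1706.03762"]})

# A downstream agent reads the earlier stage and mutates only its own copy.
context = bus.read("literature")
context["papers"].append("2005.14165")

print(bus.read("literature"))  # the stored entry is unchanged
```

Copying on both write and read is one simple way to get isolation; a real implementation might instead use immutable records or per-agent views.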

The 7-Stage Theorem Pipeline

The core innovation in EurekaClaw is its structured approach to automated theorem proving:

Stage 1: Literature Synthesis — Agents read and summarize relevant papers, extracting key lemmas, definitions, and proof techniques.

Stage 2: Hypothesis Generation — Pattern matching across the literature identifies gaps and suggests novel conjectures.

Stage 3: Lemma Decomposition — Complex theorems are broken into sub-lemmas with explicit dependencies.

Stage 4: Proof Sketching — High-level proof strategies are selected based on the literature and problem structure.

Stage 5: Formal Verification — Proofs are translated to Lean4 for machine-checked correctness.

Stage 6: Peer Review Simulation — LLM-based reviewers critique proofs for soundness and clarity.

Stage 7: LaTeX Generation — Camera-ready papers with proper theorem environments, citations, and bibliography.
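
The explicit dependencies produced in Stage 3 imply an order in which sub-lemmas have to be proved. A minimal sketch of that ordering, using Python's standard `graphlib` module; the lemma names are invented for illustration, and nothing here is taken from EurekaClaw's code.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition: each lemma maps to the lemmas it depends on.
dependencies = {
    "main_theorem": {"lemma_bound", "lemma_concentration"},
    "lemma_bound": {"lemma_norm"},
    "lemma_concentration": {"lemma_norm"},
    "lemma_norm": set(),
}

# static_order() yields every lemma after all of its dependencies,
# which is the order a bottom-up prover would need.
proof_order = list(TopologicalSorter(dependencies).static_order())
print(proof_order)
```

`TopologicalSorter` raises `graphlib.CycleError` if the decomposition contains a circular dependency, which is a useful sanity check before any proving starts.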

Continual Learning via Skills

EurekaClaw implements a skill system that improves performance over time. After each research session, successful proof strategies are distilled into reusable skills stored in ~/.eurekaclaw/skills/. These skills are Markdown files containing:

  • Trigger conditions (when to apply this strategy)
  • Workflow descriptions (step-by-step process)
  • Parameters and configuration options
  • Examples of successful applications

The THEORY_PIPELINE configuration can switch between "default" and "memory_guided" modes. In memory_guided mode, previously learned skills influence hypothesis generation and proof strategy selection.
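
How trigger conditions might select a skill can be sketched as follows. This assumes a skill file opens with a flat front-matter block containing `name` and a `triggers` list, and that matching is a case-insensitive substring test. Both are assumptions made for illustration; the functions `parse_skill` and `matches` are hypothetical and not part of EurekaClaw.

```python
def parse_skill(text: str) -> dict:
    """Extract `name` and `triggers` from a skill file's front matter.

    Handles only a flat key/value layout with one list; a real loader
    would more likely use a YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with '---'")
    meta = {"triggers": []}
    in_triggers = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of front matter
            break
        if in_triggers and stripped.startswith("- "):
            meta["triggers"].append(stripped[2:].strip().strip('"'))
            continue
        key, _, value = stripped.partition(":")
        in_triggers = key == "triggers"
        if not in_triggers and value.strip():
            meta[key] = value.strip()
    return meta


def matches(meta: dict, conjecture: str) -> bool:
    """True if any trigger word occurs in the conjecture (case-insensitive)."""
    text = conjecture.lower()
    return any(trigger.lower() in text for trigger in meta["triggers"])


skill_text = """---
name: topological-proof
version: 1.0.0
triggers:
  - "topological"
  - "filtration"
---
# Topological Proof Strategy
"""
meta = parse_skill(skill_text)
print(meta["name"], meta["triggers"])
print(matches(meta, "Bound via a filtration of sublevel sets"))
```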

Three Levels of Research Depth

EurekaClaw provides three entry points depending on research maturity:

| Command | Level | Use Case |
| --- | --- | --- |
| eurekaclaw prove "conjecture" | 1 | Precise mathematical statement to prove |
| eurekaclaw from-papers <ids> | 2 | Extend or find gaps in specific papers |
| eurekaclaw explore "domain" | 3 | Broad research area without specific conjecture |

Level 1 assumes the user knows what they want to prove. Level 2 starts from existing work. Level 3 is exploratory, suitable for early-stage research ideation.

Scientist-Bench Evaluation

EurekaClaw includes a built-in evaluation framework called Scientist-Bench that scores research outputs across five dimensions:

| Dimension | Weight | Method |
| --- | --- | --- |
| Formal correctness | 0.35 | Lean4 / LLM peer review |
| Novelty | 0.25 | Embedding distance from known results |
| Experimental alignment | 0.15 | Numerical validation |
| Proof depth | 0.15 | Lemma count and complexity |
| Citation coverage | 0.10 | Bibliography completeness |

This evaluation runs automatically at session completion via eurekaclaw eval-session <session_id>, providing quantitative feedback on research quality.
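
The five weights sum to 1.0, so one natural reading is that the overall score is a weighted sum of per-dimension scores. A minimal sketch under that assumption, with each dimension scored in [0, 1]; the dictionary keys, the function `weighted_total`, and the example scores are hypothetical and do not come from the Scientist-Bench code.

```python
import math

# Weights taken from the table above; they sum to 1.0.
WEIGHTS = {
    "formal_correctness": 0.35,
    "novelty": 0.25,
    "experimental_alignment": 0.15,
    "proof_depth": 0.15,
    "citation_coverage": 0.10,
}


def weighted_total(scores: dict) -> float:
    """Combine per-dimension scores in [0, 1] into one weighted number."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical per-dimension scores for one session.
example = {
    "formal_correctness": 0.9,
    "novelty": 0.6,
    "experimental_alignment": 0.5,
    "proof_depth": 0.7,
    "citation_coverage": 0.8,
}
total = weighted_total(example)
print(round(total, 3))  # 0.725
```

Term by term, the example works out to 0.315 + 0.150 + 0.075 + 0.105 + 0.080 = 0.725, so formal correctness alone contributes over 40% of this session's score.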

Local-First Architecture

EurekaClaw emphasizes privacy and local execution. The system runs entirely on the user's machine with no cloud dependencies beyond LLM API calls. Data persists to local directories:

~/.eurekaclaw/
├── skills/           # Learned strategies
├── memory/           # Episodic and persistent storage
├── results/          # Generated papers and proofs
└── config.yaml       # User preferences

The browser UI (React + TypeScript) provides visualization without compromising the local-first principle — it connects to the local backend rather than a remote service.

Configuration and Gate Modes

EurekaClaw implements configurable safety checkpoints via GATE_MODE:

  • none: Fully autonomous execution
  • auto: System requests human approval at critical decisions
  • human: Human-in-the-loop for all major stages

This addresses the fundamental tension in AI research tools: full autonomy risks error propagation, while excessive oversight negates the productivity benefits. EurekaClaw lets users calibrate this trade-off.
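
The three modes amount to a small decision rule, sketched below. The function `needs_approval` and the `is_critical` flag are hypothetical; the article does not say how EurekaClaw decides that a decision is critical.

```python
VALID_GATE_MODES = {"none", "auto", "human"}


def needs_approval(gate_mode: str, is_critical: bool) -> bool:
    """Decide whether a stage should pause for a human.

    `is_critical` is a hypothetical flag supplied by the caller.
    """
    if gate_mode not in VALID_GATE_MODES:
        raise ValueError(f"unknown GATE_MODE: {gate_mode!r}")
    if gate_mode == "none":
        return False  # fully autonomous
    if gate_mode == "human":
        return True  # pause at every major stage
    return is_critical  # "auto": pause only at critical decisions


for mode in ("none", "auto", "human"):
    print(mode,
          needs_approval(mode, is_critical=False),
          needs_approval(mode, is_critical=True))
```

Rejecting unknown values up front keeps a typo in the setting from silently falling through to autonomous execution.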

Tool Ecosystem

EurekaClaw integrates with external tools for literature retrieval, verification, and computation:

  • arXiv API: Paper retrieval and metadata
  • Semantic Scholar: Citation networks and paper relationships
  • Lean4: Formal proof verification
  • WolframAlpha: Symbolic computation
  • Python execution: Numerical experiments and plotting

Each tool is implemented as a modular component that can be extended or replaced. Custom tools subclass BaseTool and register with the agent system.

Comparison with DeerFlow

Both EurekaClaw and DeerFlow pursue multi-agent research automation but with different emphases:

| Aspect | EurekaClaw | DeerFlow |
| --- | --- | --- |
| Primary domain | Mathematical / theoretical research | General agent tasks |
| Output format | LaTeX papers with proofs | Reports, slides, code |
| Verification | Lean4 formalization | Sandboxed execution |
| Entry points | 3 levels (prove/explore/from-papers) | Skills-based |
| Learning | Skill distillation after sessions | Long-term memory |
| Evaluation | Scientist-Bench (5 dimensions) | User judgment |

EurekaClaw targets the formal research workflow with specific emphasis on correctness verification. DeerFlow targets general agent execution with broader tooling.

Technical Assessment

Strengths:

  • Structured theorem-proving pipeline with formal verification
  • Continual learning through skill distillation
  • Built-in evaluation framework (Scientist-Bench)
  • Local-first privacy architecture
  • Three entry points for different research maturity levels
  • Comprehensive documentation and browser UI

Considerations:

  • 294 stars vs DeerFlow's 38,000 — newer, less battle-tested
  • Only 2 contributors vs DeerFlow's 147
  • Requires Anthropic API (Claude) — no model flexibility yet
  • Experiment Runner marked "under development"
  • Windows support incomplete (WSL recommended)

Conclusion

EurekaClaw represents a focused approach to AI-assisted mathematical research. The 7-stage theorem pipeline, Lean4 verification integration, and Scientist-Bench evaluation demonstrate serious attention to research rigor. The skill-based continual learning system addresses the genuine problem of accumulating domain expertise across sessions.

Use EurekaClaw when you need formal theorem generation with verification, LaTeX paper production, and structured research workflows. Consider DeerFlow when you need general agent capabilities with broader tooling and multi-channel integration.

The tool is Apache 2.0 licensed and actively maintained. For researchers in theoretical computer science, machine learning theory, and applied mathematics, it provides a concrete foundation for AI-assisted research workflows.

Quick Facts

| Metric | Value |
| --- | --- |
| GitHub Stars | 294 |
| Forks | 25 |
| Contributors | 2 (UCLA AGI Lab) |
| License | Apache 2.0 |
| Primary Language | Python (54.8%) |
| Release Date | March 2026 |
| Repository | github.com/EurekaClaw/EurekaClaw |
| Website | eurekaclaw.ai |

Core Features

| Feature | Description |
| --- | --- |
| Literature Crawler | Fetch/summarize arXiv and Semantic Scholar papers |
| Idea Generator | Synthesize hypotheses from literature patterns |
| Theorem Prover | 7-stage bottom-up proof pipeline |
| Paper Writer | LaTeX generation with theorem environments |
| Experiment Runner | Numerical validation (in development) |
| Continual Learning | Skill distillation after each session |
| Browser UI | React + TypeScript live tracking interface |

7-Stage Pipeline

| Stage | Function |
| --- | --- |
| 1 | Literature synthesis: read and summarize papers |
| 2 | Hypothesis generation: identify gaps |
| 3 | Lemma decomposition: break into sub-lemmas |
| 4 | Proof sketching: high-level strategy |
| 5 | Formal verification: Lean4 translation |
| 6 | Peer review simulation: LLM critique |
| 7 | LaTeX generation: camera-ready paper |

Three Research Levels

| Command | Level | Use Case |
| --- | --- | --- |
| prove "conjecture" | 1 | Specific theorem to prove |
| from-papers <ids> | 2 | Extend existing papers |
| explore "domain" | 3 | Broad research exploration |

Gate Modes (Safety Checkpoints)

| Mode | Behavior |
| --- | --- |
| none | Fully autonomous |
| auto | Approvals at critical decisions |
| human | Human-in-the-loop throughout |

Scientist-Bench Scoring

| Dimension | Weight | Method |
| --- | --- | --- |
| Formal correctness | 35% | Lean4 / LLM review |
| Novelty | 25% | Embedding distance |
| Experimental alignment | 15% | Numerical validation |
| Proof depth | 15% | Lemma count |
| Citation coverage | 10% | Bibliography |

Requirements

  • Python 3.11+
  • Anthropic API key (Claude)
  • macOS / Linux (Windows via WSL)

Quick Start

curl -fsSL https://eurekaclaw.ai/install.sh | bash
eurekaclaw onboard              # Interactive setup
eurekaclaw install-skills       # Install proof skills

eurekaclaw prove "Your conjecture here"
eurekaclaw explore "ML theory"
eurekaclaw from-papers 1706.03762 2005.14165

EurekaClaw vs DeerFlow

| Aspect | EurekaClaw | DeerFlow |
| --- | --- | --- |
| Domain | Math / theory | General tasks |
| Output | LaTeX papers | Reports/code |
| Verification | Lean4 | Sandbox |
| Model | Claude only | Multi-provider |
| Learning | Skill distillation | Long-term memory |

EurekaClaw for Builders

Implementation patterns for extending EurekaClaw and building research workflows.

Configuration Pattern

# ~/.eurekaclaw/config.yaml or .env

# LLM Backend
ANTHROPIC_API_KEY=sk-...           # Required
EUREKACLAW_MODEL=claude-sonnet-4-6  # Main reasoning model

# Pipeline Control
GATE_MODE=auto                     # none | auto | human
THEORY_PIPELINE=default            # default | memory_guided
THEORY_MAX_ITERATIONS=10

# Output Format
OUTPUT_FORMAT=latex                # latex | markdown
EXPERIMENT_MODE=auto               # auto | true | false
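
The options above are constrained to a few allowed values, so it can help to validate a settings file before starting a long run. A minimal sketch, assuming the KEY=value layout shown above with `#` comments; `parse_env`, `validate`, and the `ALLOWED` table are hypothetical helpers, not EurekaClaw's own loader, and the comment stripping would mangle a value that itself contains `#`.

```python
ALLOWED = {
    "GATE_MODE": {"none", "auto", "human"},
    "THEORY_PIPELINE": {"default", "memory_guided"},
    "OUTPUT_FORMAT": {"latex", "markdown"},
    "EXPERIMENT_MODE": {"auto", "true", "false"},
}


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def validate(settings: dict) -> None:
    """Raise ValueError if a known option holds an unlisted value."""
    for key, allowed in ALLOWED.items():
        if key in settings and settings[key] not in allowed:
            raise ValueError(f"{key}={settings[key]!r} not in {sorted(allowed)}")


sample = """
GATE_MODE=auto                     # none | auto | human
THEORY_PIPELINE=default            # default | memory_guided
THEORY_MAX_ITERATIONS=10
"""
settings = parse_env(sample)
validate(settings)
print(settings)
```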

Custom Skill Definition

# ~/.eurekaclaw/skills/my-strategy.md
---
name: topological-proof
version: 1.0.0
author: your-name
triggers:
  - "topological"
  - "filtration"
  - "homotopy"
---

# Topological Proof Strategy

**Purpose:** Apply topological methods to continuous optimization problems.

## When to Use

- Problem involves continuous functions
- Can construct a filtration
- Topological invariants apply

## Workflow

1. Construct filtration from problem structure
2. Compute persistent homology
3. Relate topological features to optima
4. Prove bounds via topological properties

## Example Success

"O(n log n) via topological filtration" — sparse attention complexity

Domain Plugin Architecture

# eurekaclaw/domains/my_domain.py
from eurekaclaw.domains.base import DomainPlugin

class MyDomain(DomainPlugin):
    name = "my-domain"
    
    def relevant_papers(self, query: str) -> list:
        """Return arXiv categories or search terms."""
        return ["cs.LG", "cs.AI"]
    
    def validate_hypothesis(self, hypothesis: dict) -> bool:
        """Domain-specific validation logic."""
        # check_theoretical_soundness is a placeholder for your own check
        return self.check_theoretical_soundness(hypothesis)
    
    def proof_templates(self) -> list:
        """Return common proof structures for this domain."""
        return ["contradiction", "induction", "probabilistic"]

Python API Usage

from eurekaclaw import EurekaSession
from eurekaclaw.memory import KnowledgeBus

# Initialize session
session = EurekaSession(
    model="claude-sonnet-4-6",
    gate_mode="auto",
    output_format="latex"
)

# Run research pipeline
result = session.prove(
    conjecture="Sample complexity is O(L·d·log(d)/ε²)",
    domain="ML theory"
)

# Access results
print(result.proof)
print(result.latex_paper)
print(result.evaluation_score)  # Scientist-Bench

# Memory management
bus = KnowledgeBus()
bus.remember("learned-strategy", result.proof_techniques)

Tool Integration Pattern

import asyncio
import os

import requests
from eurekaclaw.tools.base import BaseTool

class MyCustomTool(BaseTool):
    name = "custom-verifier"
    description = "Verify claims using external service"
    
    def __init__(self, api_key: str):
        self.api_key = api_key
    
    def _run(self, claim: str) -> str:
        """Execute tool logic."""
        response = requests.post(
            "https://api.verifier.com/check",
            json={"claim": claim},
            headers={"Authorization": f"Bearer {self.api_key}"},
            timeout=30,  # avoid hanging on an unresponsive service
        )
        response.raise_for_status()  # surface HTTP errors instead of a KeyError
        return response.json()["result"]
    
    async def _arun(self, claim: str) -> str:
        """Async version for concurrent execution."""
        # Run the blocking request in a worker thread
        return await asyncio.to_thread(self._run, claim)

# Register with agent
from eurekaclaw.agents import ProverAgent

# VERIFIER_API_KEY is a placeholder environment variable name
agent = ProverAgent(tools=[MyCustomTool(api_key=os.environ["VERIFIER_API_KEY"])])

Session Evaluation

# Evaluate completed research
from eurekaclaw.evaluation import ScientistBench

bench = ScientistBench()
scores = bench.evaluate_session(
    session_id="sess-2026-03-23-001",
    dimensions=["correctness", "novelty", "depth"]
)

print(f"Overall: {scores.weighted_total}")
print(f"Correctness: {scores.correctness}")
print(f"Novelty: {scores.novelty}")
print(f"Depth: {scores.depth}")

Memory System Tiers

from eurekaclaw.memory import EpisodicMemory, PersistentMemory, KnowledgeGraph

# Episodic: Current session context
episodic = EpisodicMemory()
episodic.add_turn(query="...", response="...")

# Persistent: Cross-session facts
persistent = PersistentMemory()
persistent.remember("user-preference", "topological-methods")

# Knowledge Graph: Relationship structure
kg = KnowledgeGraph()
kg.add_entity("theorem-x", type="theorem")
kg.add_relation("theorem-x", "proves", "lemma-y")

Custom Pipeline Stage

from eurekaclaw.pipeline import PipelineStage

class ProofError(Exception):
    """Raised when verification fails (defined here so the example is self-contained)."""

class CustomVerificationStage(PipelineStage):
    """Insert custom verification between existing stages."""
    
    stage_name = "custom-verify"
    depends_on = ["proof-sketch"]
    outputs = ["verified-proof"]
    
    def execute(self, context: dict) -> dict:
        proof = context.get("proof-sketch")
        if proof is None:
            raise ProofError("no proof sketch found in context")
        
        # custom_checker is a placeholder for your own verification method
        verification = self.custom_checker(proof)
        
        if not verification.valid:
            raise ProofError(verification.issues)
        
        return {"verified-proof": proof, "checks": verification.checks}

# Register and run
from eurekaclaw.pipeline import ResearchPipeline

pipeline = ResearchPipeline()
pipeline.insert_after("proof-sketch", CustomVerificationStage())
result = pipeline.run(conjecture="...")
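
What `insert_after` and `depends_on` imply for stage ordering can be sketched with plain lists. This is an illustration only: the two helper functions and the stage names are hypothetical, and the real `ResearchPipeline` may order or validate stages differently.

```python
def insert_after(stages: list, anchor: str, new_stage: str) -> list:
    """Return a new stage order with `new_stage` placed right after `anchor`."""
    if anchor not in stages:
        raise ValueError(f"unknown stage: {anchor!r}")
    position = stages.index(anchor) + 1
    return stages[:position] + [new_stage] + stages[position:]


def check_dependencies(order: list, depends_on: dict) -> None:
    """Raise if any stage runs before one of the stages it depends on."""
    seen = set()
    for stage in order:
        missing = set(depends_on.get(stage, [])) - seen
        if missing:
            raise ValueError(f"{stage!r} runs before {sorted(missing)}")
        seen.add(stage)


# Hypothetical stage names, loosely following the 7-stage pipeline.
stages = ["literature", "hypothesis", "lemmas", "proof-sketch",
          "formal-verification", "peer-review", "latex"]
order = insert_after(stages, "proof-sketch", "custom-verify")
check_dependencies(order, {"custom-verify": ["proof-sketch"]})
print(order)
```

Returning a new list instead of mutating the original keeps the default stage order available for other runs.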

Browser UI Extension

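// frontend/src/components/CustomPanel.tsx
import React from 'react';
import { useEurekaSession } from '../hooks/useEureka';

// The markup and the stage fields (name, status) are illustrative;
// adapt them to the actual shape of proofStatus.
export const CustomPanel: React.FC = () => {
  const { session, proofStatus } = useEurekaSession();

  return (
    <div>
      <h3>Proof Progress</h3>
      <ul>
        {proofStatus.stages.map(stage => (
          <li key={stage.name}>{stage.name}: {stage.status}</li>
        ))}
      </ul>
    </div>
  );
};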

Deployment Checklist

  • [ ] Install: curl -fsSL https://eurekaclaw.ai/install.sh | bash
  • [ ] Configure: eurekaclaw onboard or manual .env
  • [ ] Install skills: eurekaclaw install-skills
  • [ ] Test pipeline: eurekaclaw prove "simple test"
  • [ ] Build UI: make open
  • [ ] Add custom skills to ~/.eurekaclaw/skills/
  • [ ] Test domain plugin (if extending)
  • [ ] Verify Lean4 installation for formal proofs
  • [ ] Run evaluation: eurekaclaw eval-session <id>

Technical Appendix
Repository: github.com/EurekaClaw/EurekaClaw
Version analyzed: March 23, 2026 (commit 71b356c)
Stars: 294 | Forks: 25 | Contributors: 2
License: Apache 2.0

Key Files Referenced:
README.md, docs/configuration.md, docs/architecture.md
docs/agents.md, docs/skills.md, docs/memory.md

Website: eurekaclaw.ai
Documentation: eurekaclaw.github.io
Authors: Xuheng Li, Qiwei Di, Chenggong Zhang, Kaixuan Ji, Qingyue Zhao, Yifeng Liu, Shiyuan Zhang, Quanquan Gu (UCLA AGI Lab)