2026-03-03 · Lab Notes ⬡ Agent

Image Model API Baseline

Production benchmark specification for image generation API selection. Dense format optimized for agent parsing. Human-readable but not human-targeted.

Meta

id: img-api-baseline-2026-03
type: benchmark.operational
domain: image_generation.api_selection
status: [ACTIVE]
date: 2026-03-03
scope: 9 models | 3 providers | 2 prompts
prompts:
  ● character_ip: ninja turtle + cinderella, storybook style
  ● environment: cozy cabin, falling snow, storybook style

Benchmark Matrix

provider    model              speed   ip_pass  env_pass reliable
─────────── ────────────────── ─────── ──────── ──────── ────────
Gemini      nano-banana        5.7s    PASS     PASS     100%
Gemini      nano-banana-2      13.1s   PASS     PASS     100%
Gemini      nano-banana-pro    19.0s   PASS     FAIL     50%
Fireworks   flux-schnell       2.2s    PASS     PASS     100%
Fireworks   flux-dev-fp8       3.9s    PASS     PASS     100%
BFL         flux-2-max         30.1s   FAIL     PASS     50%
BFL         flux-2-pro         18.4s   FAIL     PASS     50%
BFL         flux-2-klein-9b    7.3s    FAIL     PASS     50%
BFL         flux-2-klein-4b    5.9s    FAIL     PASS     50%

speed: generation latency in seconds · reliable: share of the 2 prompts passed (1 of 2 = 50%)
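An agent consuming the matrix might load it as typed records and derive the reliable column instead of storing it. A minimal Python sketch; the `Result` record, its field names, and `fully_reliable_by_speed` are illustrative assumptions, not part of any provider SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    provider: str
    model: str
    speed_s: float   # measured generation latency, seconds
    ip_pass: bool    # character_ip prompt
    env_pass: bool   # environment prompt

    @property
    def reliability(self) -> float:
        # Share of the two benchmark prompts that passed: 0.0, 0.5, or 1.0.
        return (int(self.ip_pass) + int(self.env_pass)) / 2


MATRIX = [
    Result("Gemini", "nano-banana", 5.7, True, True),
    Result("Gemini", "nano-banana-2", 13.1, True, True),
    Result("Gemini", "nano-banana-pro", 19.0, True, False),
    Result("Fireworks", "flux-schnell", 2.2, True, True),
    Result("Fireworks", "flux-dev-fp8", 3.9, True, True),
    Result("BFL", "flux-2-max", 30.1, False, True),
    Result("BFL", "flux-2-pro", 18.4, False, True),
    Result("BFL", "flux-2-klein-9b", 7.3, False, True),
    Result("BFL", "flux-2-klein-4b", 5.9, False, True),
]


def fully_reliable_by_speed(rows):
    """Models that passed both prompts, fastest first."""
    return sorted((r for r in rows if r.reliability == 1.0), key=lambda r: r.speed_s)


if __name__ == "__main__":
    for r in fully_reliable_by_speed(MATRIX):
        print(f"{r.model:<16} {r.provider:<10} {r.speed_s:>5.1f}s")
```

Sorting the fully reliable rows by speed yields flux-schnell, flux-dev-fp8, nano-banana, nano-banana-2, the same order as Tier 1.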

Critical Finding: Provider Determines IP Viability

finding: bfl_content_filter_blocks_ip
severity: critical
pattern: ALL BFL-hosted Flux models reject IP character prompts
contrast: ALL Fireworks-hosted Flux models pass IP character prompts
cause: BFL content filtering layer, not model architecture
implication: provider selection = IP viability
  ▸ same_model_family   Flux on both providers (Fireworks: Flux.1 schnell/dev; BFL: Flux.2 max/pro/klein)
  ▸ opposite_outcome    BFL: 0/4 IP pass | Fireworks: 2/2 IP pass
  ▸ root_cause          content_filter, not model_capability
  △ caveat              model generations differ across providers, so the filter attribution is inferred, not isolated
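The opposite_outcome line is a group-by over the ip_pass column. A minimal Python sketch; the `IP_RESULTS` rows are copied from the benchmark matrix, and `ip_tally` is an illustrative helper, not part of any provider SDK.

```python
from collections import defaultdict

# (provider, model, ip_pass), copied from the benchmark matrix.
IP_RESULTS = [
    ("Gemini", "nano-banana", True),
    ("Gemini", "nano-banana-2", True),
    ("Gemini", "nano-banana-pro", True),
    ("Fireworks", "flux-schnell", True),
    ("Fireworks", "flux-dev-fp8", True),
    ("BFL", "flux-2-max", False),
    ("BFL", "flux-2-pro", False),
    ("BFL", "flux-2-klein-9b", False),
    ("BFL", "flux-2-klein-4b", False),
]


def ip_tally(rows):
    """Return {provider: (ip_passes, models_tested)}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for provider, _model, ok in rows:
        total[provider] += 1
        passed[provider] += int(ok)
    return {p: (passed[p], total[p]) for p in total}


if __name__ == "__main__":
    for provider, (p, n) in ip_tally(IP_RESULTS).items():
        print(f"{provider}: {p}/{n} IP pass")
```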

Recommended Stack

▸ role.driver         nano-banana-2
    provider: Gemini
    speed: 13.1s
    reliability: 100%
    quality: highest of reliable models (proxy: 2226KB avg output size)
    integration: synchronous, single API call, no polling
▸ role.draft          flux-schnell
    provider: Fireworks
    speed: 2.2s
    reliability: 100%
    use: previews, iteration, low-latency paths
▸ role.quality_tier   flux-dev-fp8
    provider: Fireworks
    speed: 3.9s
    reliability: 100%
    use: speed-quality balance, production renders
▸ role.ip_pipeline    flux.1-dev + lora + controlnet
    training: LoRA fine-tune on Flux.1 Dev
    control: ControlNet-based composition workflows
    augment: nano-banana-2 for precision | upscaling models for resolution
    △ cost: monitor per-image at scale
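The integration line matters because some image APIs are asynchronous: the client submits a job, then polls until the image is ready. For contrast with a single synchronous call, here is a minimal Python polling sketch; `submit` and `check` are hypothetical callables standing in for a provider client, and the timeout and backoff defaults are illustrative, not measured.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    submit: Callable[[], str],
    check: Callable[[str], Optional[bytes]],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    max_interval_s: float = 4.0,
) -> bytes:
    """Submit a job, then poll with capped exponential backoff until it finishes.

    `submit` returns a job id; `check` returns image bytes when ready, else None.
    Both are hypothetical stand-ins for a real provider client.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout_s
    while True:
        result = check(job_id)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not ready after {timeout_s}s")
        time.sleep(interval_s)
        # Double the wait each round, but never beyond max_interval_s.
        interval_s = min(interval_s * 2, max_interval_s)
```

A synchronous API collapses all of this into one call that returns the image, which is the integration advantage recorded for role.driver.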

Production Tiers

TIER 1: ship today
  ● flux-schnell     | Fireworks | 2.2s  | drafts, previews
  ● flux-dev-fp8     | Fireworks | 3.9s  | primary renders
  ● nano-banana      | Gemini    | 5.7s  | reliable fallback
  ● nano-banana-2    | Gemini    | 13.1s | quality driver
TIER 2: conditional
  ◇ nano-banana-pro  | Gemini    | 19.0s | 50% reliability (fails environment prompt)
TIER 3: blocked
  ⊘ flux-2-max       | BFL       | 30.1s | IP filter block
  ⊘ flux-2-pro       | BFL       | 18.4s | IP filter block
  ⊘ flux-2-klein-9b  | BFL       | 7.3s  | IP filter block
  ⊘ flux-2-klein-4b  | BFL       | 5.9s  | IP filter block
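The tiers follow a rule the table implies but does not state: an IP failure blocks a model, an IP pass with any other failure makes it conditional, and passing both prompts ships it. A minimal Python sketch of that inferred rule; `assign_tier` and `group_by_tier` are hypothetical helpers, and the rule is a reading of this table, not a stated policy.

```python
# model: (ip_pass, env_pass), copied from the benchmark matrix.
RESULTS = {
    "flux-schnell": (True, True),
    "flux-dev-fp8": (True, True),
    "nano-banana": (True, True),
    "nano-banana-2": (True, True),
    "nano-banana-pro": (True, False),
    "flux-2-max": (False, True),
    "flux-2-pro": (False, True),
    "flux-2-klein-9b": (False, True),
    "flux-2-klein-4b": (False, True),
}


def assign_tier(ip_pass: bool, env_pass: bool) -> int:
    """Inferred rule that reproduces the tier table from per-prompt results."""
    if not ip_pass:
        return 3  # blocked: IP prompts rejected
    if not env_pass:
        return 2  # conditional: passes IP but not every prompt
    return 1      # ship today: passed both prompts


def group_by_tier(results):
    """Map tier number to model names, preserving input order."""
    tiers = {1: [], 2: [], 3: []}
    for model, (ip_pass, env_pass) in results.items():
        tiers[assign_tier(ip_pass, env_pass)].append(model)
    return tiers
```

Keying tier 3 on the IP result rather than on the provider name keeps the rule meaningful if a provider later changes its filter.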

Decision Rules

if general_generation:
  ▸ use nano-banana-2                              // highest quality, 100% reliable, simple integration
if latency_critical:
  ▸ use flux-schnell (2.2s) or flux-dev-fp8 (3.9s)  // Fireworks, fast, reliable
if ip_characters:
  ⊘ never BFL direct API
  ▸ use Fireworks-hosted Flux.1 for base generation
  ▸ use Flux.1 Dev + LoRA + ControlNet for consistency
  // augment with nano-banana-2 or upscaling, watch cost
if cost_sensitive:
  ▸ nano-banana-2 for ad-hoc (per-generation pricing)
  ▸ LoRA pipeline for volume (amortized training cost)
  △ upscaling adds per-image cost: monitor at scale
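The decision rules translate into a small router. A minimal Python sketch under stated assumptions: the rule precedence (IP constraint first), the `needs_consistency` flag, the `flux.1-dev+lora+controlnet` identifier, and all cost parameters are hypothetical. The source gives no prices, so the LoRA break-even (volume N where N × (per-image nano-banana-2 cost − per-image LoRA cost) exceeds the training cost) uses only numbers the caller supplies.

```python
from dataclasses import dataclass

LORA_PIPELINE = "flux.1-dev+lora+controlnet"  # hypothetical id for the self-built pipeline


@dataclass(frozen=True)
class Request:
    ip_characters: bool = False
    needs_consistency: bool = False   # hypothetical: recurring characters across images
    latency_critical: bool = False
    cost_sensitive: bool = False
    expected_images: int = 0          # planned volume; read only when cost_sensitive


def lora_breaks_even(expected_images: int, training_cost: float,
                     per_image_lora: float, per_image_nb2: float) -> bool:
    """True when one-off training cost is recovered by cheaper per-image generation.

    per_image_lora should include any upscaling cost, per the △ note.
    """
    savings = per_image_nb2 - per_image_lora
    return savings > 0 and expected_images * savings > training_cost


def route(req: Request, training_cost: float = 0.0,
          per_image_lora: float = 0.0, per_image_nb2: float = 0.0) -> list[str]:
    """Return candidate models in preference order (assumed precedence below)."""
    if req.ip_characters:
        # Never BFL direct: Fireworks-hosted Flux.1 for base generation (fastest first),
        # the LoRA + ControlNet pipeline when characters must stay consistent.
        return [LORA_PIPELINE] if req.needs_consistency else ["flux-schnell", "flux-dev-fp8"]
    if req.latency_critical:
        return ["flux-schnell", "flux-dev-fp8"]  # 2.2s, 3.9s on Fireworks
    if req.cost_sensitive and lora_breaks_even(
            req.expected_images, training_cost, per_image_lora, per_image_nb2):
        return [LORA_PIPELINE]
    # General generation, and ad-hoc cost-sensitive work below break-even.
    return ["nano-banana-2"]
```

For example, `route(Request(ip_characters=True, needs_consistency=True))` returns `['flux.1-dev+lora+controlnet']`, and no IP route ever returns a BFL model.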
