LabNotes
2026-03-03 · Lab Notes · 7 min ◉ Standard

Image model API baseline: March 2026

Nine models, three providers, two prompts. A production baseline for image generation APIs — speed, reliability, content filtering, and IP handling.

Test Setup

Each model received two prompts in storybook illustration style: a character-interaction scene (Michelangelo the Ninja Turtle next to Cinderella) and an environment scene (cozy cabin, falling snow). The character prompt tests IP handling. The environment prompt tests baseline quality.

The Models

Provider  | Model           | Avg Speed | Reliability
Gemini    | nano-banana     | 5.7s      | 100%
Gemini    | nano-banana-2   | 13.1s     | 100%
Gemini    | nano-banana-pro | 19.0s     | 50%
Fireworks | flux-schnell    | 2.2s      | 100%
Fireworks | flux-dev-fp8    | 3.9s      | 100%
BFL       | flux-2-max      | 30.1s     | 50%
BFL       | flux-2-pro      | 18.4s     | 50%
BFL       | flux-2-klein-9b | 7.3s      | 50%
BFL       | flux-2-klein-4b | 5.9s      | 50%
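
With two prompts per model, a reliability of 50% means exactly one prompt failed. The aggregation behind the two numeric columns can be sketched as follows. This is a minimal sketch, not the benchmark's actual code: the `Run` record and the sample timings are hypothetical, and averaging speed over successful runs only is an assumption (it is at least consistent with nano-banana-pro showing a 19.0s average alongside a 70-second timeout).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One prompt sent to one model (hypothetical record shape)."""
    model: str
    prompt: str
    seconds: float
    ok: bool  # True if an image came back


def summarize(runs):
    """Return {model: (avg_seconds_over_successes, reliability_percent)}."""
    by_model = {}
    for run in runs:
        by_model.setdefault(run.model, []).append(run)

    summary = {}
    for model, model_runs in by_model.items():
        ok_times = [r.seconds for r in model_runs if r.ok]
        # Assumption: failed runs do not count toward average speed.
        avg = mean(ok_times) if ok_times else None
        reliability = 100.0 * len(ok_times) / len(model_runs)
        summary[model] = (avg, reliability)
    return summary


# Illustrative timings only, not measured data.
runs = [
    Run("model-a", "character", 2.0, True),
    Run("model-a", "cabin", 3.0, True),
    Run("model-b", "character", 1.0, False),
    Run("model-b", "cabin", 8.0, True),
]
summary = summarize(runs)
```

With only two prompts per model, reliability can take just three values (0%, 50%, 100%), so the column is a coarse pass/fail signal rather than a rate estimate.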

The BFL Content Filter Problem

Every BFL-hosted Flux model failed the character IP prompt. All four — max, pro, klein-9b, klein-4b — rejected the Ninja Turtle + Cinderella scene. All four passed the cabin scene. The pattern is consistent: BFL's content filtering blocks prompts involving recognizable IP characters.

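The Flux models hosted on Fireworks (flux-schnell, flux-dev-fp8) passed both prompts at 100% reliability. These are Flux.1 models, an earlier generation than BFL's Flux 2 lineup, so the comparison spans model generation as well as provider. Even so, the split falls cleanly along provider lines: every BFL-hosted model rejected the character prompt and every Fireworks-hosted model accepted it. In this test, provider choice determined whether IP-adjacent content could be generated through the API.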
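
Because the outcome depends on the host, a caller can treat a filter rejection as a routing signal rather than a hard failure. A minimal sketch of that idea, assuming a hypothetical `ContentFiltered` exception and plain provider callables. Neither the BFL nor the Fireworks SDK is used here, and how each provider actually reports a rejection is not covered by this test.

```python
class ContentFiltered(Exception):
    """Hypothetical error raised when a provider rejects a prompt."""


def generate_with_fallback(prompt, providers):
    """Try each (name, generate_fn) in order; return (name, image_bytes).

    Raises ContentFiltered if every provider rejects the prompt.
    """
    rejected = []
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except ContentFiltered:
            rejected.append(name)
    raise ContentFiltered(f"all providers rejected the prompt: {rejected}")


# Stand-in provider functions for illustration.
def strict_provider(prompt):
    raise ContentFiltered("blocked")


def permissive_provider(prompt):
    return b"fake-image-bytes"


result = generate_with_fallback(
    "storybook scene",
    [("strict", strict_provider), ("permissive", permissive_provider)],
)
```

Only filter rejections trigger the fallback; other errors propagate, so an outage is not silently masked as a content decision.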

Flux.1 Models and IP

Fireworks-hosted Flux.1 models handle character IP well. Flux-schnell averages 2.2 seconds per image and flux-dev-fp8 averages 3.9 seconds. Both passed both prompts, with clean character interactions and close prompt adherence in the storybook style tested.

For production IP work that requires character consistency across multiple images, the recommended path is LoRA training on Flux.1 Dev with ControlNet-based workflows. This gives fine-grained control over character features, pose, and style consistency that base model prompting alone cannot guarantee. ControlNet pipelines add composition control without sacrificing the quality of the base model.

Nano Banana 2: Recommended Driver

Google's Nano Banana family runs natively through the Gemini API. Three tiers tested: nano-banana (base), nano-banana-2 (enhanced), nano-banana-pro (highest tier).

Nano Banana 2 is the recommended general-purpose driver. Three reasons:

01 Reliable

100% pass rate on both prompts, including the character IP scene that blocked every BFL model.

02 High output quality

2,226 KB average file size, the highest of any reliable model. Output shows visibly richer detail than flux-schnell or nano-banana base.

03 Simple integration

Single synchronous Gemini API call. No polling, no async status checks, no download step. The simplest integration path of any model tested.
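
For contrast, the submit, poll, download pattern that a synchronous call avoids looks roughly like this. It is a generic sketch: `submit`, `get_status`, and `download` are hypothetical callables standing in for whatever an asynchronous provider exposes, and the status strings are assumptions.

```python
import time


def generate_by_polling(prompt, submit, get_status, download,
                        interval=0.5, timeout=60.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Submit a job, poll until it is ready, then download the result."""
    job_id = submit(prompt)
    deadline = clock() + timeout
    while clock() < deadline:
        status = get_status(job_id)
        if status == "ready":
            return download(job_id)
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        sleep(interval)  # wait before the next status check
    raise TimeoutError(f"job {job_id} not ready after {timeout}s")


# Usage with fake callables; a real client would make HTTP requests here.
statuses = iter(["pending", "pending", "ready"])
image = generate_by_polling(
    "cozy cabin, falling snow",
    submit=lambda prompt: "job-1",
    get_status=lambda job_id: next(statuses),
    download=lambda job_id: b"fake-image-bytes",
    sleep=lambda seconds: None,  # skip real waiting in the example
)
```

Each of the three steps is a separate failure point with its own retry and timeout handling, which is the integration overhead a single blocking call removes.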

The trade-off is speed: 13.1 seconds average. For latency-sensitive use cases (preview generation, draft iterations), use flux-schnell at 2.2 seconds for drafts and Nano Banana 2 for finals.

Nano Banana Pro (highest tier) failed one of its two prompts, timing out at 70 seconds. On this evidence, the Pro tier is not yet stable enough for production.
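
A 70-second stall is easier to live with if the caller enforces its own deadline. One way to do that with the standard library is sketched below; `call_with_deadline` and the stand-in model functions are hypothetical, and the deadline values are placeholders rather than recommendations.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(fn, prompt, deadline_s):
    """Return fn(prompt), or None if it does not finish within deadline_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, prompt)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        return None
    finally:
        # Do not block on the abandoned call. The worker thread is not
        # cancelled; it keeps running in the background until fn returns.
        pool.shutdown(wait=False)


# Stand-in model calls for illustration.
def slow_model(prompt):
    time.sleep(0.3)
    return b"late-image"


def fast_model(prompt):
    return b"fast-image"


image = call_with_deadline(slow_model, "cabin", 0.05)
if image is None:
    image = call_with_deadline(fast_model, "cabin", 1.0)
```

The deadline bounds how long the caller waits, not how long the request runs, so a provider-side timeout setting is still worth using where one is available.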

Production Stack Recommendation

Use Case          | Model             | Speed  | Notes
General driver    | nano-banana-2     | 13.1s  | Best quality-reliability balance
Drafts / previews | flux-schnell      | 2.2s   | Fast iteration, lower detail
Quality tier      | flux-dev-fp8      | 3.9s   | Good speed-quality ratio
IP characters     | Flux.1 Dev + LoRA | varies | ControlNet workflows for consistency
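
In application code, the stack above reduces to a small routing table. A sketch, assuming hypothetical use-case keys and a hypothetical identifier for the LoRA fine-tune (the table names no model ID for that row):

```python
# Use-case keys and the LoRA identifier are illustrative assumptions.
STACK = {
    "general": "nano-banana-2",
    "draft": "flux-schnell",
    "quality": "flux-dev-fp8",
    "ip-character": "flux-dev-lora",  # hypothetical ID for a LoRA fine-tune
}


def pick_model(use_case):
    """Return the model for a use case, defaulting to the general driver."""
    return STACK.get(use_case, STACK["general"])


draft_model = pick_model("draft")
default_model = pick_model("something-else")
```

Defaulting unknown use cases to the general driver keeps new call sites working without a config change.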

Cost Considerations

Nano Banana 2 and the Flux.1 Dev + LoRA path both produce high-quality output, but costs scale differently. Nano Banana 2 charges per generation through Gemini API pricing. LoRA fine-tuning on Flux.1 Dev has upfront training cost plus per-inference cost through Fireworks or self-hosted infrastructure. For high-volume IP work (hundreds of consistent character images), the LoRA path amortizes well. For ad-hoc generation, Nano Banana 2 is more cost-effective. Augmenting either path with upscaling models is viable but adds per-image cost — monitor it at scale.
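
The amortization argument comes down to a break-even count: the LoRA path costs a fixed training amount plus a per-image cost, and it wins once the per-image saving has paid back the training. A sketch of that arithmetic follows. All figures are placeholder cost units, not real prices from any provider.

```python
import math


def break_even_images(training_cost, lora_per_image, api_per_image):
    """Smallest image count at which the LoRA path is no more expensive.

    LoRA path: training_cost + n * lora_per_image
    API path:  n * api_per_image
    Returns None if LoRA inference is not cheaper per image.
    """
    saving = api_per_image - lora_per_image
    if saving <= 0:
        return None  # the upfront cost is never recovered
    return math.ceil(training_cost / saving)


# Placeholder units: training costs 10000, LoRA inference 1, API call 5.
n = break_even_images(training_cost=10000, lora_per_image=1, api_per_image=5)
```

Below the break-even count the per-generation path is cheaper; above it the LoRA path is. Any upscaling step adds the same per-image amount to both sides and leaves the break-even point unchanged.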

Key Findings

  • BFL content filters block IP-adjacent character prompts across all Flux 2 models
  • Fireworks-hosted Flux.1 handles the same prompts at 100% reliability
  • Nano Banana 2 is the most reliable high-quality model (13.1s, 100%, 2,226 KB avg)
  • For character consistency: LoRA training on Flux.1 Dev with ControlNet workflows
  • Nano Banana Pro (timeout) and all BFL-direct models (filter rejections) scored 50% and are not recommended for production use
  • Provider matters as much as model: the same model family behaved differently on different hosts

Full benchmark with interactive scoring and generated images: image model benchmark tool.