Image model API baseline: March 2026
Nine models, three providers, two prompts. A production baseline for image generation APIs — speed, reliability, content filtering, and IP handling.
Test Setup
Each model received two prompts in storybook illustration style: a character-interaction scene (Michelangelo the Ninja Turtle next to Cinderella) and an environment scene (cozy cabin, falling snow). The character prompt tests IP handling. The environment prompt tests baseline quality.
The Models
| Provider | Model | Avg Speed (s per image) | Reliability (of 2 prompts) |
|---|---|---|---|
| Gemini | nano-banana | 5.7s | 100% |
| Gemini | nano-banana-2 | 13.1s | 100% |
| Gemini | nano-banana-pro | 19.0s | 50% |
| Fireworks | flux-schnell | 2.2s | 100% |
| Fireworks | flux-dev-fp8 | 3.9s | 100% |
| BFL | flux-2-max | 30.1s | 50% |
| BFL | flux-2-pro | 18.4s | 50% |
| BFL | flux-2-klein-9b | 7.3s | 50% |
| BFL | flux-2-klein-4b | 5.9s | 50% |
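With only two prompts per model, reliability can take just three values (0%, 50%, 100%), so every 50% in the table means exactly one failed prompt. A minimal sketch of that arithmetic, using the pass counts reported in this baseline; the dictionary layout and the function name are illustrative and not part of the benchmark tool:

```python
PROMPTS_PER_MODEL = 2  # character-interaction scene + environment scene

# Prompts passed per model, as reported in the table above.
PASSED = {
    "nano-banana": 2,
    "nano-banana-2": 2,
    "nano-banana-pro": 1,
    "flux-schnell": 2,
    "flux-dev-fp8": 2,
    "flux-2-max": 1,
    "flux-2-pro": 1,
    "flux-2-klein-9b": 1,
    "flux-2-klein-4b": 1,
}


def reliability_pct(passed: int, total: int = PROMPTS_PER_MODEL) -> float:
    """Share of prompts that returned an image, as a percentage."""
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return 100.0 * passed / total


for model, count in PASSED.items():
    print(f"{model}: {reliability_pct(count):.0f}%")
```

A sample this small is enough to flag a problem, but not to estimate a failure rate.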
The BFL Content Filter Problem
Every BFL-hosted Flux model failed the character IP prompt. All four (max, pro, klein-9b, klein-4b) rejected the Ninja Turtle + Cinderella scene, and all four passed the cabin scene. The pattern is consistent across the lineup, though it rests on a single IP prompt: BFL's content filtering appears to block prompts involving recognizable IP characters.
The Flux models hosted on Fireworks (flux-schnell, flux-dev-fp8) passed both prompts at 100% reliability. Same model family, different provider, opposite outcome. These are Flux.1 models while the BFL lineup is Flux 2, so the test does not fully separate provider from model generation, but in practice provider choice determines whether you can generate IP-adjacent content through the API.
Flux.1 Models and IP
Fireworks-hosted Flux.1 models handle character IP well. Flux-schnell averages 2.2 seconds per image, flux-dev-fp8 3.9 seconds. Both passed both prompts, producing clean character interactions and adhering to the prompt in the storybook style tested.
For production IP work that requires character consistency across multiple images, the recommended path is LoRA training on Flux.1 Dev with ControlNet-based workflows. This gives fine-grained control over character features, pose, and style consistency that base model prompting alone cannot guarantee. ControlNet pipelines add composition control without sacrificing the quality of the base model.
Nano Banana 2: Recommended Driver
Google's Nano Banana family runs natively through the Gemini API. Three tiers tested: nano-banana (base), nano-banana-2 (enhanced), nano-banana-pro (highest tier).
Nano Banana 2 is the recommended general-purpose driver. Three reasons:
01 Reliable
100% pass rate on both prompts, including the character IP scene that blocked every BFL model.
02 High output quality
2226KB average file size — the highest of any reliable model. Visibly richer detail than flux-schnell or nano-banana base.
03 Simple integration
Single synchronous Gemini API call. No polling, no async status checks, no download step. The simplest integration path of any model tested.
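For contrast, the asynchronous pattern that the third point avoids usually has three steps: submit a job, poll its status, then download the result. The sketch below shows that shape only. The `submit`, `get_status`, and `download` callables and the `"ready"` / `"failed"` status strings are hypothetical placeholders, not any provider's actual API:

```python
import time
from typing import Callable


def poll_until_done(
    submit: Callable[[str], str],       # hypothetical: returns a job id
    get_status: Callable[[str], str],   # hypothetical: "pending" | "ready" | "failed"
    download: Callable[[str], bytes],   # hypothetical: returns image bytes
    prompt: str,
    interval_s: float = 1.0,
    max_wait_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Submit a job, poll until it finishes, then download the image."""
    job_id = submit(prompt)
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        state = get_status(job_id)
        if state == "ready":
            return download(job_id)
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed")
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"job {job_id} not ready after {max_wait_s}s")
```

A synchronous call replaces all of this with one request that returns the image, which is why it has fewer failure points to handle.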
The trade-off is speed: 13.1 seconds average. For latency-sensitive use cases (preview generation, draft iterations), pair it with flux-schnell at 2.2 seconds for drafts and Nano Banana 2 for finals.
Nano Banana Pro (highest tier) failed one of two prompts and timed out at 70 seconds on the failure. The Pro tier is unstable in production today.
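One way to guard against a timeout like this is to cap each call and fall back to the next model in a preferred order. The following is a minimal sketch under stated assumptions: the backends are plain callables you supply (stand-ins for whatever client code you use), and the timeout value is yours to choose rather than a figure from this benchmark:

```python
import concurrent.futures as cf
from typing import Callable, Sequence, Tuple

Backend = Tuple[str, Callable[[str], bytes]]  # (model name, generate function)


def generate_with_fallback(
    prompt: str, backends: Sequence[Backend], timeout_s: float
) -> Tuple[str, bytes]:
    """Try each backend in order; skip any that times out or raises."""
    errors = []
    for name, generate in backends:
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(generate, prompt)
        try:
            return name, future.result(timeout=timeout_s)
        except cf.TimeoutError:
            errors.append(f"{name}: timed out after {timeout_s}s")
        except Exception as exc:  # rejection, network error, etc.
            errors.append(f"{name}: {exc!r}")
        finally:
            # Do not block on a hung call; the worker thread finishes on its own.
            pool.shutdown(wait=False, cancel_futures=True)
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Note that the timed-out request keeps running in its worker thread. This wrapper only stops the caller from waiting on it, so a real client should also set a request-level timeout where the underlying library supports one.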
Production Stack Recommendation
| Use Case | Model | Speed | Notes |
|---|---|---|---|
| General driver | nano-banana-2 | 13.1s | Best quality-reliability balance |
| Drafts / previews | flux-schnell | 2.2s | Fast iteration, lower detail |
| Quality tier | flux-dev-fp8 | 3.9s | Good speed-quality ratio |
| IP characters | Flux.1 Dev + LoRA | varies | ControlNet workflows for consistency |
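The table above can be carried into code as a simple lookup. This is an illustrative sketch: the use-case keys are made up for the example, and `"flux-1-dev-lora"` is a hypothetical label for a custom LoRA deployment, not a hosted model name:

```python
# Use case -> model, mirroring the production stack table.
STACK = {
    "general": "nano-banana-2",         # best quality-reliability balance
    "draft": "flux-schnell",            # fast iteration, lower detail
    "quality": "flux-dev-fp8",          # good speed-quality ratio
    "ip_character": "flux-1-dev-lora",  # hypothetical name for a LoRA deployment
}


def pick_model(use_case: str) -> str:
    """Return the model for a use case, defaulting to the general driver."""
    return STACK.get(use_case, STACK["general"])


print(pick_model("draft"))
print(pick_model("final"))  # unknown key falls back to the general driver
```

Defaulting unknown use cases to the general driver keeps the draft-then-final workflow working even when a caller passes a label the table does not cover.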
Cost Considerations
Nano Banana 2 and the Flux.1 Dev + LoRA path both produce high-quality output, but costs scale differently. Nano Banana 2 charges per generation through Gemini API pricing. LoRA fine-tuning on Flux.1 Dev has upfront training cost plus per-inference cost through Fireworks or self-hosted infrastructure. For high-volume IP work (hundreds of consistent character images), the LoRA path amortizes well. For ad-hoc generation, Nano Banana 2 is more cost-effective. Augmenting either path with upscaling models is viable but adds per-image cost — monitor it at scale.
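The amortization argument reduces to a break-even count: the upfront training cost divided by the per-image saving. The sketch below assumes a simple linear cost model. The figures in the usage lines are placeholders chosen for easy arithmetic, not real Gemini or Fireworks prices:

```python
import math
from typing import Optional


def break_even_images(
    training_cost: float, api_cost_per_image: float, lora_cost_per_image: float
) -> Optional[int]:
    """Smallest image count at which the LoRA path costs no more than the API path.

    Returns None when LoRA inference is not cheaper per image, because the
    upfront training cost is then never recovered.
    """
    saving = api_cost_per_image - lora_cost_per_image
    if saving <= 0:
        return None
    return math.ceil(training_cost / saving)


# Placeholder numbers only; substitute current pricing before relying on this.
print(break_even_images(training_cost=200.0, api_cost_per_image=0.5, lora_cost_per_image=0.25))
print(break_even_images(training_cost=200.0, api_cost_per_image=0.25, lora_cost_per_image=0.5))
```

Below the break-even count the per-generation path is cheaper, and above it the LoRA path is. Upscaling adds a per-image cost that can be included in either per-image argument.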
Key Findings
- BFL content filters block IP-adjacent character prompts across all Flux 2 models
- Fireworks-hosted Flux.1 handles the same prompts at 100% reliability
- Nano Banana 2 is the most reliable high-quality model (13.1s, 100%, 2226KB avg)
- For character consistency: LoRA training on Flux.1 Dev with ControlNet workflows
- Nano Banana Pro (timed out on one prompt) and all BFL-direct models (rejected the IP prompt) scored 50% and are not dependable for production use
- Provider matters as much as model — same architecture, different reliability
Full benchmark with interactive scoring and generated images: image model benchmark tool.