| We built RapidFire AI because iterating on RAG pipelines is painfully sequential:
run a config, wait, inspect results, tweak one knob, repeat. When you have 15
things to tune (chunk size, retrieval k, reranker, prompt template, context
window strategy...), that cycle compounds fast.

RapidFire uses shard-based interleaved scheduling to run many configurations
concurrently on a single machine — even a CPU-only box if you're using a
closed API like OpenAI. Instead of config A finishing before config B starts,
all configs process data shards in rotation, so you see live side-by-side
metric deltas within the first few minutes.

The part we're most excited about: Interactive Control (IC Ops). Most RAG
observability tools tell you what happened after a run finishes. IC Ops closes
the loop, letting you act on what you're observing mid-run:
- Stop a config that's clearly underperforming (save the API spend)
- Resume it later if you change your mind
- Clone a promising run and modify its prompt template or retrieval
strategy on the fly, with or without warm-starting from the parent's state
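
To make the idea concrete, here is a toy sketch of shard-interleaved scheduling with stop / resume / clone controls and an online mean metric. It is not RapidFire's implementation or API: `Run`, `InterleavedScheduler`, and `toy_score` are hypothetical names invented for illustration, and "warm start" is modeled here, as an assumption, as simply copying the parent's shard position and running totals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Run:
    """One configuration under evaluation (hypothetical structure)."""
    name: str
    config: dict
    next_shard: int = 0      # index of the next shard this run will process
    active: bool = True      # False while the run is stopped
    score_sum: float = 0.0   # running total, so the metric updates online
    examples_seen: int = 0

    @property
    def mean_score(self) -> float:
        return self.score_sum / self.examples_seen if self.examples_seen else 0.0


class InterleavedScheduler:
    """Rotates all active runs through the data one shard at a time."""

    def __init__(self, shards: List[list], score_fn: Callable[[dict, object], float]):
        self.shards = shards
        self.score_fn = score_fn
        self.runs: Dict[str, Run] = {}

    def add(self, name: str, config: dict) -> Run:
        self.runs[name] = Run(name, dict(config))
        return self.runs[name]

    def stop(self, name: str) -> None:
        self.runs[name].active = False   # progress is kept so it can resume

    def resume(self, name: str) -> None:
        self.runs[name].active = True

    def clone(self, parent: str, name: str, overrides: dict, warm_start: bool = True) -> Run:
        src = self.runs[parent]
        child = Run(name, {**src.config, **overrides})
        if warm_start:
            # Assumption: warm start = inherit the parent's progress so far.
            child.next_shard = src.next_shard
            child.score_sum = src.score_sum
            child.examples_seen = src.examples_seen
        self.runs[name] = child
        return child

    def step(self) -> bool:
        """One rotation: every active, unfinished run processes its next shard.

        Returns False when no run made progress.
        """
        progressed = False
        for run in list(self.runs.values()):
            if not run.active or run.next_shard >= len(self.shards):
                continue
            for example in self.shards[run.next_shard]:
                run.score_sum += self.score_fn(run.config, example)
                run.examples_seen += 1
            run.next_shard += 1
            progressed = True
        return progressed


def toy_score(config: dict, example: int) -> float:
    # Stand-in for a real eval metric: success if k is at least the example value.
    return 1.0 if config["k"] >= example else 0.0


shards = [[1, 2], [3, 4], [5, 6]]
sched = InterleavedScheduler(shards, toy_score)
sched.add("k2", {"k": 2})
sched.add("k6", {"k": 6})

sched.step()                      # both configs process shard 0
sched.step()                      # both configs process shard 1
sched.stop("k2")                  # k2 is clearly behind, so stop paying for it
sched.clone("k6", "k4", {"k": 4}, warm_start=False)   # try a variant from scratch
while sched.step():               # remaining active runs finish in rotation
    pass

for run in sched.runs.values():
    print(run.name, run.examples_seen, round(run.mean_score, 3))
```

Because every run keeps its own shard cursor and running totals, stopping a run costs nothing beyond flipping a flag, and a resumed run picks up at the shard where it left off.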

This changes the experimentation workflow from "observe → write notes →
re-queue a new job" to "observe → fix → continue" in a single session.

What you can experiment over in one run:
- Chunking strategy and overlap
- Embedding model
- Retrieval k and hybrid search weighting
- Reranking model / threshold
- Prompt template variants (few-shot, CoT, context compression)
- Generation model (swap GPT-4o vs Claude 3.5 vs a local model mid-experiment)

Eval metrics aggregate online (no need to wait for the full run) and are
displayed in a live-updating in-notebook table. MLflow is fully integrated for
longer-term experiment governance.

GitHub: https://github.com/RapidFireAI/rapidfireai
Docs: https://oss-docs.rapidfire.ai
pip install rapidfireai |