Problem: Synchronous Crew Execution Blocks Your Application
By default, crew.kickoff() is synchronous — it blocks the calling thread until every agent finishes. That's fine for a CLI script. It's a problem the moment you embed CrewAI inside a FastAPI endpoint, a background worker, or a pipeline that needs to fan out across multiple inputs simultaneously.
You'll learn:
- How kickoff_async and kickoff_for_each_async work under the hood
- How to run multiple crews concurrently with asyncio.gather
- How to integrate async crews into a FastAPI endpoint without blocking the event loop
- The three most common mistakes that silently serialize your "parallel" agents
Time: 20 min | Difficulty: Intermediate
Why Synchronous Kickoff Hurts at Scale
Each CrewAI agent round-trips to an LLM API. A crew with 3 agents and 2 tasks each makes roughly 6 LLM calls in sequence. At ~2s per call, that's ~12s of blocking I/O.
If you're serving 10 concurrent API requests, the 10th user waits 120s — not because your agents are slow, but because you're not using async I/O.
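The arithmetic above can be sketched with a toy simulation: asyncio.sleep stands in for an LLM round trip, with the ~2s call time scaled down so the script finishes quickly. The function names and timings here are illustrative, not part of CrewAI.

```python
import asyncio
import time

CALL_TIME = 0.02  # stand-in for a ~2s LLM round trip, scaled down


async def fake_llm_call():
    await asyncio.sleep(CALL_TIME)  # simulated network I/O


async def blocking_style(n_requests: int, calls_per_crew: int = 6):
    # Synchronous kickoff under load: every call waits for the
    # previous one, across all requests
    for _ in range(n_requests * calls_per_crew):
        await fake_llm_call()


async def async_style(n_requests: int, calls_per_crew: int = 6):
    # Async kickoff: each crew still makes its own calls in sequence,
    # but the crews' I/O overlaps with each other
    async def one_crew():
        for _ in range(calls_per_crew):
            await fake_llm_call()

    await asyncio.gather(*[one_crew() for _ in range(n_requests)])


async def main():
    for label, fn in [("serialized", blocking_style), ("overlapped", async_style)]:
        start = time.perf_counter()
        await fn(10)
        print(f"{label}: {time.perf_counter() - start:.2f}s")


asyncio.run(main())
```

The serialized run scales with requests × calls; the overlapped run stays near the cost of a single crew.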
Symptoms:
- FastAPI endpoint response times spike under load
- asyncio.run() raises RuntimeError: This event loop is already running
- CPU sits idle while crews wait on LLM responses
- Spawning threads with ThreadPoolExecutor to work around blocking calls
Solution
Step 1: Install the Right Version
kickoff_async was stabilized in CrewAI 0.70.0. Confirm you're on a compatible version.
```shell
# Install or upgrade crewai
pip install "crewai>=0.70.0"

# Verify
python -c "import crewai; print(crewai.__version__)"
```
Expected output (your version may differ):

```
0.80.0
```
Step 2: Replace kickoff With kickoff_async
kickoff_async is a coroutine — call it with await inside an async function.
```python
import asyncio

from crewai import Agent, Crew, Task, Process
from crewai_tools import SerperDevTool


# Define a minimal crew for the examples below
def build_research_crew(topic: str) -> Crew:
    researcher = Agent(
        role="Research Analyst",
        goal=f"Find the latest developments on {topic}",
        backstory="You specialize in concise technical research.",
        tools=[SerperDevTool()],
        verbose=False,
    )
    task = Task(
        description=f"Research {topic} and summarize findings in 3 bullet points.",
        expected_output="3 bullet points, each under 20 words.",
        agent=researcher,
    )
    return Crew(
        agents=[researcher],
        tasks=[task],
        process=Process.sequential,
        verbose=False,
    )


async def run_single_async():
    crew = build_research_crew("LangGraph 0.3 release")
    # kickoff_async is a coroutine; awaiting it yields a CrewOutput
    result = await crew.kickoff_async()
    print(result.raw)


asyncio.run(run_single_async())
```
Expected output (content varies with live search results):

```
- LangGraph 0.3 introduces native streaming support for agent state updates
- New checkpoint backend reduces memory usage by ~40% in long-running graphs
- Breaking change: StateGraph.compile() now requires explicit interrupt config
```
The crew runs without blocking the calling thread, so the event loop stays free to serve other work.
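What "the loop stays free" means can be shown with a toy sketch: asyncio.sleep stands in for an awaited kickoff_async call, and a heartbeat task keeps ticking while the slow awaitable is in flight. Names and timings here are illustrative only.

```python
import asyncio


async def heartbeat():
    # Evidence the loop stays free: this keeps ticking while work runs
    for i in range(3):
        print(f"tick {i}")
        await asyncio.sleep(0.05)


async def slow_io():
    # Stand-in for an awaited crew.kickoff_async() call
    await asyncio.sleep(0.2)
    return "crew done"


async def main():
    result, _ = await asyncio.gather(slow_io(), heartbeat())
    print(result)


asyncio.run(main())
```

The ticks interleave with the slow awaitable; with a blocking kickoff() call in the same place, they could not.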
Step 3: Run Multiple Crews Concurrently With asyncio.gather
This is where async pays off. asyncio.gather runs all coroutines concurrently — LLM I/O from one crew overlaps with I/O from another.
```python
async def run_parallel_crews():
    topics = [
        "CrewAI 0.80 changelog",
        "LangGraph vs CrewAI 2026",
        "AutoGen 0.5 new features",
    ]
    # Build all crews upfront; crew creation is synchronous and cheap
    crews = [build_research_crew(topic) for topic in topics]

    # Fire all three kickoffs concurrently.
    # asyncio.gather preserves order: results[0] matches topics[0]
    results = await asyncio.gather(
        *[crew.kickoff_async() for crew in crews]
    )

    for topic, result in zip(topics, results):
        print(f"\n=== {topic} ===")
        print(result.raw)


asyncio.run(run_parallel_crews())
```
All three crews make their LLM calls in parallel. Wall-clock time drops from ~3× single-crew time to ~1.2× (overhead only).
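One caveat worth knowing: by default, asyncio.gather raises the first exception it encounters, and the other crews' results become unreachable. Passing return_exceptions=True (standard asyncio behavior, not CrewAI-specific) returns failures in-place instead. A sketch with toy coroutines standing in for kickoff_async:

```python
import asyncio


# Toy stand-ins for crew.kickoff_async(); one fails on purpose
async def ok_crew(label: str) -> str:
    await asyncio.sleep(0.01)
    return f"{label} done"


async def failing_crew() -> str:
    await asyncio.sleep(0.01)
    raise RuntimeError("rate limited")


async def main():
    # return_exceptions=True keeps one failure from aborting the batch;
    # failed entries come back as exception objects, in order
    results = await asyncio.gather(
        ok_crew("a"), failing_crew(), ok_crew("b"),
        return_exceptions=True,
    )
    for r in results:
        if isinstance(r, Exception):
            print(f"failed: {r!r}")
        else:
            print(r)


asyncio.run(main())
```

With real crews, check each element with isinstance(result, Exception) before reading result.raw.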
Step 4: Use kickoff_for_each_async for Input Arrays
When you have one crew design but N different inputs, kickoff_for_each_async handles fan-out without manual gather.
```python
async def run_for_each():
    # "{topic}" is a placeholder; each inputs dict fills it at runtime
    crew = build_research_crew("{topic}")
    inputs_list = [
        {"topic": "Ollama 0.6 release"},
        {"topic": "vLLM v0.7 performance"},
        {"topic": "llama.cpp GGUF v4"},
    ]
    # Runs one crew instance per input dict, concurrently
    results = await crew.kickoff_for_each_async(inputs=inputs_list)

    for inputs, result in zip(inputs_list, results):
        print(f"\n--- {inputs['topic']} ---")
        print(result.raw)


asyncio.run(run_for_each())
```
kickoff_for_each_async is cleaner than manual gather when all inputs share the same crew blueprint. It also handles exceptions per-input rather than failing the whole batch.
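If N is large, full fan-out can hammer your LLM provider's rate limits. A common mitigation is to cap in-flight crews with a semaphore. bounded_kickoff below is a hypothetical helper (plain asyncio, not a CrewAI feature), written against the kickoff_async interface:

```python
import asyncio


async def bounded_kickoff(crews, max_concurrent: int = 3):
    # A semaphore caps in-flight crews so N inputs don't mean N
    # simultaneous bursts of LLM calls (rate limits, cost spikes)
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(crew):
        async with sem:
            return await crew.kickoff_async()

    # gather still preserves input order
    return await asyncio.gather(*[run_one(c) for c in crews])
```

Usage mirrors Step 3: `results = await bounded_kickoff(crews, max_concurrent=3)`.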
Step 5: Integrate Into a FastAPI Endpoint
Never call asyncio.run() inside a FastAPI route — FastAPI already runs an event loop. Use await directly.
```python
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ResearchRequest(BaseModel):
    topics: list[str]


class ResearchResponse(BaseModel):
    results: dict[str, str]


@app.post("/research", response_model=ResearchResponse)
async def research_endpoint(request: ResearchRequest):
    # Build crews for each topic (build_research_crew from Step 2)
    crews = [build_research_crew(topic) for topic in request.topics]

    # Await all concurrently; the event loop stays unblocked for other requests
    raw_results = await asyncio.gather(
        *[crew.kickoff_async() for crew in crews]
    )

    return ResearchResponse(
        results={
            topic: result.raw
            for topic, result in zip(request.topics, raw_results)
        }
    )
```
```shell
# Test it
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"topics": ["CrewAI async", "LangGraph streaming"]}'
```
Expected: JSON response with both topics researched, both LLM calls made in parallel.
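In a real endpoint you may also want a deadline so a stuck crew doesn't hold the request open indefinitely. kickoff_with_timeout below is a hypothetical wrapper using standard asyncio.wait_for; it is not part of CrewAI:

```python
import asyncio


async def kickoff_with_timeout(crew, seconds: float = 60.0):
    # Cancels the kickoff and raises TimeoutError if it runs
    # past the deadline, instead of blocking the request forever
    return await asyncio.wait_for(crew.kickoff_async(), timeout=seconds)
```

In the endpoint, replace the bare `crew.kickoff_async()` calls with `kickoff_with_timeout(crew, 60)` and map TimeoutError to an HTTP 504.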
Verification
Run this benchmark to confirm parallel execution is actually faster than sequential.
```python
import asyncio
import time

from crewai import Agent, Crew, Task, Process


def build_timing_crew(label: str) -> Crew:
    agent = Agent(
        role="Timer",
        goal=f"Say '{label} done' and nothing else.",
        backstory="You respond exactly as instructed.",
        verbose=False,
    )
    task = Task(
        description=f"Reply with exactly: {label} done",
        expected_output=f"{label} done",
        agent=agent,
    )
    return Crew(agents=[agent], tasks=[task], process=Process.sequential, verbose=False)


async def benchmark():
    crews = [build_timing_crew(f"crew-{i}") for i in range(4)]
    start = time.perf_counter()
    await asyncio.gather(*[c.kickoff_async() for c in crews])
    parallel_time = time.perf_counter() - start

    crews2 = [build_timing_crew(f"crew-{i}") for i in range(4)]
    start = time.perf_counter()
    for c in crews2:
        await c.kickoff_async()  # sequential awaits
    sequential_time = time.perf_counter() - start

    print(f"Parallel:   {parallel_time:.1f}s")
    print(f"Sequential: {sequential_time:.1f}s")
    print(f"Speedup:    {sequential_time / parallel_time:.1f}x")


asyncio.run(benchmark())
```
You should see: Parallel time roughly equal to the slowest single crew, not the sum of all four.
Three Pitfalls That Silently Serialize Your Crews
Pitfall 1: Sharing a single Agent instance across crews
CrewAI agents maintain internal state between tasks. If you reuse the same Agent object in multiple concurrent crews, they overwrite each other's context.
```python
# ❌ Wrong: shared agent, race condition on internal state
shared_agent = Agent(role="Researcher", ...)
crew_a = Crew(agents=[shared_agent], ...)
crew_b = Crew(agents=[shared_agent], ...)  # will corrupt crew_a's state

# ✅ Correct: fresh agent per crew
crew_a = Crew(agents=[Agent(role="Researcher", ...)], ...)
crew_b = Crew(agents=[Agent(role="Researcher", ...)], ...)
```
Pitfall 2: Calling asyncio.run() inside an existing event loop
```python
# ❌ Wrong: raises RuntimeError in FastAPI or Jupyter
result = asyncio.run(crew.kickoff_async())

# ✅ Correct: inside an async context, just await
result = await crew.kickoff_async()
```
If you're in Jupyter, use top-level await directly, or call nest_asyncio.apply() at the top of the notebook.
Pitfall 3: Using kickoff_for_each (sync) instead of kickoff_for_each_async
```python
# ❌ Wrong: synchronous, blocks for N × crew_time
results = crew.kickoff_for_each(inputs=inputs_list)

# ✅ Correct: concurrent, wall time ≈ max single crew time
results = await crew.kickoff_for_each_async(inputs=inputs_list)
```
The synchronous version exists for simple scripts. It has no place in a web server or concurrent pipeline.
What You Learned
- kickoff_async is a drop-in async replacement for kickoff; always use it inside async contexts
- asyncio.gather runs crews concurrently; LLM I/O from each crew overlaps
- kickoff_for_each_async is the idiomatic way to fan out one crew blueprint across N inputs
- Never share Agent instances across concurrent crews; create fresh agents per crew
- In FastAPI, use await directly; never call asyncio.run() inside a route
When NOT to use async kickoff: Single-crew CLI scripts where simplicity matters more than performance. crew.kickoff() is fine there. Switch to async the moment you embed CrewAI into a server or need parallel execution.
Tested on CrewAI 0.80.0, Python 3.12, FastAPI 0.115, Ubuntu 24.04 and macOS Sequoia