Problem: Synchronous Crew Execution Blocks Your Application
By default, crew.kickoff() is synchronous — it blocks the calling thread until every agent finishes. That's fine for a CLI script. It's a problem the moment you embed CrewAI inside a FastAPI endpoint, a background worker, or a pipeline that needs to fan out across multiple inputs simultaneously.
You'll learn:
- How kickoff_async and kickoff_for_each_async work under the hood
- How to run multiple crews concurrently with asyncio.gather
- How to integrate async crews into a FastAPI endpoint without blocking the event loop
- The three most common mistakes that silently serialize your "parallel" agents
Time: 20 min | Difficulty: Intermediate
Why Synchronous Kickoff Hurts at Scale
Each CrewAI agent round-trips to an LLM API. A crew with 3 agents and 2 tasks each makes roughly 6 LLM calls in sequence. At ~2s per call, that's ~12s of blocking I/O.
If you're serving 10 concurrent API requests, the 10th user waits 120s — not because your agents are slow, but because you're not using async I/O.
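The arithmetic above can be sketched with a toy simulation: asyncio.sleep stands in for an LLM round trip, with the ~2s call time scaled down so the script finishes quickly. The function names and timings here are illustrative, not part of CrewAI.

```python
import asyncio
import time

CALL_TIME = 0.02  # stand-in for a ~2s LLM round trip, scaled down


async def fake_llm_call():
    await asyncio.sleep(CALL_TIME)  # simulated network I/O


async def blocking_style(n_requests: int, calls_per_crew: int = 6):
    # Synchronous kickoff under load: every call waits for the
    # previous one, across all requests
    for _ in range(n_requests * calls_per_crew):
        await fake_llm_call()


async def async_style(n_requests: int, calls_per_crew: int = 6):
    # Async kickoff: each crew still makes its own calls in sequence,
    # but the crews' I/O overlaps with each other
    async def one_crew():
        for _ in range(calls_per_crew):
            await fake_llm_call()

    await asyncio.gather(*[one_crew() for _ in range(n_requests)])


async def main():
    for label, fn in [("serialized", blocking_style), ("overlapped", async_style)]:
        start = time.perf_counter()
        await fn(10)
        print(f"{label}: {time.perf_counter() - start:.2f}s")


asyncio.run(main())
```

The serialized run scales with requests × calls; the overlapped run stays near the cost of a single crew.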
Symptoms:
- FastAPI endpoint response times spike under load
- asyncio.run() raises RuntimeError: This event loop is already running
- CPU sits idle while crews wait on LLM responses
- Spawning threads with ThreadPoolExecutor to work around blocking calls
Solution
Step 1: Install the Right Version
kickoff_async was stabilized in CrewAI 0.70.0. Confirm you're on a compatible version.
```shell
# Install or upgrade crewai
pip install "crewai>=0.70.0"

# Verify
python -c "import crewai; print(crewai.__version__)"
```
Expected output (your version may differ):

```
0.80.0
```
Step 2: Replace kickoff With kickoff_async
kickoff_async is a coroutine — call it with await inside an async function.
```python
import asyncio

from crewai import Agent, Crew, Task, Process
from crewai_tools import SerperDevTool


# Define a minimal crew for the examples below
def build_research_crew(topic: str) -> Crew:
    researcher = Agent(
        role="Research Analyst",
        goal=f"Find the latest developments on {topic}",
        backstory="You specialize in concise technical research.",
        tools=[SerperDevTool()],
        verbose=False,
    )
    task = Task(
        description=f"Research {topic} and summarize findings in 3 bullet points.",
        expected_output="3 bullet points, each under 20 words.",
        agent=researcher,
    )
    return Crew(
        agents=[researcher],
        tasks=[task],
        process=Process.sequential,
        verbose=False,
    )


async def run_single_async():
    crew = build_research_crew("LangGraph 0.3 release")
    # kickoff_async is a coroutine; awaiting it yields a CrewOutput
    result = await crew.kickoff_async()
    print(result.raw)


asyncio.run(run_single_async())
```
Expected output (content varies with live search results):

```
- LangGraph 0.3 introduces native streaming support for agent state updates
- New checkpoint backend reduces memory usage by ~40% in long-running graphs
- Breaking change: StateGraph.compile() now requires explicit interrupt config
```
The crew runs without blocking the calling thread, so the event loop stays free to serve other work.
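What "the loop stays free" means can be shown with a toy sketch: asyncio.sleep stands in for an awaited kickoff_async call, and a heartbeat task keeps ticking while the slow awaitable is in flight. Names and timings here are illustrative only.

```python
import asyncio


async def heartbeat():
    # Evidence the loop stays free: this keeps ticking while work runs
    for i in range(3):
        print(f"tick {i}")
        await asyncio.sleep(0.05)


async def slow_io():
    # Stand-in for an awaited crew.kickoff_async() call
    await asyncio.sleep(0.2)
    return "crew done"


async def main():
    result, _ = await asyncio.gather(slow_io(), heartbeat())
    print(result)


asyncio.run(main())
```

The ticks interleave with the slow awaitable; with a blocking kickoff() call in the same place, they could not.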
Step 3: Run Multiple Crews Concurrently With asyncio.gather
This is where async pays off. asyncio.gather runs all coroutines concurrently — LLM I/O from one crew overlaps with I/O from another.
```python
async def run_parallel_crews():
    topics = [
        "CrewAI 0.80 changelog",
        "LangGraph vs CrewAI 2026",
        "AutoGen 0.5 new features",
    ]
    # Build all crews upfront; crew creation is synchronous and cheap
    crews = [build_research_crew(topic) for topic in topics]

    # Fire all three kickoffs concurrently.
    # asyncio.gather preserves order: results[0] matches topics[0]
    results = await asyncio.gather(
        *[crew.kickoff_async() for crew in crews]
    )

    for topic, result in zip(topics, results):
        print(f"\n=== {topic} ===")
        print(result.raw)


asyncio.run(run_parallel_crews())
```
All three crews make their LLM calls in parallel. Wall-clock time drops from ~3× single-crew time to ~1.2× (overhead only).
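One caveat worth knowing: by default, asyncio.gather raises the first exception it encounters, and the other crews' results become unreachable. Passing return_exceptions=True (standard asyncio behavior, not CrewAI-specific) returns failures in-place instead. A sketch with toy coroutines standing in for kickoff_async:

```python
import asyncio


# Toy stand-ins for crew.kickoff_async(); one fails on purpose
async def ok_crew(label: str) -> str:
    await asyncio.sleep(0.01)
    return f"{label} done"


async def failing_crew() -> str:
    await asyncio.sleep(0.01)
    raise RuntimeError("rate limited")


async def main():
    # return_exceptions=True keeps one failure from aborting the batch;
    # failed entries come back as exception objects, in order
    results = await asyncio.gather(
        ok_crew("a"), failing_crew(), ok_crew("b"),
        return_exceptions=True,
    )
    for r in results:
        if isinstance(r, Exception):
            print(f"failed: {r!r}")
        else:
            print(r)


asyncio.run(main())
```

With real crews, check each element with isinstance(result, Exception) before reading result.raw.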
Step 4: Use kickoff_for_each_async for Input Arrays
When you have one crew design but N different inputs, kickoff_for_each_async handles fan-out without manual gather.
```python
async def run_for_each():
    # "{topic}" is a placeholder; each inputs dict fills it at runtime
    crew = build_research_crew("{topic}")
    inputs_list = [
        {"topic": "Ollama 0.6 release"},
        {"topic": "vLLM v0.7 performance"},
        {"topic": "llama.cpp GGUF v4"},
    ]
    # Runs one crew instance per input dict, concurrently
    results = await crew.kickoff_for_each_async(inputs=inputs_list)

    for inputs, result in zip(inputs_list, results):
        print(f"\n--- {inputs['topic']} ---")
        print(result.raw)


asyncio.run(run_for_each())
```
kickoff_for_each_async is cleaner than manual gather when all inputs share the same crew blueprint. It also handles exceptions per-input rather than failing the whole batch.
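If N is large, full fan-out can hammer your LLM provider's rate limits. A common mitigation is to cap in-flight crews with a semaphore. bounded_kickoff below is a hypothetical helper (plain asyncio, not a CrewAI feature), written against the kickoff_async interface:

```python
import asyncio


async def bounded_kickoff(crews, max_concurrent: int = 3):
    # A semaphore caps in-flight crews so N inputs don't mean N
    # simultaneous bursts of LLM calls (rate limits, cost spikes)
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(crew):
        async with sem:
            return await crew.kickoff_async()

    # gather still preserves input order
    return await asyncio.gather(*[run_one(c) for c in crews])
```

Usage mirrors Step 3: `results = await bounded_kickoff(crews, max_concurrent=3)`.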
Step 5: Integrate Into a FastAPI Endpoint
Never call asyncio.run() inside a FastAPI route — FastAPI already runs an event loop. Use await directly.
```python
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ResearchRequest(BaseModel):
    topics: list[str]


class ResearchResponse(BaseModel):
    results: dict[str, str]


@app.post("/research", response_model=ResearchResponse)
async def research_endpoint(request: ResearchRequest):
    # Build crews for each topic (build_research_crew from Step 2)
    crews = [build_research_crew(topic) for topic in request.topics]

    # Await all concurrently; the event loop stays unblocked for other requests
    raw_results = await asyncio.gather(
        *[crew.kickoff_async() for crew in crews]
    )

    return ResearchResponse(
        results={
            topic: result.raw
            for topic, result in zip(request.topics, raw_results)
        }
    )
```
```shell
# Test it
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"topics": ["CrewAI async", "LangGraph streaming"]}'
```
Expected: JSON response with both topics researched, both LLM calls made in parallel.
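In a real endpoint you may also want a deadline so a stuck crew doesn't hold the request open indefinitely. kickoff_with_timeout below is a hypothetical wrapper using standard asyncio.wait_for; it is not part of CrewAI:

```python
import asyncio


async def kickoff_with_timeout(crew, seconds: float = 60.0):
    # Cancels the kickoff and raises TimeoutError if it runs
    # past the deadline, instead of blocking the request forever
    return await asyncio.wait_for(crew.kickoff_async(), timeout=seconds)
```

In the endpoint, replace the bare `crew.kickoff_async()` calls with `kickoff_with_timeout(crew, 60)` and map TimeoutError to an HTTP 504.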
Verification
Run this benchmark to confirm parallel execution is actually faster than sequential.
```python
import asyncio
import time

from crewai import Agent, Crew, Task, Process


def build_timing_crew(label: str) -> Crew:
    agent = Agent(
        role="Timer",
        goal=f"Say '{label} done' and nothing else.",
        backstory="You respond exactly as instructed.",
        verbose=False,
    )
    task = Task(
        description=f"Reply with exactly: {label} done",
        expected_output=f"{label} done",
        agent=agent,
    )
    return Crew(agents=[agent], tasks=[task], process=Process.sequential, verbose=False)


async def benchmark():
    crews = [build_timing_crew(f"crew-{i}") for i in range(4)]
    start = time.perf_counter()
    await asyncio.gather(*[c.kickoff_async() for c in crews])
    parallel_time = time.perf_counter() - start

    crews2 = [build_timing_crew(f"crew-{i}") for i in range(4)]
    start = time.perf_counter()
    for c in crews2:
        await c.kickoff_async()  # sequential awaits
    sequential_time = time.perf_counter() - start

    print(f"Parallel:   {parallel_time:.1f}s")
    print(f"Sequential: {sequential_time:.1f}s")
    print(f"Speedup:    {sequential_time / parallel_time:.1f}x")


asyncio.run(benchmark())
```
You should see: Parallel time roughly equal to the slowest single crew, not the sum of all four.
Three Pitfalls That Silently Serialize Your Crews
Pitfall 1: Sharing a single Agent instance across crews
CrewAI agents maintain internal state between tasks. If you reuse the same Agent object in multiple concurrent crews, they overwrite each other's context.
```python
# ❌ Wrong: shared agent, race condition on internal state
shared_agent = Agent(role="Researcher", ...)
crew_a = Crew(agents=[shared_agent], ...)
crew_b = Crew(agents=[shared_agent], ...)  # will corrupt crew_a's state

# ✅ Correct: fresh agent per crew
crew_a = Crew(agents=[Agent(role="Researcher", ...)], ...)
crew_b = Crew(agents=[Agent(role="Researcher", ...)], ...)
```
Pitfall 2: Calling asyncio.run() inside an existing event loop
```python
# ❌ Wrong: raises RuntimeError in FastAPI or Jupyter
result = asyncio.run(crew.kickoff_async())

# ✅ Correct: inside an async context, just await
result = await crew.kickoff_async()
```
If you're in Jupyter, use top-level await directly, or call nest_asyncio.apply() at the top of the notebook.
Pitfall 3: Using kickoff_for_each (sync) instead of kickoff_for_each_async
```python
# ❌ Wrong: synchronous, blocks for N × crew_time
results = crew.kickoff_for_each(inputs=inputs_list)

# ✅ Correct: concurrent, wall time ≈ max single crew time
results = await crew.kickoff_for_each_async(inputs=inputs_list)
```
The synchronous version exists for simple scripts. It has no place in a web server or concurrent pipeline.
What You Learned
- kickoff_async is a drop-in async replacement for kickoff; always use it inside async contexts
- asyncio.gather runs crews concurrently; LLM I/O from each crew overlaps
- kickoff_for_each_async is the idiomatic way to fan out one crew blueprint across N inputs
- Never share Agent instances across concurrent crews; create fresh agents per crew
- In FastAPI, use await directly; never call asyncio.run() inside a route
When NOT to use async kickoff: Single-crew CLI scripts where simplicity matters more than performance. crew.kickoff() is fine there. Switch to async the moment you embed CrewAI into a server or need parallel execution.
Tested on CrewAI 0.80.0, Python 3.12, FastAPI 0.115, Ubuntu 24.04 and macOS Sequoia