Problem: CrewAI Returns Raw Strings You Can't Reliably Parse
Your agent runs, produces a result, and then you write three lines of brittle string manipulation to extract the data you actually wanted. Then the model rephrases its output and your parser breaks.
You'll learn:
- How to attach a Pydantic model to a CrewAI Task using output_pydantic
- How to access the validated result directly from task.output.pydantic
- How to handle validation errors without crashing your crew
Time: 15 min | Difficulty: Intermediate
Why Raw String Output Breaks in Production
By default, task.output.raw is a string. The LLM decides its own format. On one run you get:
```
Name: Acme Corp
Revenue: $4.2M
Founded: 2018
```
On the next:
{"company": "Acme Corp", "revenue": "4.2M", "founded": 2018}
Both are "correct" from the model's perspective. Neither is safe to parse without a schema. CrewAI's output_pydantic field solves this by instructing the task to validate its output against a Pydantic model before returning.
Symptoms of the problem:
- KeyError or AttributeError when accessing agent results downstream
- Inconsistent field names across runs (revenue vs annual_revenue)
- Having to prompt-engineer JSON format into every task description
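To see why this bites, here is a minimal sketch (with hypothetical raw outputs) of the kind of ad-hoc parser these symptoms come from:

```python
# Hypothetical raw outputs: the same task, two runs, two layouts
run_1 = "Name: Acme Corp\nRevenue: $4.2M\nFounded: 2018"
run_2 = '{"company": "Acme Corp", "revenue": "4.2M", "founded": 2018}'

def parse_founded(raw: str) -> int:
    # Brittle: hard-codes the "Key: value" layout from the first run
    for line in raw.splitlines():
        if line.startswith("Founded:"):
            return int(line.split(":", 1)[1].strip())
    raise KeyError("Founded")

print(parse_founded(run_1))   # 2018
# parse_founded(run_2) raises KeyError: the JSON layout has no "Founded:" line
```

The parser works until the model changes its mind about formatting, which it will.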
Solution
Step 1: Install Dependencies
```shell
# Requires crewai >= 0.28.0 and pydantic v2
pip install "crewai[tools]>=0.28.0" pydantic
```
Verify:
```shell
python -c "import crewai; print(crewai.__version__)"
# Expected: 0.28.0 or higher
```
Step 2: Define Your Pydantic Output Model
```python
from pydantic import BaseModel, Field
from typing import Optional

class CompanyResearch(BaseModel):
    name: str = Field(description="Legal company name")
    founded: int = Field(description="Year founded as integer")
    revenue_usd_millions: float = Field(description="Annual revenue in USD millions")
    headquarters: str = Field(description="City, Country")
    summary: str = Field(description="2-sentence company overview")
    competitors: list[str] = Field(
        default_factory=list,
        description="Top 3 direct competitors by name"
    )
```
Keep field descriptions precise — CrewAI passes them to the LLM as formatting instructions.
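You can inspect exactly what those descriptions turn into: Pydantic exposes them through the model's JSON schema, which is the material CrewAI folds into the prompt (a quick check, using a version of the model trimmed to two fields):

```python
from pydantic import BaseModel, Field

class CompanyResearch(BaseModel):  # trimmed to two fields for brevity
    name: str = Field(description="Legal company name")
    founded: int = Field(description="Year founded as integer")

schema = CompanyResearch.model_json_schema()
print(schema["properties"]["founded"]["description"])  # Year founded as integer
print(schema["required"])                              # ['name', 'founded']
```

If a description would confuse a human filling out a form, it will confuse the model too.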
Step 3: Attach the Model to a Task
```python
from crewai import Agent, Task, Crew, LLM

llm = LLM(model="openai/gpt-4o-mini")

researcher = Agent(
    role="Company Research Analyst",
    goal="Extract accurate company data from public sources",
    backstory="You specialize in structured business intelligence.",
    llm=llm,
    verbose=True,
)

research_task = Task(
    description=(
        "Research {company_name} and return structured data. "
        "Use publicly available information only."
    ),
    expected_output="Structured company profile with all required fields populated.",
    output_pydantic=CompanyResearch,  # <-- this is the key line
    agent=researcher,
)
```
output_pydantic tells CrewAI to:
- Append JSON formatting instructions to the task prompt automatically
- Parse the raw output string as JSON after the task completes
- Validate it against CompanyResearch using Pydantic
- Raise a ValidationError if required fields are missing or of the wrong type
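The last two steps are plain Pydantic; you can reproduce them by hand to see what happens to the raw string after the task completes (a sketch against a trimmed model, not CrewAI's actual converter code):

```python
from pydantic import BaseModel, Field, ValidationError

class CompanyResearch(BaseModel):  # trimmed to two fields for brevity
    name: str = Field(description="Legal company name")
    founded: int = Field(description="Year founded as integer")

# Well-formed output validates into a typed object
company = CompanyResearch.model_validate_json('{"name": "Acme Corp", "founded": 2018}')
print(company.founded)  # 2018, as an int

# A missing required field raises ValidationError
try:
    CompanyResearch.model_validate_json('{"name": "Acme Corp"}')
except ValidationError as e:
    print(e.error_count())  # 1
```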
Step 4: Run the Crew and Access Typed Output
```python
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    verbose=True,
)

result = crew.kickoff(inputs={"company_name": "Stripe"})

# Access the validated Pydantic object directly
company: CompanyResearch = research_task.output.pydantic
print(company.name)                  # "Stripe"
print(company.founded)               # 2010 (int, not "2010")
print(company.revenue_usd_millions)  # e.g. 4200.0 (exact value depends on the model's sources)
print(company.competitors)           # e.g. ["Braintree", "Adyen", "Square"]

# Or serialize for downstream use
print(company.model_dump())
print(company.model_dump_json(indent=2))
```
Step 5: Handle Validation Failures Gracefully
The LLM occasionally returns malformed JSON or omits a required field. Wrap crew execution:
```python
from pydantic import ValidationError

try:
    result = crew.kickoff(inputs={"company_name": "Stripe"})
    company = research_task.output.pydantic
    if company is None:
        # Output parsed but Pydantic validation failed — raw string is still available
        print("Structured parse failed. Raw output:")
        print(research_task.output.raw)
    else:
        process(company)  # process() stands in for your downstream handler
except ValidationError as e:
    # Pydantic schema mismatch — log the fields that failed
    print(f"Validation errors: {e.error_count()}")
    for err in e.errors():
        print(f"  Field '{err['loc']}': {err['msg']}")
```
If pydantic is None after a successful run, the model returned text it believed was valid JSON but wasn't. Note that Pydantic v2 runs in lax mode by default, so compatible values are already coerced (e.g., "2010" → 2010); coercion is only disabled when a model sets model_config = ConfigDict(strict=True), so avoid strict mode here unless you need it.
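The lax-versus-strict difference is easy to verify with Pydantic alone, no CrewAI involved:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Lax(BaseModel):
    founded: int  # lax mode (Pydantic v2 default) coerces compatible strings

class Strict(BaseModel):
    model_config = ConfigDict(strict=True)
    founded: int

print(Lax.model_validate({"founded": "2010"}).founded)  # 2010, coerced to int

try:
    Strict.model_validate({"founded": "2010"})
except ValidationError:
    print("strict mode rejects the string")
```

Lax mode gives LLM output more room to land inside your schema, which is usually what you want here.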
Step 6: Use Structured Output Across Multi-Agent Crews
Each task in a crew can have its own output model. Pass structured data between tasks using context:
```python
class EnrichmentResult(BaseModel):
    company: str
    tech_stack: list[str]
    hiring: bool
    latest_funding_round: Optional[str] = None

# Assumes `enricher` is an Agent defined like `researcher` above
enrich_task = Task(
    description=(
        "Using the company data provided in context, "
        "identify the tech stack and hiring status of {company_name}."
    ),
    expected_output="Enriched company profile with tech and hiring data.",
    output_pydantic=EnrichmentResult,
    agent=enricher,
    context=[research_task],  # pulls research_task.output into this task's prompt
)
```
Downstream tasks receive research_task.output.raw as context, and because output_pydantic forced the upstream task to emit valid JSON, that context is clean, schema-conforming data rather than free-form prose.
Verification
```python
crew = Crew(agents=[researcher, enricher], tasks=[research_task, enrich_task])
crew.kickoff(inputs={"company_name": "Linear"})

# Both should be non-None
assert research_task.output.pydantic is not None
assert enrich_task.output.pydantic is not None

# Type checks pass
assert isinstance(research_task.output.pydantic, CompanyResearch)
assert isinstance(enrich_task.output.pydantic, EnrichmentResult)

print("All structured outputs validated ✓")
```
You should see: No assertion errors and both Pydantic objects accessible with full type hints in your IDE.
What You Learned
- output_pydantic on a Task enforces a schema without any manual parsing
- Access the result via task.output.pydantic — it's a real Pydantic object, not a dict
- task.output.raw is always available as a fallback if validation fails
- Multi-task crews benefit most: structured upstream output becomes clean context for downstream tasks
Limitation: output_pydantic adds roughly 50–100 tokens of JSON schema instructions to each task prompt. For token-sensitive deployments, output_json is the lighter-weight alternative; it returns a plain dict via task.output.json_dict but you lose the typed Pydantic object.
Tested on CrewAI 0.80.0, Pydantic 2.7, Python 3.12, gpt-4o-mini and claude-3-5-haiku