Problem: Market Research Takes Days. It Should Take Minutes.
Manually tracking competitor pricing, reading industry reports, and synthesizing trends across dozens of sources is exhausting—and slow. A Swarm AI system delegates this to specialized agents that work in parallel, returning a structured report in minutes.
You'll learn:
- What a Swarm AI is and when to use one over a single agent
- How to build a working multi-agent research pipeline in Python
- How to route tasks, aggregate results, and export a clean report
Time: 45 min | Level: Intermediate
Why This Happens
Single LLM agents fail at market research because the task is too broad. One agent trying to scrape, analyze sentiment, compare pricing, and write a report will lose context, hallucinate, or time out.
Swarm AI solves this with specialization. Each agent owns one job. A coordinator routes tasks and collects outputs. The result is faster, more accurate, and easier to debug.
Common problems with single-agent research:
- Context window overflow on large datasets
- No parallelism—tasks run sequentially
- Hard to isolate which step produced bad output
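The specialization idea can be sketched without any LLM calls at all. Each "agent" below is a plain function with exactly one job, and the coordinator routes the same input to each and collects labeled outputs; agent names here are illustrative, not part of the system built later:

```python
# Toy sketch of the one-job-per-agent pattern, no LLMs involved.
def count_agent(text: str) -> str:
    """One job: report length."""
    return f"{len(text.split())} words"


def caps_agent(text: str) -> str:
    """One job: flag tone."""
    return "shouting" if text.isupper() else "calm"


def coordinate(text: str) -> dict:
    """Route the same input to every agent; collect labeled outputs."""
    agents = {"Length": count_agent, "Tone": caps_agent}
    return {name: agent(text) for name, agent in agents.items()}


print(coordinate("AI MARKET BOOMS"))
# {'Length': '3 words', 'Tone': 'shouting'}
```

Because each output is labeled by the agent that produced it, a bad result is immediately traceable to one function, which is exactly what the single-agent approach lacks.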
Solution
Step 1: Set Up Your Environment
Install the required packages.
```shell
pip install openai langchain langchain-community duckduckgo-search python-dotenv
```
Create your project structure:
```shell
mkdir swarm-research && cd swarm-research
touch main.py agents.py coordinator.py .env
```
Add your API key to .env:
```
OPENAI_API_KEY=your_key_here
```
Expected: No errors. Python 3.11+ required.
If it fails:
- `pip: command not found`: use `pip3` or `python -m pip` instead
- `ModuleNotFoundError` on run: re-run the `pip install` inside the same virtual environment you run the code from
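Before moving on, you can confirm the environment with a short stdlib-only script. This helper is not part of the project files above; it is just a hypothetical convenience check:

```python
# check_env.py: optional sanity check before running the swarm.
import os
import sys


def check_env(env: dict, version: tuple) -> list:
    """Return a list of problems; an empty list means ready to run."""
    problems = []
    if version < (3, 11):
        problems.append("Python 3.11+ required")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY not set (is .env loaded?)")
    return problems


if __name__ == "__main__":
    issues = check_env(dict(os.environ), sys.version_info[:2])
    print("OK" if not issues else "\n".join(issues))
```

Run it once after Step 1; if it prints anything other than `OK`, fix that before writing agent code.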
Step 2: Define Your Specialized Agents
Each agent is a focused function that accepts input and returns structured text. Keep each agent's prompt narrow: one job, one output format.
```python
# agents.py
from langchain_community.tools import DuckDuckGoSearchRun
from openai import OpenAI

client = OpenAI()
search = DuckDuckGoSearchRun()


def search_agent(topic: str) -> str:
    """Collects raw search results for a topic."""
    results = search.run(f"{topic} market 2026")
    return results


def analyst_agent(raw_data: str, focus: str) -> str:
    """Extracts key insights from raw search data."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                f"You are a market analyst. Extract {focus} insights only. "
                "Be concise. Return bullet points."
            )},
            {"role": "user", "content": raw_data}
        ],
        max_tokens=500
    )
    return response.choices[0].message.content


def sentiment_agent(raw_data: str) -> str:
    """Scores market sentiment from 1 (bearish) to 10 (bullish)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Cheaper for simple scoring tasks
        messages=[
            {"role": "system", "content": (
                "Analyze market sentiment. Reply with: "
                "Score: X/10\nReason: one sentence."
            )},
            {"role": "user", "content": raw_data}
        ],
        max_tokens=100
    )
    return response.choices[0].message.content


def writer_agent(findings: dict) -> str:
    """Synthesizes all agent outputs into a final report."""
    combined = "\n\n".join([f"### {k}\n{v}" for k, v in findings.items()])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a business writer. Synthesize these research findings "
                "into a structured executive summary with sections: "
                "Overview, Key Trends, Sentiment, Recommendations."
            )},
            {"role": "user", "content": combined}
        ],
        max_tokens=800
    )
    return response.choices[0].message.content
```
Why gpt-4o-mini for sentiment: Simple scoring tasks don't need a large model. Save cost where you can.
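All four agents call a rate-limited API, so transient failures are normal. One hedged option is to wrap agent calls in a small exponential-backoff helper; `with_retries` is my own sketch, not part of the tutorial files, and the injectable `sleep` parameter exists only to make it testable:

```python
import time


def with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn, retrying with exponential backoff on any exception.

    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    The last failure is re-raised so callers still see real errors.
    """
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Usage would look like `with_retries(sentiment_agent, raw_data)` in place of a bare call; in production you would likely catch only the API's rate-limit and timeout exceptions rather than `Exception`.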
Step 3: Build the Coordinator
The coordinator is the brain of the swarm. It runs agents, can parallelize them with threads, and collects results into a single dict.
```python
# coordinator.py
import concurrent.futures

from agents import search_agent, analyst_agent, sentiment_agent, writer_agent


def run_swarm(topic: str) -> str:
    print(f"[Swarm] Starting research on: {topic}")

    # Step 1: Search agent gathers raw data
    raw_data = search_agent(topic)
    print("[Swarm] Search complete.")

    # Step 2: Run analyst and sentiment agents IN PARALLEL.
    # Both receive the same raw data but process different aspects.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        analyst_future = executor.submit(
            analyst_agent, raw_data, "pricing and competitor positioning"
        )
        sentiment_future = executor.submit(sentiment_agent, raw_data)
        analyst_output = analyst_future.result()
        sentiment_output = sentiment_future.result()
    print("[Swarm] Analysis complete.")

    # Step 3: Writer agent synthesizes everything
    findings = {
        "Raw Market Data": raw_data,
        "Analyst Insights": analyst_output,
        "Sentiment Score": sentiment_output,
    }
    report = writer_agent(findings)
    return report
```
Why ThreadPoolExecutor: The analyst and sentiment agents are independent—running them in parallel cuts wait time roughly in half.
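You can see the speedup in isolation with sleep-based stand-ins for the two API calls. This demo is self-contained and has nothing swarm-specific in it; it only shows that two independent 0.2-second tasks finish in roughly max(t1, t2) rather than t1 + t2:

```python
# Standalone demo: two independent "agents", simulated with sleeps.
import concurrent.futures
import time


def slow_agent(name: str, seconds: float) -> str:
    time.sleep(seconds)  # stand-in for a network call
    return name


start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [
        executor.submit(slow_agent, "analyst", 0.2),
        executor.submit(slow_agent, "sentiment", 0.2),
    ]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")  # elapsed is ~0.2s, not ~0.4s
```

Threads (not processes) are the right tool here because the agents spend their time waiting on I/O, where the GIL is released.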
Step 4: Wire Up the Entry Point
```python
# main.py
from dotenv import load_dotenv

# Load the API key BEFORE importing the coordinator: agents.py creates the
# OpenAI client at import time, so the key must already be in the environment.
load_dotenv()

from coordinator import run_swarm


def main():
    topic = input("Enter a market or industry to research: ").strip()
    if not topic:
        print("No topic entered. Exiting.")
        return

    report = run_swarm(topic)

    # Save the report
    filename = f"report_{topic.lower().replace(' ', '_')}.md"
    with open(filename, "w") as f:
        f.write(f"# Market Research Report: {topic}\n\n")
        f.write(report)

    print(f"\n[Done] Report saved to {filename}")
    print("\n--- REPORT PREVIEW ---\n")
    print(report[:500] + "...")


if __name__ == "__main__":
    main()
```
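One caveat with the naive filename line: a topic containing slashes or punctuation (say, `B2B SaaS / EU`) produces an invalid or messy path. A hedged alternative is a small slug helper; `slugify` is my name for it, not part of the tutorial code:

```python
import re


def slugify(topic: str) -> str:
    """Reduce a topic to a safe lowercase filename stem."""
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    return slug or "report"


# In main(): filename = f"report_{slugify(topic)}.md"
print(slugify("AI code editors"))  # ai_code_editors
print(slugify("B2B SaaS / EU"))   # b2b_saas_eu
```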
Verification
Run the system end-to-end:
```shell
python main.py
```
Enter a topic when prompted, e.g., AI code editors.
You should see:
```
[Swarm] Starting research on: AI code editors
[Swarm] Search complete.
[Swarm] Analysis complete.

[Done] Report saved to report_ai_code_editors.md
```
The output file should contain an executive summary with sections for Overview, Key Trends, Sentiment, and Recommendations.
If it fails:
- `AuthenticationError`: check your `.env` key and that `load_dotenv()` runs before the OpenAI client initializes
- `RateLimitError`: add `time.sleep(1)` between agent calls or switch to a paid tier
- Timeout on search: DuckDuckGo occasionally rate-limits; retry after 30 seconds
Extending the Swarm
Once the basic swarm works, three high-value extensions are worth adding.
Add more specialized agents by creating new functions in agents.py and calling them in the coordinator. A pricing_agent that extracts only price data, or a risk_agent that flags regulatory risks, follows the exact same pattern.
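As an illustration, here is what a hypothetical pricing_agent could look like. It mirrors the analyst_agent pattern, but the model call is injectable through a `complete` parameter (my addition, not in the tutorial's agents) so the sketch can be exercised without an API key:

```python
def pricing_agent(raw_data: str, complete=None) -> str:
    """Extracts only pricing data, following the same agent pattern.

    `complete` is an injectable (system_prompt, user_text) -> str hook;
    when None, it would call the OpenAI client like the other agents.
    """
    system = (
        "You are a pricing analyst. Extract ONLY prices, tiers, and "
        "discounts. Return bullet points."
    )
    if complete is None:
        from openai import OpenAI
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": raw_data},
            ],
            max_tokens=300,
        )
        return response.choices[0].message.content
    return complete(system, raw_data)
```

Wire it into the coordinator by adding a third `executor.submit(...)` alongside the analyst and sentiment calls and a matching entry in the `findings` dict.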
Add memory between runs by saving each report to a JSON store and passing previous findings to the writer agent. This lets the swarm track changes over time rather than generating one-off snapshots.
Run on a schedule using a cron job or GitHub Actions to regenerate reports daily. Pair this with a Slack webhook to push summaries automatically.
What You Learned
- Swarms outperform single agents by assigning one job per agent and running independent tasks in parallel
- The coordinator pattern keeps orchestration logic separate from task logic—easier to test and extend
- Use smaller models (gpt-4o-mini) for simple classification or scoring to reduce cost
Limitation: This swarm is stateless—each run starts from scratch. For ongoing competitive monitoring, add a persistence layer.
When NOT to use this: If your research task fits in one prompt with a 128k context model, a swarm is overkill. Use swarms when tasks are truly parallel and independent.
Tested on Python 3.12, OpenAI SDK 1.x, LangChain 0.3+, macOS & Ubuntu 24.04