Problem: Market Research Takes Days. It Should Take Minutes.
Manually tracking competitor pricing, reading industry reports, and synthesizing trends across dozens of sources is exhausting—and slow. A Swarm AI system delegates this to specialized agents that work in parallel, returning a structured report in minutes.
You'll learn:
- What a Swarm AI is and when to use one over a single agent
- How to build a working multi-agent research pipeline in Python
- How to route tasks, aggregate results, and export a clean report
Time: 45 min | Level: Intermediate
Why This Happens
Single LLM agents fail at market research because the task is too broad. One agent trying to scrape, analyze sentiment, compare pricing, and write a report will lose context, hallucinate, or time out.
Swarm AI solves this with specialization. Each agent owns one job. A coordinator routes tasks and collects outputs. The result is faster, more accurate, and easier to debug.
Common problems with single-agent research:
- Context window overflow on large datasets
- No parallelism—tasks run sequentially
- Hard to isolate which step produced bad output
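The specialization idea can be sketched without any LLM calls at all. Each "agent" below is a plain function with exactly one job, and the coordinator routes the same input to each and collects labeled outputs; agent names here are illustrative, not part of the system built later:

```python
# Toy sketch of the one-job-per-agent pattern, no LLMs involved.
def count_agent(text: str) -> str:
    """One job: report length."""
    return f"{len(text.split())} words"


def caps_agent(text: str) -> str:
    """One job: flag tone."""
    return "shouting" if text.isupper() else "calm"


def coordinate(text: str) -> dict:
    """Route the same input to every agent; collect labeled outputs."""
    agents = {"Length": count_agent, "Tone": caps_agent}
    return {name: agent(text) for name, agent in agents.items()}


print(coordinate("AI MARKET BOOMS"))
# {'Length': '3 words', 'Tone': 'shouting'}
```

Because each output is labeled by the agent that produced it, a bad result is immediately traceable to one function, which is exactly what the single-agent approach lacks.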
Solution
Step 1: Set Up Your Environment
Install the required packages.
```shell
pip install openai langchain langchain-community duckduckgo-search python-dotenv
```
Create your project structure:
```shell
mkdir swarm-research && cd swarm-research
touch main.py agents.py coordinator.py .env
```
Add your API key to .env:
```
OPENAI_API_KEY=your_key_here
```
Expected: No errors. Python 3.11+ required.
If it fails:
- `pip: command not found`: use `pip3` or `python -m pip` instead
- `ModuleNotFoundError` on run: re-run the `pip install` inside the same virtual environment you run the code from
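Before moving on, you can confirm the environment with a short stdlib-only script. This helper is not part of the project files above; it is just a hypothetical convenience check:

```python
# check_env.py: optional sanity check before running the swarm.
import os
import sys


def check_env(env: dict, version: tuple) -> list:
    """Return a list of problems; an empty list means ready to run."""
    problems = []
    if version < (3, 11):
        problems.append("Python 3.11+ required")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY not set (is .env loaded?)")
    return problems


if __name__ == "__main__":
    issues = check_env(dict(os.environ), sys.version_info[:2])
    print("OK" if not issues else "\n".join(issues))
```

Run it once after Step 1; if it prints anything other than `OK`, fix that before writing agent code.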
Step 2: Define Your Specialized Agents
Each agent is a focused function that accepts input and returns structured text. Keep each agent's prompt narrow: one job, one output format.
```python
# agents.py
from langchain_community.tools import DuckDuckGoSearchRun
from openai import OpenAI

client = OpenAI()
search = DuckDuckGoSearchRun()


def search_agent(topic: str) -> str:
    """Collects raw search results for a topic."""
    results = search.run(f"{topic} market 2026")
    return results


def analyst_agent(raw_data: str, focus: str) -> str:
    """Extracts key insights from raw search data."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                f"You are a market analyst. Extract {focus} insights only. "
                "Be concise. Return bullet points."
            )},
            {"role": "user", "content": raw_data}
        ],
        max_tokens=500
    )
    return response.choices[0].message.content


def sentiment_agent(raw_data: str) -> str:
    """Scores market sentiment from 1 (bearish) to 10 (bullish)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Cheaper for simple scoring tasks
        messages=[
            {"role": "system", "content": (
                "Analyze market sentiment. Reply with: "
                "Score: X/10\nReason: one sentence."
            )},
            {"role": "user", "content": raw_data}
        ],
        max_tokens=100
    )
    return response.choices[0].message.content


def writer_agent(findings: dict) -> str:
    """Synthesizes all agent outputs into a final report."""
    combined = "\n\n".join([f"### {k}\n{v}" for k, v in findings.items()])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a business writer. Synthesize these research findings "
                "into a structured executive summary with sections: "
                "Overview, Key Trends, Sentiment, Recommendations."
            )},
            {"role": "user", "content": combined}
        ],
        max_tokens=800
    )
    return response.choices[0].message.content
```
Why gpt-4o-mini for sentiment: Simple scoring tasks don't need a large model. Save cost where you can.
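All four agents call a rate-limited API, so transient failures are normal. One hedged option is to wrap agent calls in a small exponential-backoff helper; `with_retries` is my own sketch, not part of the tutorial files, and the injectable `sleep` parameter exists only to make it testable:

```python
import time


def with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn, retrying with exponential backoff on any exception.

    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    The last failure is re-raised so callers still see real errors.
    """
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Usage would look like `with_retries(sentiment_agent, raw_data)` in place of a bare call; in production you would likely catch only the API's rate-limit and timeout exceptions rather than `Exception`.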
Step 3: Build the Coordinator
The coordinator is the brain of the swarm. It runs agents, can parallelize them with threads, and collects results into a single dict.
```python
# coordinator.py
import concurrent.futures

from agents import search_agent, analyst_agent, sentiment_agent, writer_agent


def run_swarm(topic: str) -> str:
    print(f"[Swarm] Starting research on: {topic}")

    # Step 1: Search agent gathers raw data
    raw_data = search_agent(topic)
    print("[Swarm] Search complete.")

    # Step 2: Run analyst and sentiment agents IN PARALLEL.
    # Both receive the same raw data but process different aspects.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        analyst_future = executor.submit(
            analyst_agent, raw_data, "pricing and competitor positioning"
        )
        sentiment_future = executor.submit(sentiment_agent, raw_data)
        analyst_output = analyst_future.result()
        sentiment_output = sentiment_future.result()
    print("[Swarm] Analysis complete.")

    # Step 3: Writer agent synthesizes everything
    findings = {
        "Raw Market Data": raw_data,
        "Analyst Insights": analyst_output,
        "Sentiment Score": sentiment_output,
    }
    report = writer_agent(findings)
    return report
```
Why ThreadPoolExecutor: The analyst and sentiment agents are independent—running them in parallel cuts wait time roughly in half.
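You can see the speedup in isolation with sleep-based stand-ins for the two API calls. This demo is self-contained and has nothing swarm-specific in it; it only shows that two independent 0.2-second tasks finish in roughly max(t1, t2) rather than t1 + t2:

```python
# Standalone demo: two independent "agents", simulated with sleeps.
import concurrent.futures
import time


def slow_agent(name: str, seconds: float) -> str:
    time.sleep(seconds)  # stand-in for a network call
    return name


start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [
        executor.submit(slow_agent, "analyst", 0.2),
        executor.submit(slow_agent, "sentiment", 0.2),
    ]
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")  # elapsed is ~0.2s, not ~0.4s
```

Threads (not processes) are the right tool here because the agents spend their time waiting on I/O, where the GIL is released.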
Step 4: Wire Up the Entry Point
```python
# main.py
from dotenv import load_dotenv

# Load the API key BEFORE importing the coordinator: agents.py creates the
# OpenAI client at import time, so the key must already be in the environment.
load_dotenv()

from coordinator import run_swarm


def main():
    topic = input("Enter a market or industry to research: ").strip()
    if not topic:
        print("No topic entered. Exiting.")
        return

    report = run_swarm(topic)

    # Save the report
    filename = f"report_{topic.lower().replace(' ', '_')}.md"
    with open(filename, "w") as f:
        f.write(f"# Market Research Report: {topic}\n\n")
        f.write(report)

    print(f"\n[Done] Report saved to {filename}")
    print("\n--- REPORT PREVIEW ---\n")
    print(report[:500] + "...")


if __name__ == "__main__":
    main()
```
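One caveat with the naive filename line: a topic containing slashes or punctuation (say, `B2B SaaS / EU`) produces an invalid or messy path. A hedged alternative is a small slug helper; `slugify` is my name for it, not part of the tutorial code:

```python
import re


def slugify(topic: str) -> str:
    """Reduce a topic to a safe lowercase filename stem."""
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    return slug or "report"


# In main(): filename = f"report_{slugify(topic)}.md"
print(slugify("AI code editors"))  # ai_code_editors
print(slugify("B2B SaaS / EU"))   # b2b_saas_eu
```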
Verification
Run the system end-to-end:
```shell
python main.py
```
Enter a topic when prompted, e.g., AI code editors.
You should see:
```
[Swarm] Starting research on: AI code editors
[Swarm] Search complete.
[Swarm] Analysis complete.

[Done] Report saved to report_ai_code_editors.md
```
The output file should contain an executive summary with sections for Overview, Key Trends, Sentiment, and Recommendations.
If it fails:
- `AuthenticationError`: check your `.env` key and that `load_dotenv()` runs before the OpenAI client initializes
- `RateLimitError`: add `time.sleep(1)` between agent calls or switch to a paid tier
- Timeout on search: DuckDuckGo occasionally rate-limits; retry after 30 seconds
Extending the Swarm
Once the basic swarm works, three high-value extensions are worth adding.
Add more specialized agents by creating new functions in agents.py and calling them in the coordinator. A pricing_agent that extracts only price data, or a risk_agent that flags regulatory risks, follows the exact same pattern.
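As an illustration, here is what a hypothetical pricing_agent could look like. It mirrors the analyst_agent pattern, but the model call is injectable through a `complete` parameter (my addition, not in the tutorial's agents) so the sketch can be exercised without an API key:

```python
def pricing_agent(raw_data: str, complete=None) -> str:
    """Extracts only pricing data, following the same agent pattern.

    `complete` is an injectable (system_prompt, user_text) -> str hook;
    when None, it would call the OpenAI client like the other agents.
    """
    system = (
        "You are a pricing analyst. Extract ONLY prices, tiers, and "
        "discounts. Return bullet points."
    )
    if complete is None:
        from openai import OpenAI
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": raw_data},
            ],
            max_tokens=300,
        )
        return response.choices[0].message.content
    return complete(system, raw_data)
```

Wire it into the coordinator by adding a third `executor.submit(...)` alongside the analyst and sentiment calls and a matching entry in the `findings` dict.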
Add memory between runs by saving each report to a JSON store and passing previous findings to the writer agent. This lets the swarm track changes over time rather than generating one-off snapshots.
Run on a schedule using a cron job or GitHub Actions to regenerate reports daily. Pair this with a Slack webhook to push summaries automatically.
What You Learned
- Swarms outperform single agents by assigning one job per agent and running independent tasks in parallel
- The coordinator pattern keeps orchestration logic separate from task logic—easier to test and extend
- Use smaller models (gpt-4o-mini) for simple classification or scoring to reduce cost
Limitation: This swarm is stateless—each run starts from scratch. For ongoing competitive monitoring, add a persistence layer.
When NOT to use this: If your research task fits in one prompt with a 128k context model, a swarm is overkill. Use swarms when tasks are truly parallel and independent.
Tested on Python 3.12, OpenAI SDK 1.x, LangChain 0.3+, macOS & Ubuntu 24.04