Build a Swarm AI System for Market Research in 45 Minutes

Use multi-agent AI swarms to automate market research—scrape, analyze, and synthesize competitor data 10x faster than manual methods.

Problem: Market Research Takes Days. It Should Take Minutes.

Manually tracking competitor pricing, reading industry reports, and synthesizing trends across dozens of sources is exhausting—and slow. A Swarm AI system delegates this to specialized agents that work in parallel, returning a structured report in minutes.

You'll learn:

  • What a Swarm AI is and when to use one over a single agent
  • How to build a working multi-agent research pipeline in Python
  • How to route tasks, aggregate results, and export a clean report

Time: 45 min | Level: Intermediate


Why This Happens

Single LLM agents fail at market research because the task is too broad. One agent trying to scrape, analyze sentiment, compare pricing, and write a report will lose context, hallucinate, or time out.

Swarm AI solves this with specialization. Each agent owns one job. A coordinator routes tasks and collects outputs. The result is faster, more accurate, and easier to debug.

Common problems with single-agent research:

  • Context window overflow on large datasets
  • No parallelism—tasks run sequentially
  • Hard to isolate which step produced bad output
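
In code, the coordinator pattern boils down to a mapping from agent names to functions. A minimal stub sketch (no API calls; the agent bodies here are placeholders, not the real agents built below):

```python
# Minimal coordinator sketch: each "agent" is just a function,
# and the coordinator collects every agent's output by name.

def run_stub_swarm(agents, topic):
    """Run every agent on the topic and collect results by name."""
    return {name: fn(topic) for name, fn in agents.items()}

stub_agents = {
    "search": lambda t: f"raw results for {t}",
    "sentiment": lambda t: "Score: 7/10",
}

findings = run_stub_swarm(stub_agents, "EV batteries")
print(findings["search"])  # raw results for EV batteries
```

Because each output is keyed by agent name, a bad result points directly at the agent that produced it.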

Solution

Step 1: Set Up Your Environment

Install the required packages.

pip install openai langchain langchain-community duckduckgo-search python-dotenv

Create your project structure:

mkdir swarm-research && cd swarm-research
touch main.py agents.py coordinator.py .env

Add your API key to .env:

OPENAI_API_KEY=your_key_here

Expected: No errors. Python 3.11+ required.

If it fails:

  • pip: command not found: Use pip3 or python -m pip
  • ModuleNotFoundError on run: Re-run pip install from the same environment you run Python in—a different virtual env may be active

Step 2: Define Your Specialized Agents

Each agent is a focused function that accepts its input and returns structured text. Keep each agent narrowly scoped—one job, one short prompt—so bad output is easy to trace back to its source.

# agents.py
from langchain_community.tools import DuckDuckGoSearchRun
from openai import OpenAI

client = OpenAI()
search = DuckDuckGoSearchRun()

def search_agent(topic: str) -> str:
    """Collects raw search results for a topic."""
    # Bias results toward recent coverage; update the year as needed
    results = search.run(f"{topic} market 2026")
    return results

def analyst_agent(raw_data: str, focus: str) -> str:
    """Extracts key insights from raw search data."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                f"You are a market analyst. Extract {focus} insights only. "
                "Be concise. Return bullet points."
            )},
            {"role": "user", "content": raw_data}
        ],
        max_tokens=500
    )
    return response.choices[0].message.content

def sentiment_agent(raw_data: str) -> str:
    """Scores market sentiment from 1 (bearish) to 10 (bullish)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Cheaper for simple scoring tasks
        messages=[
            {"role": "system", "content": (
                "Analyze market sentiment. Reply with: "
                "Score: X/10\nReason: one sentence."
            )},
            {"role": "user", "content": raw_data}
        ],
        max_tokens=100
    )
    return response.choices[0].message.content

def writer_agent(findings: dict) -> str:
    """Synthesizes all agent outputs into a final report."""
    combined = "\n\n".join([f"### {k}\n{v}" for k, v in findings.items()])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a business writer. Synthesize these research findings "
                "into a structured executive summary with sections: "
                "Overview, Key Trends, Sentiment, Recommendations."
            )},
            {"role": "user", "content": combined}
        ],
        max_tokens=800
    )
    return response.choices[0].message.content

Why gpt-4o-mini for sentiment: Simple scoring tasks don't need a large model. Save cost where you can.
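
This routing rule can be captured in a tiny helper (a sketch; `pick_model` and the task names are illustrative, not part of the tutorial files):

```python
# Route cheap classification/scoring tasks to the smaller model.
SIMPLE_TASKS = {"sentiment", "scoring", "classification"}

def pick_model(task: str) -> str:
    """Return a cheaper model for simple tasks, the full model otherwise."""
    return "gpt-4o-mini" if task in SIMPLE_TASKS else "gpt-4o"

print(pick_model("sentiment"))   # gpt-4o-mini
print(pick_model("synthesis"))   # gpt-4o
```

Centralizing model choice in one function also makes it trivial to swap models later without touching every agent.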


Step 3: Build the Coordinator

The coordinator is the brain of the swarm. It runs agents—sequentially where one depends on another's output, in parallel threads where they are independent—and collects their outputs into a single dict.

# coordinator.py
import concurrent.futures
from agents import search_agent, analyst_agent, sentiment_agent, writer_agent

def run_swarm(topic: str) -> str:
    print(f"[Swarm] Starting research on: {topic}")

    # Step 1: Search agent gathers raw data
    raw_data = search_agent(topic)
    print("[Swarm] Search complete.")

    # Step 2: Run analyst and sentiment agents IN PARALLEL
    # Both receive the same raw data but process different aspects
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        analyst_future = executor.submit(
            analyst_agent, raw_data, "pricing and competitor positioning"
        )
        sentiment_future = executor.submit(sentiment_agent, raw_data)

        analyst_output = analyst_future.result()
        sentiment_output = sentiment_future.result()

    print("[Swarm] Analysis complete.")

    # Step 3: Writer agent synthesizes everything
    findings = {
        "Raw Market Data": raw_data,
        "Analyst Insights": analyst_output,
        "Sentiment Score": sentiment_output,
    }
    report = writer_agent(findings)

    return report

Why ThreadPoolExecutor: The analyst and sentiment agents are independent—running them in parallel cuts wait time roughly in half.
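
The speedup is easy to verify with stand-in agents that just sleep (a sketch; `slow_agent` is a placeholder for an API-bound call, not one of the tutorial's agents):

```python
import concurrent.futures
import time

def slow_agent(label: str, delay: float) -> str:
    """Stand-in for an API-bound agent: blocks for `delay` seconds."""
    time.sleep(delay)
    return f"{label} done"

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as ex:
    a = ex.submit(slow_agent, "analyst", 0.2)
    b = ex.submit(slow_agent, "sentiment", 0.2)
    results = [a.result(), b.result()]
elapsed = time.perf_counter() - start

print(results)                      # ['analyst done', 'sentiment done']
print(f"elapsed: {elapsed:.2f}s")   # roughly 0.2s, not the 0.4s a sequential run would take
```

Threads are the right tool here because the agents spend their time waiting on network I/O, where the GIL is released.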


Step 4: Wire Up the Entry Point

# main.py
import os
from dotenv import load_dotenv
from coordinator import run_swarm

load_dotenv()

def main():
    topic = input("Enter a market or industry to research: ").strip()
    if not topic:
        print("No topic entered. Exiting.")
        return

    report = run_swarm(topic)

    # Save the report
    filename = f"report_{topic.lower().replace(' ', '_')}.md"
    with open(filename, "w") as f:
        f.write(f"# Market Research Report: {topic}\n\n")
        f.write(report)

    print(f"\n[Done] Report saved to {filename}")
    print("\n--- REPORT PREVIEW ---\n")
    print(report[:500] + "...")

if __name__ == "__main__":
    main()
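
The filename logic above lowercases the topic and replaces spaces, but topics containing slashes or other punctuation can still produce invalid paths. A hedged helper (hypothetical `slugify`, not part of the tutorial files) closes that gap:

```python
import re

def slugify(topic: str) -> str:
    """Make a filesystem-safe slug: lowercase, runs of non-alphanumerics become one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")

print(slugify("AI code editors"))   # ai_code_editors
print(slugify("SaaS / B2B (EU)"))   # saas_b2b_eu
```

Swap it in with `filename = f"report_{slugify(topic)}.md"` if you expect messy topic strings.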

Verification

Run the system end-to-end:

python main.py

Enter a topic when prompted, e.g., AI code editors.

You should see:

[Swarm] Starting research on: AI code editors
[Swarm] Search complete.
[Swarm] Analysis complete.
[Done] Report saved to report_ai_code_editors.md

The output file should contain an executive summary with sections for Overview, Key Trends, Sentiment, and Recommendations.

If it fails:

  • AuthenticationError: Check your .env key and that load_dotenv() runs before the OpenAI client initializes
  • RateLimitError: Add time.sleep(1) between agent calls or switch to a paid tier
  • Timeout on search: DuckDuckGo occasionally rate-limits—retry after 30 seconds
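
For transient failures like rate limits, a small retry wrapper with exponential backoff is more robust than a fixed sleep (a sketch; `with_retries` and the `flaky` demo function are illustrative, not part of the tutorial files):

```python
import time

def with_retries(fn, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call fn, retrying on exception with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo: a function that fails once, then succeeds.
calls = {"n": 0}
def flaky(topic):
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("rate limited")
    return f"ok: {topic}"

print(with_retries(flaky, "AI code editors", base_delay=0.01))  # ok: AI code editors
```

Wrapping each agent call, e.g. `with_retries(search_agent, topic)`, handles both DuckDuckGo and OpenAI hiccups in one place.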

Extending the Swarm

Once the basic swarm works, three high-value extensions are worth adding.

Add more specialized agents by creating new functions in agents.py and calling them in the coordinator. A pricing_agent that extracts only price data, or a risk_agent that flags regulatory risks, follows the exact same pattern.

Add memory between runs by saving each report to a JSON store and passing previous findings to the writer agent. This lets the swarm track changes over time rather than generating one-off snapshots.
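
A minimal persistence layer can be a JSON file keyed by topic (a sketch; `save_run` and `previous_findings` are hypothetical helpers, not part of the tutorial files):

```python
import json
import tempfile
from pathlib import Path

def save_run(store: Path, topic: str, findings: dict) -> None:
    """Append this run's findings to a JSON history keyed by topic."""
    history = json.loads(store.read_text()) if store.exists() else {}
    history.setdefault(topic, []).append(findings)
    store.write_text(json.dumps(history, indent=2))

def previous_findings(store: Path, topic: str) -> list:
    """Return earlier runs for the topic, oldest first."""
    if not store.exists():
        return []
    return json.loads(store.read_text()).get(topic, [])

# Demo in a temp directory so repeated runs start clean.
store = Path(tempfile.mkdtemp()) / "history.json"
save_run(store, "AI code editors", {"Sentiment Score": "Score: 7/10"})
print(len(previous_findings(store, "AI code editors")))  # 1
```

Passing `previous_findings(...)` into the writer agent's prompt lets it comment on how sentiment or pricing has shifted since the last run.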

Run on a schedule using a cron job or GitHub Actions to regenerate reports daily. Pair this with a Slack webhook to push summaries automatically.


What You Learned

  • Swarms outperform single agents by assigning one job per agent and running independent tasks in parallel
  • The coordinator pattern keeps orchestration logic separate from task logic—easier to test and extend
  • Use smaller models (gpt-4o-mini) for simple classification or scoring to reduce cost

Limitation: This swarm is stateless—each run starts from scratch. For ongoing competitive monitoring, add a persistence layer.

When NOT to use this: If your research task fits in one prompt with a 128k context model, a swarm is overkill. Use swarms when tasks are truly parallel and independent.


Tested on Python 3.12, OpenAI SDK 1.x, LangChain 0.3+, macOS & Ubuntu 24.04