Build an Autonomous AI Agent to Manage Your Inbox in 30 Minutes

Use LangChain and Gmail API to build an AI agent that reads, categorizes, and drafts replies to your emails automatically.

Problem: Your Inbox Is Managing You

You spend 2+ hours a day triaging email — reading, labeling, drafting replies you never send. An autonomous AI agent can do the boring 80% of that work, leaving you with only the decisions that need a human.

You'll learn:

  • How to wire a LangChain agent to Gmail via the Google API
  • How to give the agent tools for reading, labeling, and drafting replies
  • How to keep the agent from doing anything you didn't authorize

Time: 30 min | Level: Intermediate


Why This Happens

Most "AI email" tools wrap a single LLM call around a static prompt. That works for summarizing one message — it breaks down the moment the task requires multiple steps (read thread → check calendar → draft reply → apply label).

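An agent is different. It has tools it can call in sequence, memory across steps, and a loop that keeps running until the task is done. For inbox management, this matters because triage is sequential: the agent has to read a message before it can label it, and label it before it knows whether a draft is needed.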
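
To make the loop concrete, here is a minimal sketch in plain Python with no LLM involved. The `scripted_policy` function is a hypothetical stand-in for the model's decision step, and the two tools are toy functions, not the Gmail tools built later in this guide.

```python
from typing import Callable, Optional

# Toy tools standing in for real Gmail calls (hypothetical, for illustration only)
def list_unread() -> str:
    return "msg-1"

def apply_label(message_id: str) -> str:
    return f"labeled {message_id}"

TOOLS: dict[str, Callable[..., str]] = {
    "list_unread": list_unread,
    "apply_label": apply_label,
}

def scripted_policy(history: list) -> Optional[tuple]:
    """Stand-in for the LLM: pick the next tool call from what has happened so far."""
    if not history:
        return ("list_unread", ())
    last_tool, last_result = history[-1]
    if last_tool == "list_unread":
        return ("apply_label", (last_result,))
    return None  # nothing left to do

def run_agent(policy, max_iterations: int = 5) -> list:
    history = []                      # memory across steps
    for _ in range(max_iterations):   # loop with a hard cap
        decision = policy(history)
        if decision is None:
            break
        name, args = decision
        history.append((name, TOOLS[name](*args)))  # act, then record the observation
    return history

print(run_agent(scripted_policy))
# [('list_unread', 'msg-1'), ('apply_label', 'labeled msg-1')]
```

The second call depends on the result of the first, which is exactly what a single static prompt cannot express.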

Common symptoms when you try to skip the agent architecture:

  • LLM hallucinates email content it was never shown
  • Single-prompt approach can't handle "reply only if I haven't responded in 48 hours"
  • No way to attach actions (labeling, archiving) to the LLM output
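
The 48-hour rule is a good example of logic that belongs in a tool rather than a prompt. The sketch below assumes each message dict carries `internalDate` (epoch milliseconds as a string) and `labelIds`, as Gmail's message resource does; the function name `needs_follow_up` and the helper `ms` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def needs_follow_up(thread_messages: list, now: datetime, hours: int = 48) -> bool:
    """True if the newest message is inbound and at least `hours` old.

    Assumes messages I sent carry the SENT label.
    """
    if not thread_messages:
        return False
    latest = max(thread_messages, key=lambda m: int(m["internalDate"]))
    if "SENT" in latest.get("labelIds", []):
        return False  # I already had the last word in this thread
    received = datetime.fromtimestamp(int(latest["internalDate"]) / 1000, tz=timezone.utc)
    return now - received >= timedelta(hours=hours)

def ms(dt: datetime) -> str:
    """Helper for the demo: format a datetime the way Gmail's internalDate looks."""
    return str(int(dt.timestamp() * 1000))

now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
thread = [{"internalDate": ms(now - timedelta(hours=72)), "labelIds": ["INBOX"]}]
print(needs_follow_up(thread, now))  # True
```

An agent can call a function like this and act on a reliable yes or no, instead of guessing at dates from message text.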

Solution

Step 1: Set Up Gmail API Access

First, enable the Gmail API in Google Cloud Console, create an OAuth client ID of type "Desktop app", and download it as credentials.json.

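pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client
# LangChain packages for Steps 2-4, pinned to the 0.3 line this guide was tested on
pip install "langchain>=0.3,<0.4" "langchain-anthropic>=0.3,<0.4"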

Then authenticate and get a service object you'll pass to your tools:

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
import os

SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.modify",   # needed for labels
    "https://www.googleapis.com/auth/gmail.compose",  # needed for drafts
]

def get_gmail_service():
    creds = None
    if os.path.exists("token.json"):
        creds = Credentials.from_authorized_user_file("token.json", SCOPES)

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
            creds = flow.run_local_server(port=0)
        with open("token.json", "w") as f:
            f.write(creds.to_json())

    return build("gmail", "v1", credentials=creds)

Expected: Running this for the first time opens a browser OAuth flow and saves token.json. Subsequent runs skip the browser.

If it fails:

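  • Error: redirect_uri_mismatch: Your OAuth client is probably of type "Web application". Create a "Desktop app" client instead, or pass a fixed port to run_local_server (for example port=8080) and add http://localhost:8080/ to the authorized redirect URIs in the GCP console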
  • Scopes not granted: Delete token.json and re-authenticate
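
A common cause of the scopes error is a token.json saved before you edited SCOPES. The sketch below checks for that without touching the network. It assumes the file contains a `scopes` list, which is what `creds.to_json()` writes; the function name `missing_scopes` is hypothetical.

```python
import json

REQUIRED_SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/gmail.compose",
]

def missing_scopes(token_json: str, required: list) -> list:
    """Return the required scopes that are absent from a saved token's contents."""
    granted = set(json.loads(token_json).get("scopes") or [])
    return [scope for scope in required if scope not in granted]

# Inline sample for the demo; in practice pass the text of token.json
sample = json.dumps({"scopes": ["https://www.googleapis.com/auth/gmail.readonly"]})
for scope in missing_scopes(sample, REQUIRED_SCOPES):
    print("missing:", scope)
```

If the list is non-empty, delete token.json and re-authenticate as described above.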

Step 2: Build the Agent Tools

LangChain agents work by calling tools you define. Each tool is a Python function wrapped with a description — the LLM reads the description to decide when to call it.

from langchain.tools import tool
import base64
import json

service = get_gmail_service()

@tool
def list_unread_emails(max_results: int = 10) -> str:
    """
    Returns a JSON list of unread emails with id, sender, subject, and snippet.
    Call this first to see what needs attention.
    """
    results = service.users().messages().list(
        userId="me",
        labelIds=["UNREAD"],
        maxResults=max_results
    ).execute()

    messages = results.get("messages", [])
    emails = []
    for msg in messages:
        detail = service.users().messages().get(
            userId="me", id=msg["id"], format="metadata",
            metadataHeaders=["From", "Subject"]
        ).execute()
        headers = {h["name"]: h["value"] for h in detail["payload"]["headers"]}
        emails.append({
            "id": msg["id"],
            "from": headers.get("From", ""),
            "subject": headers.get("Subject", ""),
            "snippet": detail.get("snippet", "")
        })

    return json.dumps(emails, indent=2)


@tool
def read_email_body(message_id: str) -> str:
    """
    Returns the full plain-text body of a single email given its message_id.
    Use this when the snippet isn't enough to understand the email.
    """
    msg = service.users().messages().get(
        userId="me", id=message_id, format="full"
    ).execute()

    def extract_text(payload):
        # Walk MIME parts to find plain text
        if payload.get("mimeType") == "text/plain":
            data = payload["body"].get("data", "")
            return base64.urlsafe_b64decode(data).decode("utf-8", errors="ignore")
        for part in payload.get("parts", []):
            result = extract_text(part)
            if result:
                return result
        return ""

    return extract_text(msg["payload"]) or "(no plain text body)"


@tool
def apply_label(message_id: str, label_name: str) -> str:
    """
    Applies a Gmail label to a message. Creates the label if it doesn't exist.
    label_name should be one of: 'needs-reply', 'waiting', 'newsletter', 'archive'.
    """
    # Find or create label
    labels = service.users().labels().list(userId="me").execute().get("labels", [])
    label_id = next((lbl["id"] for lbl in labels if lbl["name"] == label_name), None)

    if not label_id:
        new_label = service.users().labels().create(
            userId="me", body={"name": label_name}
        ).execute()
        label_id = new_label["id"]

    service.users().messages().modify(
        userId="me",
        id=message_id,
        body={"addLabelIds": [label_id], "removeLabelIds": ["UNREAD"]}
    ).execute()

    return f"Label '{label_name}' applied and marked as read."


@tool
def create_draft_reply(message_id: str, reply_body: str) -> str:
    """
    Creates a Gmail draft reply to a given message. Does NOT send it.
    The user will review and send it manually. reply_body is plain text.
    """
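    # Imported here so this tool stays self-contained
    from email.message import EmailMessage

    original = service.users().messages().get(
        userId="me", id=message_id, format="metadata",
        metadataHeaders=["From", "Reply-To", "Subject", "Message-ID", "References"]
    ).execute()

    # Header-name casing varies by sender (Message-ID vs Message-Id), so normalize
    # names to lowercase and collapse any folded whitespace in the values
    headers = {h["name"].lower(): " ".join(h["value"].split())
               for h in original["payload"]["headers"]}
    to = headers.get("reply-to") or headers.get("from", "")
    subject = headers.get("subject", "")
    if not subject.lower().startswith("re:"):
        subject = f"Re: {subject}"
    thread_id = original["threadId"]

    # Build the MIME message. In-Reply-To and References keep the reply threaded
    # in the recipient's mail client, not just in Gmail.
    mime_message = EmailMessage()
    mime_message["To"] = to
    mime_message["Subject"] = subject
    parent_id = headers.get("message-id")
    if parent_id:
        mime_message["In-Reply-To"] = parent_id
        mime_message["References"] = f"{headers.get('references', '')} {parent_id}".strip()
    mime_message.set_content(reply_body)
    encoded = base64.urlsafe_b64encode(mime_message.as_bytes()).decode()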

    draft = service.users().drafts().create(
        userId="me",
        body={"message": {"raw": encoded, "threadId": thread_id}}
    ).execute()

    return f"Draft created with id {draft['id']}. Subject: '{subject}'"
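
One gap in read_email_body above: HTML-only messages come back as "(no plain text body)". A possible fallback, sketched here with only the standard library, is to strip tags from the text/html part. The payload shape mirrors the nested `parts` structure the Gmail API returns; `extract_body`, `find_part`, and `_TextOnly` are hypothetical names, and script or style content is not filtered out.

```python
import base64
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collects text nodes and drops tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def _decode(part: dict) -> str:
    data = part.get("body", {}).get("data", "")
    data += "=" * (-len(data) % 4)  # restore padding if it was stripped
    return base64.urlsafe_b64decode(data).decode("utf-8", errors="ignore")

def find_part(payload: dict, mime_type: str) -> str:
    """Depth-first search for the first non-empty part with the given MIME type."""
    if payload.get("mimeType") == mime_type:
        return _decode(payload)
    for part in payload.get("parts", []):
        text = find_part(part, mime_type)
        if text:
            return text
    return ""

def extract_body(payload: dict) -> str:
    plain = find_part(payload, "text/plain")
    if plain:
        return plain
    parser = _TextOnly()
    parser.feed(find_part(payload, "text/html"))
    return " ".join(" ".join(parser.chunks).split())  # collapse whitespace

html = base64.urlsafe_b64encode(b"<p>Hello <b>there</b></p>").decode()
payload = {
    "mimeType": "multipart/alternative",
    "parts": [{"mimeType": "text/html", "body": {"data": html}}],
}
print(extract_body(payload))  # Hello there
```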

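Why create_draft_reply never sends: You never want an autonomous agent auto-sending emails. Always route through drafts so you stay in control. Note that the gmail.modify and gmail.compose scopes themselves permit sending, so the safeguard here is the tool list: the agent has no tool that calls send.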
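
The same principle applies to labels. The apply_label docstring lists four allowed names, but nothing in the code enforces them, so the model could create arbitrary labels. A minimal guard might look like this; `ALLOWED_LABELS` and `validate_label` are hypothetical names, not part of the code above.

```python
ALLOWED_LABELS = frozenset({"needs-reply", "waiting", "newsletter", "archive"})

def validate_label(label_name: str) -> str:
    """Normalize a label proposed by the model and reject anything off the allowlist."""
    normalized = label_name.strip().lower()
    if normalized not in ALLOWED_LABELS:
        raise ValueError(
            f"Label {label_name!r} is not allowed; choose one of {sorted(ALLOWED_LABELS)}"
        )
    return normalized

print(validate_label(" Needs-Reply "))  # needs-reply
```

Inside a tool, you may prefer to catch the ValueError and return its message as the tool result, so the model sees the problem and can pick a valid label on its next step.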


Step 3: Wire Up the LangChain Agent

from langchain_anthropic import ChatAnthropic
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

llm = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)

tools = [list_unread_emails, read_email_body, apply_label, create_draft_reply]

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an inbox management assistant. Your job:

1. List unread emails
2. Read any that need more context than the snippet provides
3. Apply one label per email: 'needs-reply', 'waiting', 'newsletter', or 'archive'
4. For emails labeled 'needs-reply', create a draft reply

Rules:
- Never send emails. Only create drafts.
- Apply exactly one label per email. 
- Keep draft replies professional and under 150 words.
- If unsure about something, label it 'needs-reply' and leave a draft asking for clarification."""),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=30)

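Why max_iterations=30: It prevents runaway loops if the agent gets confused. Each iteration is one model turn, and the model can issue several tool calls per turn, so 30 is usually enough for a 10-email batch. Raise it if runs stop before every email is handled.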
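
As a rough way to size the limit, you can bound how many tool calls a batch can generate. The sketch assumes one list call, then one read, one label, and one draft per email; the function name is hypothetical. AgentExecutor counts model turns rather than individual tool calls, so the real iteration count is usually lower than this bound.

```python
def worst_case_tool_calls(batch_size: int, reads_per_email: int = 1) -> int:
    """Upper bound on tool calls: one list call, then read + label + draft per email."""
    per_email = reads_per_email + 1 + 1  # read_email_body, apply_label, create_draft_reply
    return 1 + batch_size * per_email

for n in (5, 10, 15):
    print(n, worst_case_tool_calls(n))
# 5 16
# 10 31
# 15 46
```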


Step 4: Run It

result = executor.invoke({
    "input": "Process my unread emails. Label each one appropriately and draft replies where needed."
})

print(result["output"])

Figure: Terminal showing the agent's reasoning trace, with tool calls in sequence: list → read → label → draft.

Expected output:

Processed 8 emails:
- 3 labeled 'needs-reply' with drafts created
- 2 labeled 'newsletter'
- 2 labeled 'waiting'
- 1 labeled 'archive'

If it fails:

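  • HttpError 429 or 403 (rateLimitExceeded / userRateLimitExceeded) from the Gmail API: You've hit the per-user rate limit (250 quota units/second). Pass num_retries to execute(), for example .execute(num_retries=3), so the client retries with exponential backoff, or process fewer emails per run.
  • Agent loops endlessly: Lower max_iterations (for example to 15) and set max_execution_time on AgentExecutor as a wall-clock cap. Leave early_stopping_method at its default "force"; agents built with create_tool_calling_agent do not support "generate".
  • LLM never calls tools: Confirm the same tools list is passed to both create_tool_calling_agent and AgentExecutor. If the run fails with an authentication error instead, check that the ANTHROPIC_API_KEY env var is set.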
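
If you want to control retries yourself, for example around a whole tool call, a generic wrapper is only a few lines. This is a sketch under stated assumptions: `with_backoff` and `RateLimited` are hypothetical names, and with the real Gmail client the `is_retryable` check would inspect the HTTP status on `googleapiclient.errors.HttpError` instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(call: Callable[[], T],
                 is_retryable: Callable[[Exception], bool],
                 retries: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying retryable failures with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == retries or not is_retryable(exc):
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise AssertionError("unreachable")

class RateLimited(Exception):
    """Stand-in for a rate-limit error in this demo."""

attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"

delays = []  # record delays instead of sleeping, so the demo runs instantly
print(with_backoff(flaky, lambda e: isinstance(e, RateLimited), sleep=delays.append))  # ok
```

Passing `sleep` as a parameter keeps the wrapper easy to test; in production you would leave it at the default `time.sleep`.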

Verification

Save the code from Steps 1-4 as inbox_agent.py, run the agent, and then check Gmail:

python inbox_agent.py

In Gmail, you should see:

  • Unread count drops by the number of emails processed (to 0 if you started with 10 or fewer)
  • Your custom labels appear in the sidebar
  • Drafts folder has new replies waiting for your review

Figure: Gmail sidebar showing the new labels created by the agent. Click any label to filter.


What You Learned

  • LangChain tool-calling agents are better than single LLM prompts for multi-step tasks because they can act, observe results, and adjust
  • Keeping the agent out of the send pathway is a simple but critical safety measure
  • max_iterations is your circuit breaker — always set it

Limitation: This agent processes emails sequentially. For 100+ unread messages, run it in batches of 10-15 or add a filter to only process emails from the last 24 hours.
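
Both workarounds are small. The sketch below builds a Gmail search string using the `is:unread` and `newer_than:` operators, which you could pass as the `q` parameter of `messages().list`, and splits a list of message ids into batches. The function names `unread_query` and `batches` are hypothetical.

```python
from typing import Iterator

def unread_query(max_age_days: int = 1) -> str:
    """Gmail search string for unread mail newer than `max_age_days` days."""
    if max_age_days < 1:
        raise ValueError("max_age_days must be at least 1")
    return f"is:unread newer_than:{max_age_days}d"

def batches(message_ids: list, size: int = 10) -> Iterator[list]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(message_ids), size):
        yield message_ids[start:start + size]

print(unread_query())  # is:unread newer_than:1d
print([len(b) for b in batches([f"id{i}" for i in range(23)], size=10)])  # [10, 10, 3]
```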

When NOT to use this: If your inbox includes sensitive legal, financial, or medical correspondence, label-only mode (disable create_draft_reply) is safer until you've validated the agent's judgment on your specific email patterns.
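
Label-only mode can be as simple as filtering the tool list before building the agent. In this sketch, `select_tools` is a hypothetical helper; it matches on the `name` attribute that LangChain tools expose and falls back to `__name__` so the demo works with plain functions. You would also want to remove step 4 from the system prompt so the model does not try to draft.

```python
def select_tools(all_tools: list, label_only: bool) -> list:
    """Drop the drafting tool when running in label-only mode."""
    if not label_only:
        return list(all_tools)

    def tool_name(tool) -> str:
        return getattr(tool, "name", None) or getattr(tool, "__name__", "")

    return [t for t in all_tools if tool_name(t) != "create_draft_reply"]

# Plain-function stand-ins for the real tools, for demonstration only
def list_unread_emails(): ...
def apply_label(): ...
def create_draft_reply(): ...

demo_tools = [list_unread_emails, apply_label, create_draft_reply]
print([t.__name__ for t in select_tools(demo_tools, label_only=True)])
# ['list_unread_emails', 'apply_label']
```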


Tested on Python 3.12, LangChain 0.3, langchain-anthropic 0.3, Gmail API v1, macOS & Ubuntu 24.04