Problem: Your Dataset Is Too Big for Most AI Tools
You have a 500k-row CSV, a sprawling codebase, or thousands of customer support tickets — and every AI tool you've tried either truncates the data or forces you to chunk it manually. The insights you need require seeing everything at once.
Gemini 2.0 Pro's 2M-token context window changes that. You can load entire datasets, multi-file codebases, or hours of transcripts in a single prompt and ask questions across all of it.
You'll learn:
- How to estimate token count before sending large payloads
- How to structure prompts that get accurate answers from massive inputs
- How to extract structured outputs (JSON, tables) from unstructured bulk data
Time: 20 min | Level: Intermediate
Why This Matters
Most LLMs cap out at 128k–200k tokens. Gemini 2.0 Pro's 2M-token window (roughly 1.5 million words) means you can fit an entire novel, a year of logs, or a large database export into a single API call — no chunking, no vector databases, no retrieval pipelines for many use cases.
Common use cases:
- Querying large CSVs or JSON exports without a database
- Auditing an entire codebase for security issues or patterns
- Summarizing and cross-referencing thousands of documents
- Finding anomalies across full log files
The tradeoff: Large contexts cost more tokens and take longer to process. Use this approach when the data genuinely requires holistic analysis, not just keyword search.
Solution
Step 1: Estimate Your Token Count
Before sending, check that your payload fits within the limit. A rough rule: 1 token ≈ 4 characters of English text. For structured data (CSV, JSON), it's closer to 3 characters per token because of punctuation and delimiters.
# Quick local estimate — no API call needed
def estimate_tokens(text: str) -> int:
    # Conservative estimate: 3 chars per token for structured data
    return len(text) // 3

with open("dataset.csv", "r") as f:
    content = f.read()

estimated = estimate_tokens(content)
print(f"Estimated tokens: {estimated:,}")
print(f"Fits in 2M window: {estimated < 2_000_000}")
Expected: Token estimate printed. If you're over 1.8M, trim columns you don't need before sending.
If it fails:
- MemoryError on large files: Stream the file in chunks to estimate; don't load it all at once.
- Estimate seems wrong: Use the official google-generativeai SDK's count_tokens() for an exact count (it costs nothing — no generation happens).
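The streaming fix above can be sketched with nothing but the standard library; the chunk size is arbitrary, and the 3-chars-per-token ratio is the same heuristic used in the estimator:

```python
def estimate_tokens_streaming(path: str, chunk_size: int = 1 << 20) -> int:
    """Estimate tokens without loading the whole file into memory."""
    total_chars = 0
    with open(path, "r", errors="ignore") as f:
        # Read ~1 MiB at a time so memory stays flat regardless of file size
        while chunk := f.read(chunk_size):
            total_chars += len(chunk)
    return total_chars // 3  # same 3-chars-per-token heuristic as above
```

This trades a little precision for constant memory use; for the go/no-go decision of "does this fit in 2M tokens," the approximation is enough.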
Step 2: Load and Send the Data
Use the official Google Generative AI SDK. Install it first:
pip install google-generativeai
Then structure your call to keep the data and the instruction clearly separated:
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-pro-exp") # Use latest Pro model
with open("sales_data.csv", "r") as f:
    csv_content = f.read()
# Keep instruction short and explicit — the model reads the data, not you
prompt = f"""You are a data analyst. Below is a CSV of sales records for 2025.
DATA:
{csv_content}
TASK:
1. Identify the top 5 products by total revenue.
2. Flag any months where revenue dropped more than 20% month-over-month.
3. Return results as JSON with keys: top_products, revenue_drops.
"""
response = model.generate_content(prompt)
print(response.text)
Expected: A JSON block with top_products and revenue_drops arrays.
If it fails:
- ResourceExhausted error: You've hit rate limits. Add time.sleep(30) and retry.
- Truncated response: Add generation_config=genai.GenerationConfig(max_output_tokens=8192) to the model init.
- Model hallucinating data: Add "Only use data from the CSV. Do not infer or invent values." to your prompt.
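The sleep-and-retry advice for ResourceExhausted can be wrapped in a small generic helper. This is a sketch: the exception tuple defaults to the broad `Exception` for illustration, and in real code you would pass the SDK's actual rate-limit exception instead:

```python
import time

def with_retries(fn, *, retries=3, base_delay=30, transient=(Exception,)):
    """Call fn(), retrying transient failures with a growing delay."""
    for attempt in range(retries):
        try:
            return fn()
        except transient:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (attempt + 1))  # 30s, 60s, ...

# Usage sketch (exception class depends on your SDK version):
# response = with_retries(lambda: model.generate_content(prompt))
```

Passing a lambda keeps the helper independent of the SDK, so the same wrapper works for any flaky call in the pipeline.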
Step 3: Parse the Structured Output
Gemini will often wrap JSON in markdown fences. Strip them before parsing:
import json
import re

def extract_json(text: str) -> dict:
    # Strip markdown fences if present
    cleaned = re.sub(r"```(?:json)?\n?", "", text).strip()
    return json.loads(cleaned)

result = extract_json(response.text)
top_products = result["top_products"]
revenue_drops = result["revenue_drops"]

print(f"Top 5 products: {[p['name'] for p in top_products]}")
print(f"Revenue drops detected: {len(revenue_drops)} months")
Expected: Clean Python dicts ready for further processing or export.
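Stripping fences fails when the model also wraps the JSON in explanatory prose. A more defensive fallback (a sketch, using only the standard library) is to scan for the first decodable object with `json.JSONDecoder.raw_decode`:

```python
import json

def extract_json_anywhere(text: str) -> dict:
    """Parse the first JSON object embedded anywhere in free-form text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)  # parse from this brace
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # try the next brace
    raise ValueError("No JSON object found in response")
```

Because raw_decode stops at the end of the first valid object, trailing prose like "Hope that helps!" is ignored automatically.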
Step 4: Handle Multi-File Inputs
For codebases or multi-document analysis, concatenate files with clear delimiters so the model knows where each file starts and ends:
import os

def load_codebase(directory: str, extensions: list[str]) -> str:
    parts = []
    for root, _, files in os.walk(directory):
        for file in files:
            if any(file.endswith(ext) for ext in extensions):
                path = os.path.join(root, file)
                with open(path, "r", errors="ignore") as f:
                    content = f.read()
                # Clear delimiter — the model uses this to track file context
                parts.append(f"=== FILE: {path} ===\n{content}\n")
    return "\n".join(parts)

codebase = load_codebase("./src", [".py", ".ts", ".go"])
estimated = estimate_tokens(codebase)
print(f"Codebase is ~{estimated:,} tokens")
prompt = f"""Audit the following codebase for SQL injection vulnerabilities.
For each issue found, return: file path, line number (approximate), and severity.
CODEBASE:
{codebase}
Return findings as a JSON array.
"""
response = model.generate_content(prompt)
Expected: A JSON array of vulnerability findings across all files.
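If the concatenated codebase overshoots the window, one option (a sketch; the 1.8M budget matches the safety margin from Step 1, and the function name is ours) is to drop the largest file blocks until the estimate fits:

```python
def trim_to_budget(parts: list[str], budget_tokens: int = 1_800_000) -> list[str]:
    """Drop the largest file blocks until the total token estimate fits."""
    kept = sorted(parts, key=len)  # smallest files first
    # Same 3-chars-per-token heuristic as Step 1
    while kept and sum(len(p) for p in kept) // 3 > budget_tokens:
        kept.pop()  # discard the current largest block
    return kept
```

Dropping the biggest files first preserves the most file boundaries, but you may prefer to filter by directory or extension instead if the large files are the ones you actually need audited.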
Verification
Run a sanity check against known values in your dataset:
# Add a verification question to your prompt
verification_prompt = f"""
{csv_content}
What is the total number of rows in this dataset (excluding the header)?
Also, what is the exact revenue for product 'Widget-A' in March 2025?
"""
check = model.generate_content(verification_prompt)
print(check.text)
# Compare against your ground truth
You should see: exact values matching what you'd get from pandas or a SQL query. If they don't match, your data may have formatting issues that confuse the model.
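Ground truth for both verification questions can come straight from the standard library; the column names `product` and `revenue` here are assumptions about your CSV schema, so adapt them to your headers:

```python
import csv
import io

def ground_truth(csv_text: str, product: str) -> tuple[int, float]:
    """Return (row count excluding header, total revenue for one product)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    revenue = sum(float(r["revenue"]) for r in rows if r["product"] == product)
    return len(rows), revenue

# Usage sketch:
# n_rows, widget_a_revenue = ground_truth(csv_content, "Widget-A")
```

A mismatch between these numbers and the model's answer usually points at malformed rows (stray delimiters, quoted commas) rather than a model failure, so inspect the offending rows before re-prompting.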
What You Learned
- Token estimation lets you sanity-check payloads before spending on API calls.
- Clear data/instruction separation in prompts improves accuracy on large inputs.
- Structured output (JSON) with explicit schema instructions makes responses parseable.
- Multi-file inputs need delimiters so the model tracks context across files.
Limitation: At full 2M context, latency can reach 30–90 seconds per request. For real-time applications, this isn't the right tool. Use this for batch analysis jobs.
When NOT to use this: If you're running the same query across thousands of separate documents, a vector database + RAG pipeline will be cheaper and faster. The 2M window shines when you need cross-document reasoning — finding patterns, contradictions, or relationships that span the full dataset.
Tested on Gemini 2.0 Pro (Experimental), Python 3.12, google-generativeai 0.8.x, Ubuntu 24.04 & macOS Sequoia