Problem: Running LLM-Generated Code Safely Without Extra Infrastructure
You want your AI app to generate Python, run it, and return real results — not just code blocks the user has to copy-paste. Setting up a container, an execution engine, or a code interpreter service adds complexity and cost.
Gemini 2.0 has code execution built directly into the API as a first-party tool. You enable it with one flag, and the model handles generating, running, and returning output — all inside Google's sandboxed environment.
You'll learn:
- How to enable code execution on gemini-2.0-flash and gemini-2.0-pro
- How to parse execution results and display generated code + output
- Real-world patterns: data analysis, math, file-free computation
Time: 20 min | Difficulty: Intermediate
Why This Matters in 2026
Most "AI coding" tools stop at generation. Code execution closes the loop: the model writes code, runs it, reads the output, and can iterate — all within a single API call. This is the foundation of reliable data analysis agents, auto-graders, and calculation-heavy assistants.
Gemini 2.0's sandbox runs CPython with a standard scientific stack (NumPy, Pandas, Matplotlib, SciPy). It's stateless per request and has no network access — which is exactly what you want for untrusted execution.
How Code Execution Works
When you pass tools=[{"code_execution": {}}], the model can emit a special executable_code part mid-response. Google's infrastructure runs it, captures stdout and any error, then returns a code_execution_result part. The model reads that result and continues generating its final answer.
Your prompt
│
▼
Gemini 2.0 ──generates──▶ executable_code block
│
Google Sandbox (CPython)
│
code_execution_result
│
Gemini reads result ──▶ Final text response
The round-trip is transparent to you — the full content array shows every step.
Solution
Step 1: Install the SDK
pip install google-generativeai
Verify:
python -c "import google.generativeai as genai; print(genai.__version__)"
Expected: 0.8.x or later.
Step 2: Configure the Client and Enable Code Execution
import google.generativeai as genai
import os
# Set your API key — get one at aistudio.google.com
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel(
model_name="gemini-2.0-flash", # or gemini-2.0-pro
tools=[{"code_execution": {}}], # This is all you need
)
The code_execution tool requires no parameters. Google manages the sandbox entirely.
Step 3: Send a Request and Inspect the Full Response
response = model.generate_content(
"Calculate the first 20 Fibonacci numbers and find which ones are prime."
)
# Iterate over all parts — the response contains code, output, AND text
for part in response.candidates[0].content.parts:
if part.executable_code:
print("=== Generated Code ===")
print(part.executable_code.code)
elif part.code_execution_result:
print("=== Execution Output ===")
print(part.code_execution_result.output)
elif part.text:
print("=== Model Response ===")
print(part.text)
Expected output structure:
=== Generated Code ===
def is_prime(n):
if n < 2: return False
...
=== Execution Output ===
Fibonacci primes up to F(20): [2, 3, 5, 13, 89, 233, 1597]
=== Model Response ===
Among the first 20 Fibonacci numbers, 7 are prime: 2, 3, 5, 13, 89, 233, and 1597...
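Steps 3, 5, and 6 all repeat the same three-way dispatch, so it can be worth factoring into a helper. The sketch below is an assumption, not part of the SDK: render_parts is a hypothetical name, and the demo uses types.SimpleNamespace stand-ins shaped like real SDK parts (the attribute names executable_code, code_execution_result, and text match the response structure shown above).

```python
from types import SimpleNamespace

def render_parts(parts):
    """Turn a list of response parts into labeled transcript sections.

    Dispatches on the three part types a code-execution response can
    contain: executable_code, code_execution_result, and plain text.
    """
    sections = []
    for part in parts:
        if getattr(part, "executable_code", None):
            sections.append(("Generated Code", part.executable_code.code))
        elif getattr(part, "code_execution_result", None):
            sections.append(("Execution Output", part.code_execution_result.output))
        elif getattr(part, "text", None):
            sections.append(("Model Response", part.text))
    return sections

# Stand-in parts mimicking the shape of a real response
fake_parts = [
    SimpleNamespace(executable_code=SimpleNamespace(code="print(2 + 2)"),
                    code_execution_result=None, text=None),
    SimpleNamespace(executable_code=None,
                    code_execution_result=SimpleNamespace(output="4\n"), text=None),
    SimpleNamespace(executable_code=None, code_execution_result=None,
                    text="The answer is 4."),
]

for label, body in render_parts(fake_parts):
    print(f"=== {label} ===\n{body.strip()}")
```

With a real response you would call render_parts(response.candidates[0].content.parts) instead of the fakes.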
Step 4: Handle Execution Errors Gracefully
The sandbox returns an outcome code you should always check:
import google.generativeai as genai

# The outcome enum lives on the CodeExecutionResult proto
Outcome = genai.protos.CodeExecutionResult.Outcome

for part in response.candidates[0].content.parts:
    if part.code_execution_result:
        result = part.code_execution_result
        # outcome: OUTCOME_OK, OUTCOME_FAILED, OUTCOME_DEADLINE_EXCEEDED
        if result.outcome == Outcome.OUTCOME_FAILED:
            print(f"Execution failed:\n{result.output}")
        elif result.outcome == Outcome.OUTCOME_DEADLINE_EXCEEDED:
            print("Code timed out — sandbox has a 30s limit per execution")
        else:
            print(f"Output:\n{result.output}")
Common failure reasons:
- Import errors → The sandbox has NumPy, Pandas, Matplotlib, SciPy, but NOT requests, Flask, or third-party packages
- Timeout → Sandbox cap is 30 seconds; avoid training loops or infinite recursion
- Memory → No hard number is published, but treat it like a 512MB budget
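One pragmatic response to OUTCOME_FAILED is to hand the traceback back to the model for another attempt. The sketch below is a hypothetical pattern, not an SDK feature: run_with_retry, send, and fake_send are made-up names, and send stands in for a real generate_content call plus the outcome parsing from the step above.

```python
def run_with_retry(prompt, send, max_attempts=2):
    """Call `send`; on OUTCOME_FAILED, retry with the traceback appended.

    `send` is any callable prompt -> (outcome_name, output).
    """
    for attempt in range(max_attempts):
        outcome, output = send(prompt)
        if outcome != "OUTCOME_FAILED":
            return output
        # Feed the failure back so the model can correct its own code
        prompt = (f"{prompt}\n\nYour previous code failed with:\n{output}\n"
                  "Fix the error and try again.")
    return output

# Fake `send` that fails once, then succeeds — stands in for a real
# API round-trip during this offline demo.
calls = []
def fake_send(prompt):
    calls.append(prompt)
    if len(calls) == 1:
        return ("OUTCOME_FAILED", "ModuleNotFoundError: No module named 'requests'")
    return ("OUTCOME_OK", "done")

print(run_with_retry("Sum the numbers in this list.", fake_send))
```

In production, send would wrap model.generate_content and extract the outcome and output from the code_execution_result part.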
Step 5: Real-World Pattern — Data Analysis with Inline Data
Pass data as a string in your prompt. The model will parse it, write the analysis code, run it, and return interpreted results.
csv_data = """date,revenue,units
2026-01-01,12400,310
2026-01-02,9800,245
2026-01-03,15200,380
2026-01-04,11100,278
2026-01-05,18900,472"""
response = model.generate_content(
f"Analyze this sales data and calculate day-over-day revenue growth rates:\n\n{csv_data}"
)
for part in response.candidates[0].content.parts:
if part.executable_code:
print(part.executable_code.code)
elif part.code_execution_result:
print(part.code_execution_result.output)
elif part.text:
print(part.text)
The model will import pandas, parse the CSV string, compute the growth rates, and return the calculation — not just a code block.
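For intuition, the computation the model performs for this prompt boils down to the following. This local sketch uses only the standard library so you can run it without the API; the model's own generated code will usually use pandas and differ in detail.

```python
import csv
import io

csv_data = """date,revenue,units
2026-01-01,12400,310
2026-01-02,9800,245
2026-01-03,15200,380
2026-01-04,11100,278
2026-01-05,18900,472"""

# Parse the inline CSV, then compute day-over-day revenue growth
rows = list(csv.DictReader(io.StringIO(csv_data)))
growth = []
for prev, cur in zip(rows, rows[1:]):
    prev_rev, cur_rev = int(prev["revenue"]), int(cur["revenue"])
    pct = (cur_rev - prev_rev) / prev_rev * 100
    growth.append((cur["date"], round(pct, 1)))
    print(f"{cur['date']}: {pct:+.1f}%")
```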
Step 6: Multi-Turn Conversation with Code Execution
Code execution works in chat sessions too. The model can iterate: run code, see the output, and refine.
chat = model.start_chat()
# Turn 1 — initial computation
r1 = chat.send_message("Generate 1000 random numbers from a normal distribution and calculate their mean and std.")
# Turn 2 — follow-up, model has context of previous code + output
r2 = chat.send_message("Now plot a histogram of those same numbers using matplotlib and describe the shape.")
for part in r2.candidates[0].content.parts:
if part.executable_code:
print("Code:", part.executable_code.code[:200], "...")
elif part.text:
print("Analysis:", part.text)
Note: each turn's sandbox is independent — variables don't persist between send_message calls. The model uses its context to re-generate setup code when needed.
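Because variables don't survive between turns, a common workaround is to splice the previous sandbox output back into the next prompt explicitly rather than relying on chat context alone. The helper below is a hypothetical sketch (follow_up_prompt is not an SDK function); it only does string assembly, so the real send_message calls are omitted.

```python
def follow_up_prompt(question: str, prior_output: str) -> str:
    """Build a follow-up prompt that re-supplies the last sandbox output,
    since each turn's sandbox starts fresh with no shared variables."""
    return (
        "Earlier you ran code that produced this output:\n"
        f"{prior_output.strip()}\n\n"
        "Using those values as your starting data, do the following:\n"
        f"{question}"
    )

prompt = follow_up_prompt(
    "Plot a histogram of the numbers and describe the shape.",
    "mean=0.012, std=0.998\n",
)
print(prompt)
```

You would pass the assembled prompt to chat.send_message, extracting prior_output from the previous turn's code_execution_result part.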
Verification
Run this end-to-end check:
import google.generativeai as genai
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash", tools=[{"code_execution": {}}])
response = model.generate_content("What is the sum of all prime numbers below 100?")
found_code = False
found_output = False
for part in response.candidates[0].content.parts:
if part.executable_code:
found_code = True
if part.code_execution_result:
found_output = True
print("Sandbox output:", part.code_execution_result.output.strip())
assert found_code and found_output, "Code execution did not trigger"
print("✅ Code execution working correctly")
You should see:
Sandbox output: 1060
✅ Code execution working correctly
What You Learned
- Enable the sandbox with one line: tools=[{"code_execution": {}}]
- Responses contain three part types: executable_code, code_execution_result, and text — always iterate all parts
- The sandbox has NumPy, Pandas, Matplotlib, and SciPy but no network and no third-party installs
- Variables don't persist between turns; the model rewrites setup code using chat context
- Always check result.outcome — silent failures return OUTCOME_FAILED with the traceback in output
When NOT to use this: If your use case needs persistent state, file I/O, or custom packages, you need a proper execution environment (e.g., E2B, Modal, or your own container). Code execution is ideal for stateless computation, data analysis, and math — not for building artifacts or running long jobs.
Tested on google-generativeai 0.8.3, gemini-2.0-flash and gemini-2.0-pro, Python 3.12, macOS and Ubuntu 24.04