Problem: Code Reviews Are a Bottleneck — and a Blind Spot
Pull requests pile up. Reviewers drown in nitpicks and miss the deeper problems. Security issues slip through because the senior engineer is busy. Automated linters catch style but not logic.
DeepSeek Coder V2 understands code context well enough to flag real problems: off-by-one errors, missing error handling, unsafe SQL, and API misuse — not just formatting.
You'll learn:
- How to run DeepSeek Coder V2 locally via Ollama for review inference
- How to write a Python script that diffs a PR and sends it to the model
- How to wire it all into a GitHub Actions workflow that comments on PRs automatically
Time: 25 min | Difficulty: Intermediate
Why DeepSeek Coder V2
DeepSeek Coder V2 Lite is a 16B MoE model (active params: ~2.4B) trained specifically on code. The family scores competitively with GPT-4-class models on HumanEval, and it handles 128K context — large enough for multi-file diffs.
The 16B variant runs on a single 16GB GPU via Ollama. The 236B full model is for those with A100s, but the 16B hits a good accuracy/cost tradeoff for review automation.
What it catches well:
- Null/None dereferences
- Unhandled exceptions and missing error propagation
- SQL injection and XSS patterns
- Logic errors in conditionals
- Missing input validation
- Inconsistent return types
What it misses: Business logic it has no context for, and performance issues that require profiling data.
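As a concrete instance of the SQL-injection bullet, this is the pattern the model reliably flags, alongside the parameterized fix it typically suggests (the table and data here are made up for illustration):

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Flagged: user input interpolated straight into SQL (injection risk)
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # The usual suggested fix: a parameterized query treats input as data
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# A crafted input that makes the WHERE clause always true
evil = "' OR '1'='1"
print(len(find_user_unsafe(conn, evil)))  # 1 — leaks the row despite no matching name
print(len(find_user_safe(conn, evil)))    # 0 — the literal string matches nothing
```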
Architecture Overview
PR opened / updated
│
GitHub Actions trigger
│
git diff HEAD~1 ──▶ review.py ──▶ Ollama (DeepSeek Coder V2)
│
Structured JSON response
│
GitHub PR comment via API
The runner calls Ollama on a self-hosted machine or a cloud VM with a GPU. If you don't have GPU access, the same 16B Lite Q4 quantization runs on CPU: only ~2.4B parameters are active per token, so a typical review completes in a few minutes.
Solution
Step 1: Install Ollama and Pull the Model
# Install Ollama on your review server (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull DeepSeek Coder V2 16B — Q4_K_M quantization (~9GB)
ollama pull deepseek-coder-v2:16b-lite-instruct-q4_K_M
# Verify it runs
ollama run deepseek-coder-v2:16b-lite-instruct-q4_K_M "What is a race condition?"
Expected output: A coherent explanation of race conditions in under 30 seconds on a 16GB GPU.
No GPU? The same Q4_K_M build runs on CPU, since the MoE design activates only ~2.4B parameters per token:
ollama pull deepseek-coder-v2:16b-lite-instruct-q4_K_M
Expect a few minutes per review on a modern CPU. Accuracy is unchanged — it's the same weights, just slower.
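Before wiring up CI, it's worth confirming the tag is actually registered with the local Ollama server. A small sketch against the `/api/tags` endpoint; the `model_available` helper is ours, and `sample` mirrors the payload shape that endpoint returns:

```python
def model_available(tags_payload: dict, name: str) -> bool:
    # GET /api/tags returns {"models": [{"name": "<tag>", ...}, ...]}
    return any(m.get("name") == name for m in tags_payload.get("models", []))

# Example payload shape, as returned by GET http://localhost:11434/api/tags
sample = {"models": [{"name": "deepseek-coder-v2:16b-lite-instruct-q4_K_M"}]}
print(model_available(sample, "deepseek-coder-v2:16b-lite-instruct-q4_K_M"))  # True

# Against a live server:
# import requests
# tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
# assert model_available(tags, "deepseek-coder-v2:16b-lite-instruct-q4_K_M")
```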
Step 2: Write the Review Script
Create scripts/review.py in your repo root:
import subprocess
import sys
import json
import os

import requests

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
MODEL = os.getenv("REVIEW_MODEL", "deepseek-coder-v2:16b-lite-instruct-q4_K_M")
# Truncate large diffs to bound prompt size and latency (~4096 chars ≈ 1K tokens)
MAX_DIFF_CHARS = 4096

SYSTEM_PROMPT = """You are a senior software engineer conducting a code review.
Analyze the diff and respond ONLY with a JSON object in this exact format:
{
  "summary": "one sentence overall assessment",
  "issues": [
    {
      "severity": "critical|warning|info",
      "file": "filename",
      "line": "line number or range",
      "issue": "what is wrong",
      "suggestion": "how to fix it"
    }
  ],
  "approved": true|false
}
Return no text outside the JSON object."""


def get_diff() -> str:
    result = subprocess.run(
        ["git", "diff", "HEAD~1", "--unified=3", "--no-color"],
        capture_output=True,
        text=True,
        check=True,
    )
    diff = result.stdout
    # Truncate to avoid overwhelming the context window
    if len(diff) > MAX_DIFF_CHARS:
        diff = diff[:MAX_DIFF_CHARS] + "\n\n[diff truncated — review first 4096 chars]"
    return diff


def review_diff(diff: str) -> dict:
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Review this diff:\n\n```diff\n{diff}\n```"},
        ],
        "stream": False,
        # temperature 0 = deterministic reviews, same diff = same output
        "options": {"temperature": 0},
    }
    response = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json=payload,
        timeout=180,  # 3 min max; large diffs on CPU can be slow
    )
    response.raise_for_status()
    content = response.json()["message"]["content"]
    # Strip markdown fences if the model wraps the JSON anyway.
    # Note: str.lstrip("```json") would be wrong here; it strips a *set of
    # characters* from the left, not a prefix, so we peel the fence lines instead.
    content = content.strip()
    if content.startswith("```"):
        content = content.split("\n", 1)[-1]  # drop the ```json line
    if content.endswith("```"):
        content = content[:-3]
    return json.loads(content.strip())


def format_github_comment(review: dict) -> str:
    lines = ["## 🤖 DeepSeek Coder V2 Review\n"]
    lines.append(f"**Summary:** {review['summary']}\n")

    critical = [i for i in review["issues"] if i["severity"] == "critical"]
    warnings = [i for i in review["issues"] if i["severity"] == "warning"]
    info = [i for i in review["issues"] if i["severity"] == "info"]

    if critical:
        lines.append("### 🔴 Critical")
        for issue in critical:
            lines.append(
                f"- **`{issue['file']}` line {issue['line']}**: {issue['issue']}\n"
                f"  > Fix: {issue['suggestion']}"
            )
    if warnings:
        lines.append("\n### 🟡 Warnings")
        for issue in warnings:
            lines.append(
                f"- **`{issue['file']}` line {issue['line']}**: {issue['issue']}\n"
                f"  > Fix: {issue['suggestion']}"
            )
    if info:
        lines.append("\n### 🔵 Suggestions")
        for issue in info:
            lines.append(f"- `{issue['file']}` line {issue['line']}: {issue['issue']}")

    verdict = "✅ Approved" if review.get("approved") else "❌ Changes requested"
    lines.append(f"\n**Verdict:** {verdict}")
    lines.append(
        "\n---\n*Reviewed by DeepSeek Coder V2 16B · Not a substitute for human review*"
    )
    return "\n".join(lines)


def post_github_comment(comment: str) -> None:
    token = os.environ["GITHUB_TOKEN"]
    repo = os.environ["GITHUB_REPOSITORY"]
    pr_number = os.environ["PR_NUMBER"]
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    response = requests.post(url, json={"body": comment}, headers=headers, timeout=30)
    response.raise_for_status()


if __name__ == "__main__":
    diff = get_diff()
    if not diff.strip():
        print("No diff found — skipping review.")
        sys.exit(0)
    review = review_diff(diff)
    comment = format_github_comment(review)
    post_github_comment(comment)
    print("Review posted.")
    # Exit 1 if critical issues found — fails the CI check
    critical_count = sum(1 for i in review["issues"] if i["severity"] == "critical")
    if critical_count > 0:
        print(f"Found {critical_count} critical issue(s). Blocking merge.")
        sys.exit(1)
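The fence-stripping step is easy to get wrong (`str.lstrip("```json")` strips a set of characters, not a prefix), so it's worth unit-testing in isolation. A standalone sketch of the same idea:

```python
import json

def extract_json(raw: str) -> dict:
    # Even at temperature 0, models sometimes wrap the JSON in ```json fences.
    # Peel fence lines off rather than using lstrip, which strips characters.
    text = raw.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[-1]  # drop the ```json line
    if text.endswith("```"):
        text = text[:-3]
    return json.loads(text.strip())

fenced = '```json\n{"summary": "ok", "issues": [], "approved": true}\n```'
bare = '{"summary": "ok", "issues": [], "approved": true}'
print(extract_json(fenced) == extract_json(bare))  # True
```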
Step 3: Configure the GitHub Actions Workflow
Create .github/workflows/ai-review.yml:
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]
    # Only review actual source code, skip docs and config
    paths:
      - "**.py"
      - "**.ts"
      - "**.tsx"
      - "**.go"
      - "**.rs"

# Grant the job's GITHUB_TOKEN permission to post PR comments
permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    name: DeepSeek Coder Review
    # Replace with your self-hosted runner label
    runs-on: [self-hosted, gpu]
    timeout-minutes: 10
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          # Fetch enough history for git diff HEAD~1
          fetch-depth: 2

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install requests

      - name: Run AI review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_REPOSITORY: ${{ github.repository }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          # Point at Ollama on the runner — it's running as a service
          OLLAMA_URL: "http://localhost:11434"
          REVIEW_MODEL: "deepseek-coder-v2:16b-lite-instruct-q4_K_M"
        run: python scripts/review.py
If you don't have a self-hosted runner, replace runs-on with ubuntu-latest and add an Ollama install step at the top of the job:
- name: Start Ollama
  run: |
    curl -fsSL https://ollama.com/install.sh | sh
    ollama serve &
    sleep 5
    ollama pull deepseek-coder-v2:16b-lite-instruct-q4_K_M
This adds several minutes to each run (the ~9GB model download dominates) and inference is CPU-only. Works, but slow.
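The fixed `sleep 5` is fragile on slow runners. A stdlib-only readiness poll is sturdier; this is a sketch, and the URL and timeout values are assumptions:

```python
import time
import urllib.request
import urllib.error

def wait_for_ollama(url: str = "http://localhost:11434", timeout: float = 60.0) -> bool:
    # Poll until the Ollama HTTP server answers, or give up after `timeout` seconds
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(1)
    return False

# Nothing is listening on port 9 here, so the poll gives up after ~3 seconds
print(wait_for_ollama("http://127.0.0.1:9", timeout=3))  # False
```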
Step 4: Add Secrets and Test
GITHUB_TOKEN is created automatically for every workflow run — there is nothing to add under Settings → Secrets and variables → Actions. What you must grant is write access: either declare `permissions: pull-requests: write` in the workflow file, or in your repo's Settings → Actions → General set workflow permissions to Read and write.
Open a test PR with a known bug:
# test_bug.py — commit this to trigger a review
def divide(a, b):
    # Missing zero-division check — DeepSeek should flag this
    return a / b

user_input = input("Enter divisor: ")
print(divide(10, user_input))  # Also missing int() cast
Push it, open a PR, and watch the Actions tab.
Verification
After the workflow runs, check the PR for a comment from the Actions bot. You should see something like:
## 🤖 DeepSeek Coder V2 Review
**Summary:** Two bugs in divide() will cause runtime crashes on invalid input.
### 🔴 Critical
- **`test_bug.py` line 3**: Division by zero not handled when b=0
> Fix: Add `if b == 0: raise ValueError("divisor cannot be zero")`
- **`test_bug.py` line 6**: user_input is a string, not int — TypeError at runtime
> Fix: Cast with `int(user_input)` and wrap in try/except ValueError
**Verdict:** ❌ Changes requested
The exit code 1 from the script will also mark the CI check as failed, blocking the merge until the issues are fixed.
Tuning the Prompt for Your Stack
The default prompt works for general code. Specialize it per language or framework by changing SYSTEM_PROMPT:
# For a Python/FastAPI repo
SYSTEM_PROMPT = """You are reviewing a Python FastAPI codebase.
Pay special attention to:
- Pydantic model validation gaps
- Missing `async` on database calls
- Unhandled HTTPException propagation
- SQL injection in raw queries
Respond ONLY with the JSON format described..."""
To use a different prompt per language, inspect the diff for file extensions and pick the matching SYSTEM_PROMPT before calling review_diff().
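A minimal sketch of that dispatch, with hypothetical per-stack prompts keyed by the extensions that appear in the diff's `diff --git` headers:

```python
import re

# Hypothetical per-stack prompts; extend the table for your repo
PROMPTS = {
    ".py": "You are reviewing a Python FastAPI codebase...",
    ".ts": "You are reviewing a TypeScript codebase...",
}
DEFAULT_PROMPT = "You are a senior software engineer conducting a code review..."

def pick_prompt(diff: str) -> str:
    # File headers look like: "diff --git a/src/app.py b/src/app.py"
    exts = set(re.findall(r"^diff --git a/\S+(\.\w+) b/", diff, flags=re.M))
    for ext, prompt in PROMPTS.items():
        if ext in exts:
            return prompt
    return DEFAULT_PROMPT

sample = "diff --git a/src/app.py b/src/app.py\n--- a/src/app.py\n+++ b/src/app.py\n"
print(pick_prompt(sample).startswith("You are reviewing a Python"))  # True
```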
Production Considerations
Rate limiting: On busy repos, queue reviews (e.g. with a Redis-backed job queue) instead of firing parallel requests at a single Ollama instance; concurrent calls to the same Ollama instance serialize anyway.
Cost: Self-hosted = electricity only. On a rented 16GB GPU (A4000, ~$0.20/hr on Lambda Labs) at ~2 min per review, a team doing 50 PRs/day uses about 100 GPU-minutes — roughly $0.33/day of billed compute, or ~$4.80/day if the instance runs around the clock.
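Back-of-envelope, the GPU bill works out as follows (the hourly rate is the assumed Lambda Labs figure):

```python
# Daily GPU cost for review automation, two billing models
prs_per_day = 50
minutes_per_review = 2
gpu_rate_per_hour = 0.20  # A4000 on Lambda Labs (assumed rate)

gpu_hours = prs_per_day * minutes_per_review / 60
cost_if_billed_per_use = gpu_hours * gpu_rate_per_hour  # pay only for review time
cost_if_always_on = 24 * gpu_rate_per_hour              # instance runs 24/7

print(round(cost_if_billed_per_use, 2))  # 0.33
print(round(cost_if_always_on, 2))       # 4.8
```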
False positive rate: Expect ~15% noise on info-severity items. Critical flags are reliable. Teach your team to treat the bot like a junior reviewer: worth reading, not worth blocking on every comment.
Context size: For PRs touching 20+ files, chunk the diff by file and make one Ollama call per file. The 128K context window handles most PRs, but very large refactors need splitting.
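Splitting a unified diff into per-file chunks is a one-regex job; a sketch:

```python
import re

def split_diff_by_file(diff: str) -> list[str]:
    # Each file's hunks start at a "diff --git" header; split on those
    # with a lookahead so the header stays attached to its chunk
    parts = re.split(r"(?m)^(?=diff --git )", diff)
    return [p for p in parts if p.strip()]

sample = (
    "diff --git a/a.py b/a.py\n--- a/a.py\n+++ b/a.py\n+x = 1\n"
    "diff --git a/b.py b/b.py\n--- a/b.py\n+++ b/b.py\n+y = 2\n"
)
chunks = split_diff_by_file(sample)
print(len(chunks))  # 2
print(chunks[1].startswith("diff --git a/b.py"))  # True
```

Each chunk can then be passed to review_diff() on its own, and the per-file results merged into one comment.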
What You Learned
- DeepSeek Coder V2 16B runs on a single 16GB GPU via Ollama and is accurate enough for CI review automation
- Structured JSON output + temperature 0 gives consistent, parseable reviews
- Exit code 1 on critical issues blocks merges without any extra GitHub branch protection config
- The script is ~100 lines with requests as its only dependency — easy to adapt to GitLab CI or Bitbucket Pipelines
Limitation: The model has no access to your broader codebase — only the diff. It won't catch issues that require understanding 10 files of context. For that, look at embedding your codebase into a RAG pipeline and injecting relevant context alongside the diff.
Tested on DeepSeek Coder V2 16B Q4_K_M via Ollama 0.5.4, Python 3.12, GitHub Actions, RTX A4000 16GB, Ubuntu 24.04