Automated AI code review in GitHub pull requests
Manual code review is a bottleneck — especially on small teams where the same person writes and reviews code. This guide walks you through building a GitHub Actions workflow that sends pull request diffs to Claude, gets back structured line-level findings as JSON, and posts them as inline review comments directly on the PR. No third-party services, no subscriptions — just the GitHub API and Claude API wired together.
How the full pipeline works
The flow has four stages:
- A pull_request event triggers the workflow
- The diff is fetched using the GitHub Pulls API
- The diff is sent to Claude with a structured prompt; Claude returns a JSON array of findings, each with a path, line, severity, category, and comment
- The workflow posts inline comments using the GitHub Reviews API
Before writing any code, make sure you have an ANTHROPIC_API_KEY from Anthropic’s Console and that it’s stored as a GitHub Actions secret. If you’re new to the Claude API, the getting started guide for Claude API in Python covers authentication and basic usage well.
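As a preview of the third stage, here is a minimal sketch of what one finding looks like and how the workflow could sanity-check it before posting. The field names and allowed values come from the prompt used in Step 2; the isValidFinding helper and the sample finding are hypothetical additions for illustration, not part of the review script.

```javascript
// Allowed values, taken from the prompt in Step 2.
const SEVERITIES = ["error", "warning", "info"];
const CATEGORIES = ["security", "performance", "readability", "correctness"];

// Hypothetical helper: true only when a finding has every field the
// review script relies on, with the expected types.
function isValidFinding(f) {
  return (
    f !== null &&
    typeof f === "object" &&
    typeof f.path === "string" &&
    f.path.length > 0 &&
    Number.isInteger(f.line) &&
    f.line > 0 &&
    SEVERITIES.includes(f.severity) &&
    CATEGORIES.includes(f.category) &&
    typeof f.comment === "string"
  );
}

// Invented example in the shape Claude is asked to return.
const example = {
  path: "src/auth.js",
  line: 42,
  severity: "error",
  category: "security",
  comment: "Token is compared with ==; use a constant-time comparison.",
};

console.log(isValidFinding(example)); // true
console.log(isValidFinding({ path: "src/auth.js", line: "42" })); // false
```

Dropping malformed findings early is one way to keep a single bad entry from breaking the whole review request later on.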
Step 1: Set up the GitHub Actions workflow YAML
Create .github/workflows/ai-review.yml in your repo:
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  pull-requests: write
  contents: read
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
      - name: Install dependencies
        run: npm install @anthropic-ai/sdk
      - name: Run AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: node .github/scripts/review.js
Key points:
- permissions: pull-requests: write is required for posting review comments
- fetch-depth: 0 ensures git history is available if you need a git diff fallback
- GITHUB_TOKEN is automatically provided by Actions, so no extra setup is needed
To avoid duplicate comments on re-runs (for example, when a developer pushes a fixup commit and the synchronize event fires again), the script checks for an ai-reviewed label on the PR before posting a review. If the label exists, it skips posting and exits cleanly. To request a fresh review, remove the label and re-run the workflow.
Step 2: Write the Node.js review script
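Create .github/scripts/review.js. The script uses ES module import syntax, so if Node reports "Cannot use import statement outside a module", either add "type": "module" to your package.json or name the file review.mjs and update the workflow's run line to match: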
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const REPO = process.env.REPO;
const PR_NUMBER = process.env.PR_NUMBER;
const HEAD_SHA = process.env.HEAD_SHA;
const GITHUB_TOKEN = process.env.GITHUB_TOKEN;
const API = `https://api.github.com/repos/${REPO}`;

// Thin wrapper around the GitHub REST API that adds auth and version headers
async function githubFetch(path, options = {}) {
  const res = await fetch(`${API}${path}`, {
    ...options,
    headers: {
      Authorization: `Bearer ${GITHUB_TOKEN}`,
      Accept: "application/vnd.github+json",
      "X-GitHub-Api-Version": "2022-11-28",
      "Content-Type": "application/json",
      ...(options.headers || {}),
    },
  });
  // Fail loudly instead of handing an error payload to the caller
  if (!res.ok) {
    throw new Error(`GitHub API ${res.status} on ${path}: ${await res.text()}`);
  }
  return res.json();
}
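
// True if this PR already carries the ai-reviewed label
async function hasReviewLabel() {
  const labels = await githubFetch(`/issues/${PR_NUMBER}/labels`);
  return labels.some((l) => l.name === "ai-reviewed");
}

// Mark the PR as reviewed so later runs skip it
async function addLabel() {
  await githubFetch(`/issues/${PR_NUMBER}/labels`, {
    method: "POST",
    body: JSON.stringify({ labels: ["ai-reviewed"] }),
  });
}

// Fetch the changed files with their patches (first page only, up to 100 files)
async function getFiles() {
  return githubFetch(`/pulls/${PR_NUMBER}/files?per_page=100`);
}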

// Group files into batches whose combined patch text stays near maxCharsPerChunk
function chunkFiles(files, maxCharsPerChunk = 12000) {
  const chunks = [];
  let current = [];
  let size = 0;
  for (const file of files) {
    const patchSize = (file.patch || "").length;
    // Start a new batch when adding this file would exceed the limit
    if (size + patchSize > maxCharsPerChunk && current.length > 0) {
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(file);
    size += patchSize;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Build the review prompt: instructions first, then each file's patch
function buildPrompt(files) {
  const diffText = files
    .filter((f) => f.patch)
    .map((f) => `### File: ${f.filename}\n\`\`\`diff\n${f.patch}\n\`\`\``)
    .join("\n\n");
  return `You are a senior software engineer doing a code review. Analyze the following diff and return a JSON array of findings.

Each finding must have:
- "path": file path (string)
- "line": the line number in the NEW file where the issue appears (integer)
- "severity": one of "error", "warning", "info"
- "category": one of "security", "performance", "readability", "correctness"
- "comment": a concise, actionable review comment (string, max 300 chars)

Rules:
- Only flag real issues; no nitpicks about style preferences
- Skip files that look auto-generated (e.g., lock files, minified JS)
- Prioritize: security > correctness > performance > readability
- Return ONLY the JSON array, no markdown fences, no explanation

${diffText}`;
}
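
// Send one batch to Claude and parse the JSON findings it returns
async function reviewChunk(files) {
  const message = await client.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 2048,
    messages: [{ role: "user", content: buildPrompt(files) }],
  });
  // Use the first text block rather than assuming content[0] is text
  const textBlock = message.content.find((b) => b.type === "text");
  const text = (textBlock?.text ?? "").trim();
  try {
    const parsed = JSON.parse(text);
    // Guard against a valid JSON reply that is not an array
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    console.error("Failed to parse Claude response:", text);
    return [];
  }
}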

// Post all findings as one pull request review with inline comments
async function postReview(comments) {
  // Filter out low-severity info items to reduce noise
  const filtered = comments.filter((c) => c.severity !== "info");
  if (filtered.length === 0) {
    console.log("No significant findings. Skipping review post.");
    return;
  }
  // Map findings to the comment shape the Reviews API expects
  const reviewComments = filtered.map((c) => ({
    path: c.path,
    line: c.line,
    side: "RIGHT",
    body: `**[${c.severity.toUpperCase()}]** \`${c.category}\`: ${c.comment}`,
  }));
  await githubFetch(`/pulls/${PR_NUMBER}/reviews`, {
    method: "POST",
    body: JSON.stringify({
      commit_id: HEAD_SHA,
      event: "COMMENT",
      body: `🤖 AI Code Review: ${filtered.length} finding(s).`,
      comments: reviewComments,
    }),
  });
  console.log(`Posted ${filtered.length} review comments.`);
}

async function main() {
  if (await hasReviewLabel()) {
    console.log("PR already reviewed. Skipping.");
    process.exit(0);
  }
  const files = await getFiles();
  // Skip lock files, minified JS, and source maps. The pattern requires the
  // leading dot or dash so that package-lock.json is caught and admin.js is not.
  const generated = /(\.lock|-lock\.(json|ya?ml)|\.min\.js|\.map)$/;
  const relevantFiles = files.filter(
    (f) => f.patch && !generated.test(f.filename)
  );
  const chunks = chunkFiles(relevantFiles);
  const allFindings = [];
  // Review batches one at a time to keep API usage predictable
  for (const chunk of chunks) {
    const findings = await reviewChunk(chunk);
    allFindings.push(...findings);
  }
  await postReview(allFindings);
  await addLabel();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
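The chunking logic in chunkFiles() groups files into batches of roughly 12,000 characters of patch text so you don’t hit context limits on large PRs. Files are never split, so a single file with a larger patch still travels as its own oversized batch. This is the same pattern used when building AI agents that process large inputs in steps.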
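To see the batching behavior concretely, the sketch below repeats chunkFiles() so it can run on its own and feeds it three made-up files with a deliberately small limit of 100 characters. The file names and patch sizes are invented for the demonstration.

```javascript
// Same batching logic as chunkFiles() in review.js, repeated so this runs standalone.
function chunkFiles(files, maxCharsPerChunk = 12000) {
  const chunks = [];
  let current = [];
  let size = 0;
  for (const file of files) {
    const patchSize = (file.patch || "").length;
    if (size + patchSize > maxCharsPerChunk && current.length > 0) {
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(file);
    size += patchSize;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Invented sample data: patches of 40, 50, and 200 characters.
const sample = [
  { filename: "a.js", patch: "x".repeat(40) },
  { filename: "b.js", patch: "x".repeat(50) },
  { filename: "c.js", patch: "x".repeat(200) },
];

// Reduce each batch to its file names to make the grouping visible.
const batches = chunkFiles(sample, 100).map((chunk) =>
  chunk.map((f) => f.filename)
);
console.log(batches); // [ [ 'a.js', 'b.js' ], [ 'c.js' ] ]
```

The first two files fit together under the limit, while c.js exceeds it on its own and still becomes a batch. If that matters for your PRs, truncating or skipping very large patches before chunking is one option.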
Step 3: Tune the prompt for your team’s priorities
The prompt in buildPrompt() is where you control what Claude focuses on. Adjust the rules section based on what matters most to your team:
For a security-focused team:
// Add to the rules section in buildPrompt():
`- Flag any use of eval(), exec(), or unsanitized user input immediately
- Check for missing authentication on route handlers
- Flag hardcoded credentials or tokens`;
For a Laravel/PHP backend:
`- Flag missing Eloquent mass-assignment protection ($fillable/$guarded)
- Warn on raw DB::statement() calls with user input
- Check for N+1 query patterns in loops`;
For a JavaScript/TypeScript frontend:
`- Flag async operations with no error handling (missing try/catch or .catch())
- Warn on direct DOM manipulation bypassing React's virtual DOM
- Check for missing dependency arrays in useEffect hooks`;
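One way to keep these variations manageable is to store them as named profiles and assemble the rules block from a base list plus the profile's extras. The following is a sketch under that assumption: BASE_RULES, TEAM_RULES, and buildRules are hypothetical names, and only two of the profiles above are included (a frontend profile would be added the same way).

```javascript
// Rules every team gets, taken from the base prompt in Step 2.
const BASE_RULES = [
  "Only flag real issues; no nitpicks about style preferences",
  "Skip files that look auto-generated (e.g., lock files, minified JS)",
  "Prioritize: security > correctness > performance > readability",
  "Return ONLY the JSON array, no markdown fences, no explanation",
];

// Hypothetical per-team extras, using the rule text from the examples above.
const TEAM_RULES = {
  security: [
    "Flag any use of eval(), exec(), or unsanitized user input immediately",
    "Check for missing authentication on route handlers",
    "Flag hardcoded credentials or tokens",
  ],
  laravel: [
    "Flag missing Eloquent mass-assignment protection ($fillable/$guarded)",
    "Warn on raw DB::statement() calls with user input",
    "Check for N+1 query patterns in loops",
  ],
};

// Hypothetical helper: returns the text for the "Rules:" section of the prompt.
// Unknown profile names fall back to the base rules only.
function buildRules(profile) {
  const known = Object.prototype.hasOwnProperty.call(TEAM_RULES, profile);
  const extra = known ? TEAM_RULES[profile] : [];
  return [...BASE_RULES, ...extra].map((rule) => `- ${rule}`).join("\n");
}

console.log(buildRules("security"));
```

In buildPrompt(), the hard-coded rules would then be replaced with the output of buildRules(), with the profile name read from an environment variable or a config file of your choosing.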
The structured JSON output approach here is similar to what I covered in getting reliable structured JSON output from LLMs — the same principles apply: be explicit about the schema, and tell the model to return only JSON.
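Even with an explicit "return only JSON" instruction, a reply can occasionally arrive wrapped in a markdown fence or with a sentence of prose around it. A small defensive parser is one way to tolerate that. The sketch below is an assumption about how you might harden the parsing step; parseFindings is a hypothetical helper, and it simply takes everything between the first "[" and the last "]".

```javascript
// Hypothetical helper: extract a JSON array from a model reply that may be
// wrapped in a markdown fence or surrounded by a sentence of prose.
function parseFindings(text) {
  const start = text.indexOf("[");
  const end = text.lastIndexOf("]");
  if (start === -1 || end < start) return [];
  try {
    const parsed = JSON.parse(text.slice(start, end + 1));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // the bracketed span was not valid JSON after all
  }
}

// Build a fenced reply at runtime to simulate a model that ignored the rule.
const fence = "`".repeat(3);
const reply = `${fence}json\n[{"path":"a.js","line":3,"severity":"warning","category":"correctness","comment":"Unused variable"}]\n${fence}`;

console.log(parseFindings(reply).length); // 1
console.log(parseFindings("No issues found.")); // []
```

This heuristic fails safe: if the surrounding prose itself contains square brackets, the slice will not parse and the function returns an empty list rather than throwing.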
For deeper prompt engineering strategies, prompt engineering fundamentals every developer should know is worth reading before customizing your review prompt.
Step 4: Secrets management and security considerations
Store your API key properly:
# Add to GitHub repo secrets via CLI
gh secret set ANTHROPIC_API_KEY --body "sk-ant-..."
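Never hardcode the key. In the workflow, always pass it via env: from ${{ secrets.* }}, not as a CLI argument, which would expose it in process listings. The same caution applies to the gh command above: if you omit --body, gh secret set prompts for the value, which keeps the key out of your shell history.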
A few additional guardrails worth adding:
- Rate limiting: Add a small delay (await new Promise(r => setTimeout(r, 500))) between chunk API calls if you’re processing very large PRs
- PR size gate: Skip review entirely if the diff exceeds 500 changed lines; very large refactors aren’t well-suited for line-level AI review
- Permissions scope: The workflow only needs pull-requests: write and contents: read; don’t grant admin or repo scope
# Add this check at the start of the review job
- name: Check PR size
  id: size_check
  run: |
    # Changed lines = additions + deletions
    CHANGED=$(( ${{ github.event.pull_request.additions }} + ${{ github.event.pull_request.deletions }} ))
    if [ "$CHANGED" -gt 500 ]; then
      echo "skip=true" >> "$GITHUB_OUTPUT"
    fi
- name: Run AI review
  if: steps.size_check.outputs.skip != 'true'
  # ... rest of step
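A further guardrail that fits here: GitHub generally rejects inline review comments that point at lines outside the diff, and a single rejected comment can cause the whole review request to fail. Since the model may occasionally report a line number that is not part of any hunk, it can help to drop such findings before posting. The sketch below assumes the patch field uses standard unified diff hunk headers; commentableLines and keepCommentable are hypothetical helpers, and the sample data is invented.

```javascript
// Hypothetical helper: collect the new-file line numbers that appear in a
// unified diff patch (added and context lines, i.e. the RIGHT side).
function commentableLines(patch) {
  const lines = new Set();
  let inHunk = false;
  let newLine = 0;
  for (const row of (patch || "").split("\n")) {
    const hunk = row.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (hunk) {
      inHunk = true;
      newLine = Number(hunk[1]); // first new-file line covered by this hunk
    } else if (inHunk && (row.startsWith("+") || row.startsWith(" "))) {
      lines.add(newLine);
      newLine += 1;
    }
    // Rows starting with "-" or "\" have no line number in the new file.
  }
  return lines;
}

// Hypothetical helper: keep only findings that point at a commentable line.
function keepCommentable(findings, files) {
  const byPath = new Map(
    files.map((f) => [f.filename, commentableLines(f.patch)])
  );
  return findings.filter((c) => byPath.get(c.path)?.has(c.line) === true);
}

// Invented sample: one file whose patch covers new-file lines 1 to 4.
const files = [
  {
    filename: "src/app.js",
    patch: [
      "@@ -1,3 +1,4 @@",
      " const a = 1;",
      "-const b = 2;",
      "+const b = 3;",
      "+const c = 4;",
      " const d = 5;",
    ].join("\n"),
  },
];
const findings = [
  { path: "src/app.js", line: 3, comment: "inside the diff" },
  { path: "src/app.js", line: 10, comment: "outside the diff" },
  { path: "README.md", line: 1, comment: "file not in the PR" },
];

console.log(keepCommentable(findings, files).map((c) => c.comment));
// [ 'inside the diff' ]
```

In the review script, this filter would sit between collecting the findings and calling postReview(), using the same file list returned by getFiles().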
If you’re interested in how GitHub Actions integrates with other automation patterns, how I automated my blog with Claude API and GitHub Actions shows a complementary real-world setup.
Key takeaways
- The full pipeline (trigger → diff extraction → Claude API → inline comments) runs inside a single GitHub Actions job with no external dependencies beyond @anthropic-ai/sdk
- Chunking diffs by file keeps you within context limits and makes it easy to attribute findings to the right file
- The ai-reviewed label prevents duplicate comment floods on re-runs; always check for it before posting a new review
- Tune the prompt’s rules section to match your team’s priorities; a generic prompt will surface generic findings
- Filter out info-severity findings before posting; the goal is signal, not noise