Automated AI code review in GitHub pull requests

Manual code review is a bottleneck — especially on small teams where the same person writes and reviews code. This guide walks you through building a GitHub Actions workflow that sends pull request diffs to Claude, gets back structured line-level findings as JSON, and posts them as inline review comments directly on the PR. No third-party services, no subscriptions — just the GitHub API and Claude API wired together.

How the full pipeline works

The flow has four stages:

A pull_request event triggers the workflow
The diff is fetched using the GitHub Pulls API
The diff is sent to Claude with a structured prompt, and Claude returns a JSON array of findings, each with path, line, severity, category, and comment
The workflow posts inline comments using the GitHub Reviews API

Before writing any code, make sure you have an ANTHROPIC_API_KEY from Anthropic’s Console and that it’s stored as a GitHub Actions secret. If you’re new to the Claude API, the getting started guide for Claude API in Python covers authentication and basic usage well.
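A missing secret does not stop the workflow from starting: GitHub substitutes an empty string for an unset secret and does not pass repository secrets to workflows triggered from forks, so the failure only shows up later as an API error. A minimal sketch of a fail-fast check for the review script built in Step 2, assuming a hypothetical requireEnv helper that is not part of the original script:

```javascript
// Hypothetical helper (not in the original script): fail fast when a
// required environment variable is missing or empty.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`
    );
  }
  // Return only the requested variables as a plain object
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

Calling requireEnv(["ANTHROPIC_API_KEY", "GITHUB_TOKEN", "PR_NUMBER", "REPO", "HEAD_SHA"]) at the top of the script would turn a misconfigured secret into a single readable error message.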

Step 1: Set up the GitHub Actions workflow YAML

Create .github/workflows/ai-review.yml in your repo:

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  pull-requests: write
  contents: read

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"

      - name: Install dependencies
        run: npm install @anthropic-ai/sdk

      - name: Run AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: node .github/scripts/review.js

Key points:

permissions: pull-requests: write is required for posting review comments
fetch-depth: 0 ensures git history is available if you need git diff fallback
GITHUB_TOKEN is automatically provided by Actions — no extra setup needed

To avoid duplicate comments on re-runs (when a developer pushes a fixup commit), the script will check for an ai-reviewed label on the PR before posting a fresh review. If the label exists, it skips posting and exits cleanly. The trade-off is that later pushes to the same PR are not re-reviewed until someone removes the label.

Step 2: Write the Node.js review script

Create .github/scripts/review.js:

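// ESM import syntax: if Node rejects it in a plain .js file, rename the script
// to review.mjs or set "type": "module" in package.json.
import Anthropic from "@anthropic-ai/sdk";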

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const REPO = process.env.REPO;
const PR_NUMBER = process.env.PR_NUMBER;
const HEAD_SHA = process.env.HEAD_SHA;
const GITHUB_TOKEN = process.env.GITHUB_TOKEN;
const API = `https://api.github.com/repos/${REPO}`;

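async function githubFetch(path, options = {}) {
  const res = await fetch(`${API}${path}`, {
    ...options,
    headers: {
      Authorization: `Bearer ${GITHUB_TOKEN}`,
      Accept: "application/vnd.github+json",
      "X-GitHub-Api-Version": "2022-11-28",
      "Content-Type": "application/json",
      ...(options.headers || {}),
    },
  });
  // Surface API failures instead of handing an error payload to callers
  if (!res.ok) {
    throw new Error(`GitHub API ${res.status} for ${path}: ${await res.text()}`);
  }
  return res.json();
}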

async function hasReviewLabel() {
  const labels = await githubFetch(`/issues/${PR_NUMBER}/labels`);
  return labels.some((l) => l.name === "ai-reviewed");
}

async function addLabel() {
  await githubFetch(`/issues/${PR_NUMBER}/labels`, {
    method: "POST",
    body: JSON.stringify({ labels: ["ai-reviewed"] }),
  });
}

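async function getFiles() {
  // The endpoint returns 30 files per page by default; 100 is the maximum.
  // PRs touching more than 100 files would need pagination.
  return githubFetch(`/pulls/${PR_NUMBER}/files?per_page=100`);
}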

function chunkFiles(files, maxCharsPerChunk = 12000) {
  const chunks = [];
  let current = [];
  let size = 0;

  for (const file of files) {
    const patchSize = (file.patch || "").length;
    if (size + patchSize > maxCharsPerChunk && current.length > 0) {
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(file);
    size += patchSize;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

function buildPrompt(files) {
  const diffText = files
    .filter((f) => f.patch)
    .map((f) => `### File: ${f.filename}\n\`\`\`diff\n${f.patch}\n\`\`\``)
    .join("\n\n");

  return `You are a senior software engineer doing a code review. Analyze the following diff and return a JSON array of findings.

Each finding must have:
- "path": file path (string)
- "line": the line number in the NEW file where the issue appears (integer)
- "severity": one of "error", "warning", "info"
- "category": one of "security", "performance", "readability", "correctness"
- "comment": a concise, actionable review comment (string, max 300 chars)

Rules:
- Only flag real issues — no nitpicks about style preferences
- Skip files that look auto-generated (e.g., lock files, minified JS)
- Prioritize: security > correctness > performance > readability
- Return ONLY the JSON array, no markdown fences, no explanation

${diffText}`;
}

async function reviewChunk(files) {
  const message = await client.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 2048,
    messages: [{ role: "user", content: buildPrompt(files) }],
  });

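  const text = message.content[0].text.trim();
  try {
    const findings = JSON.parse(text);
    // Guard against a valid JSON reply that is not an array
    return Array.isArray(findings) ? findings : [];
  } catch {
    console.error("Failed to parse Claude response:", text);
    return [];
  }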
}

async function postReview(comments) {
  // Filter out low-severity info items to reduce noise
  const filtered = comments.filter((c) => c.severity !== "info");

  if (filtered.length === 0) {
    console.log("No significant findings. Skipping review post.");
    return;
  }

  const body = filtered.map((c) => ({
    path: c.path,
    line: c.line,
    side: "RIGHT",
    body: `**[${c.severity.toUpperCase()}]** \`${c.category}\`: ${c.comment}`,
  }));

  await githubFetch(`/pulls/${PR_NUMBER}/reviews`, {
    method: "POST",
    body: JSON.stringify({
      commit_id: HEAD_SHA,
      event: "COMMENT",
      body: `🤖 AI Code Review — ${filtered.length} finding(s) found.`,
      comments: body,
    }),
  });

  console.log(`Posted ${filtered.length} review comments.`);
}

async function main() {
  if (await hasReviewLabel()) {
    console.log("PR already reviewed. Skipping.");
    process.exit(0);
  }

  const files = await getFiles();
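  // Anchored patterns: skip *.lock, *-lock.json/yaml, *.min.js, and *.map,
  // without also skipping files such as admin.js
  const relevantFiles = files.filter(
    (f) => f.patch && !f.filename.match(/(\.lock|-lock\.(json|yaml)|\.min\.js|\.map)$/)
  );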

  const chunks = chunkFiles(relevantFiles);
  const allFindings = [];

  for (const chunk of chunks) {
    const findings = await reviewChunk(chunk);
    allFindings.push(...findings);
  }

  await postReview(allFindings);
  await addLabel();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

The chunking logic in chunkFiles() groups files into batches of roughly 12,000 characters of patch text so that large PRs don’t exceed context limits. A single file whose patch is larger than the limit is still sent as its own oversized batch. This is the same pattern used when building AI agents that process large inputs in steps.
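One thing the script leaves to Claude is choosing line numbers that GitHub will accept. The Reviews API rejects a review when a comment targets a line that is not part of the diff, so a single bad line number can sink every comment in the batch. A sketch of a pre-filter, assuming the patch field has the unified-diff shape GitHub returns for PR files; commentableLines and keepCommentable are hypothetical names, not part of the original script:

```javascript
// Collect the new-file line numbers that appear in a unified diff patch
// (added lines and context lines). Assumes hunks start with "@@ -a,b +c,d @@".
function commentableLines(patch) {
  const lines = new Set();
  let newLine = 0;
  for (const row of (patch || "").split("\n")) {
    const header = row.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (header) {
      // A hunk header resets the counter to the hunk's first new-file line
      newLine = Number(header[1]);
    } else if (row.startsWith("+") || row.startsWith(" ")) {
      lines.add(newLine);
      newLine += 1;
    }
    // Rows starting with "-" exist only in the old file, and
    // "\ No newline at end of file" markers are not file lines: skip both.
  }
  return lines;
}

// Keep only findings whose path and line fall inside the diff.
function keepCommentable(findings, files) {
  const byPath = new Map(
    files.map((f) => [f.filename, commentableLines(f.patch)])
  );
  return findings.filter((c) => byPath.get(c.path)?.has(c.line) ?? false);
}
```

In main(), this could run just before posting, for example postReview(keepCommentable(allFindings, relevantFiles)), so that a hallucinated line number drops one finding instead of the whole review.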

Step 3: Tune the prompt for your team’s priorities

The prompt in buildPrompt() is where you control what Claude focuses on. Adjust the rules section based on what matters most to your team:

For a security-focused team:

// Add to the rules section in buildPrompt():
`- Flag any use of eval(), exec(), or unsanitized user input immediately
- Check for missing authentication on route handlers
- Flag hardcoded credentials or tokens`;

For a Laravel/PHP backend:

`- Flag missing Eloquent mass-assignment protection ($fillable/$guarded)
- Warn on raw DB::statement() calls with user input
- Check for N+1 query patterns in loops`;

For a JavaScript/TypeScript frontend:

`- Flag async operations with no error handling (error boundaries do not catch errors thrown in async code)
- Warn on direct DOM manipulation bypassing React's virtual DOM
- Check for missing dependency arrays in useEffect hooks`;

The structured JSON output approach here is similar to what I covered in getting reliable structured JSON output from LLMs — the same principles apply: be explicit about the schema, and tell the model to return only JSON.
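Being explicit about the schema in the prompt helps, but the reply is still untrusted input. A minimal sketch of a stricter parser, assuming the five fields and allowed values listed in buildPrompt(); parseFindings is a hypothetical helper, and silently dropping malformed entries is one possible policy rather than the only reasonable one:

```javascript
const SEVERITIES = ["error", "warning", "info"];
const CATEGORIES = ["security", "performance", "readability", "correctness"];

// Hypothetical helper: parse the model's reply and drop anything that does
// not match the schema requested in buildPrompt().
function parseFindings(text) {
  // Tolerate a reply wrapped in a markdown fence despite the instruction not to
  const stripped = text
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");

  let parsed;
  try {
    parsed = JSON.parse(stripped);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];

  return parsed.filter(
    (f) =>
      f !== null &&
      typeof f === "object" &&
      typeof f.path === "string" &&
      Number.isInteger(f.line) &&
      f.line > 0 &&
      SEVERITIES.includes(f.severity) &&
      CATEGORIES.includes(f.category) &&
      typeof f.comment === "string"
  );
}
```

Because every surviving entry has a known severity and category, later code such as c.severity.toUpperCase() in postReview() cannot throw on a missing field.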

For deeper prompt engineering strategies, prompt engineering fundamentals every developer should know is worth reading before customizing your review prompt.
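The team-specific rules above can be kept out of the prompt template and composed at runtime. A sketch under stated assumptions: the preset names and the buildRules helper are hypothetical, and the rule text is copied from the examples in this step (with the base rules lightly reworded from buildPrompt()):

```javascript
// Rules every review gets, mirroring the Rules list in buildPrompt().
const BASE_RULES = [
  "Only flag real issues, no nitpicks about style preferences",
  "Skip files that look auto-generated (e.g., lock files, minified JS)",
  "Prioritize: security > correctness > performance > readability",
  "Return ONLY the JSON array, no markdown fences, no explanation",
];

// Hypothetical preset names; the rule text comes from the examples above.
const RULE_PRESETS = {
  security: [
    "Flag any use of eval(), exec(), or unsanitized user input immediately",
    "Check for missing authentication on route handlers",
    "Flag hardcoded credentials or tokens",
  ],
  laravel: [
    "Flag missing Eloquent mass-assignment protection ($fillable/$guarded)",
    "Warn on raw DB::statement() calls with user input",
    "Check for N+1 query patterns in loops",
  ],
};

// Build the bullet list that goes under "Rules:" in the prompt.
function buildRules(presetNames = []) {
  const extra = presetNames.flatMap((name) => {
    if (!Object.hasOwn(RULE_PRESETS, name)) {
      throw new Error(`Unknown rule preset: ${name}`);
    }
    return RULE_PRESETS[name];
  });
  return [...BASE_RULES, ...extra].map((rule) => `- ${rule}`).join("\n");
}
```

In buildPrompt(), the hard-coded rules block would then become something like `Rules:\n${buildRules(["security"])}`, which keeps each team's additions reviewable in one place.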

Step 4: Secrets management and security considerations

Store your API key properly:

# Add to GitHub repo secrets via CLI; gh prompts for the value,
# which keeps the key out of shell history and process listings
gh secret set ANTHROPIC_API_KEY

Never hardcode the key. In the workflow, always pass it via env: from ${{ secrets.* }} — not as a CLI argument, which would expose it in process listings.

A few additional guardrails worth adding:

Rate limiting: Add a small delay (await new Promise(r => setTimeout(r, 500))) between chunk API calls if you’re processing very large PRs
PR size gate: Skip review entirely if the diff exceeds 500 changed lines — very large refactors aren’t well-suited for line-level AI review
Permissions scope: The workflow only needs pull-requests: write and contents: read — don’t grant admin or repo scope

# Add this check at the start of the review job
- name: Check PR size
  id: size_check
  run: |
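    # Count added plus deleted lines so the gate matches "changed lines"
    CHANGED=$(( ${{ github.event.pull_request.additions }} + ${{ github.event.pull_request.deletions }} ))
    if [ "$CHANGED" -gt 500 ]; then
      echo "skip=true" >> "$GITHUB_OUTPUT"
    fi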

- name: Run AI review
  if: steps.size_check.outputs.skip != 'true'
  # ... rest of step

If you’re interested in how GitHub Actions integrates with other automation patterns, how I automated my blog with Claude API and GitHub Actions shows a complementary real-world setup.
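The PR size gate can also live inside the review script, which is handy if you want the threshold to apply after lock files and other generated files have been filtered out. A minimal sketch, assuming each file object carries the numeric additions and deletions fields returned by the pull request files endpoint; exceedsSizeGate is a hypothetical helper:

```javascript
// Hypothetical in-script size gate: true when the total number of added
// and deleted lines across the given files exceeds the threshold.
function exceedsSizeGate(files, maxChangedLines = 500) {
  const changed = files.reduce(
    (total, f) => total + (f.additions || 0) + (f.deletions || 0),
    0
  );
  return changed > maxChangedLines;
}
```

In main(), a check such as if (exceedsSizeGate(relevantFiles)) return; placed before chunking would skip the Claude calls entirely for oversized PRs.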

Key takeaways

The full pipeline — trigger → diff extraction → Claude API → inline comments — runs inside a single GitHub Actions job with no external dependencies beyond @anthropic-ai/sdk
Chunking diffs by file keeps you within context limits and makes it easy to attribute findings to the right file
The ai-reviewed label prevents duplicate comment floods on re-runs — always check for it before posting a new review
Tune the prompt’s rules section to match your team’s priorities; a generic prompt will surface generic findings
Filter out info-severity findings before posting — the goal is signal, not noise