Automating DevOps tasks with Claude Code Routines

Table of Contents

Recurring DevOps work — dependency updates, stale test suites, incremental refactoring — eats hours every sprint without adding direct business value. Claude Code Routines, launched in May 2026, give you a structured way to hand that work to an AI agent that runs on a schedule, reacts to GitHub events, or fires on an API call. This guide walks you through setting up Routines from scratch, with real examples you can drop into your own projects.

What Claude Code Routines actually are

Routines are named, reusable instruction sets attached to a Claude Code agent. Think of them as cron jobs with reasoning built in. You define what the agent should do, when it should run, and what tools it can use — then Claude handles the execution and reports back.

They ship as part of the broader Claude Managed Agents orchestration layer, which means a Routine can itself spawn sub-agents, call external APIs, or hand off to a specialised agent mid-task. If you have already read the guide to building your first AI agent with Python and Claude, the mental model transfers directly — Routines are just agents with a declarative trigger layer on top.

A Routine has four parts:

  • Name — a slug you reference in the API or CLI
  • Instructions — the prompt that defines the task
  • Tools — bash, file read/write, GitHub API, etc.
  • Triggerschedule, webhook, or api

Setting up your first Routine: auto-updating package.json

The most common ask from teams adopting Routines is automated dependency updates without the noise of Dependabot PRs piling up. Here is a minimal routine.yaml that runs every Monday at 03:00 UTC, checks for outdated npm packages, updates them, runs the test suite, and opens a PR only if tests pass.

# routines/dependency-update.yaml
name: weekly-npm-update
trigger:
  type: schedule
  cron: "0 3 * * 1"
model: claude-opus-4-5
tools:
  - bash
  - github
instructions: |
  You are a dependency maintenance agent. Every run, do the following:
  1. Run `npm outdated --json` and parse the output.
  2. For each outdated package, run `npm update <package>` one at a time.
  3. After each update, run `npm test`. If tests fail, revert that package
     with `git checkout package-lock.json package.json` and move on.
  4. If at least one package updated successfully, create a branch named
     `chore/npm-update-<YYYY-MM-DD>`, commit the changes, and open a pull
     request with a summary table of updated packages.
  5. Post a Slack message to #engineering with the PR link.
max_turns: 40
on_failure: notify_slack

Register it with the CLI:

claude routines deploy --file routines/dependency-update.yaml --project my-app

The agent reasons about failures package-by-package rather than rolling back everything — something a plain shell script cannot do cheaply.

Triggering Routines from GitHub events via webhooks

Schedules cover recurring work, but some tasks need to fire in response to something. Claude Code Routines support a webhook trigger that accepts a GitHub Actions payload directly.

Below is a Routine that generates missing unit tests whenever a pull request is opened against main:

# routines/test-generator.yaml
name: pr-test-generation
trigger:
  type: webhook
  source: github
  event: pull_request.opened
  filter: "base.ref == 'main'"
model: claude-sonnet-4-5
tools:
  - bash
  - github
  - file_write
instructions: |
  A pull request has been opened. Do the following:
  1. Fetch the diff using the GitHub API for the PR in the event payload.
  2. Identify new or modified functions that have no corresponding test file.
  3. For each such function, generate a Jest test file in the __tests__
     directory that covers happy path, edge cases, and error states.
  4. Commit the generated test files to the PR branch.
  5. Leave a PR comment listing which functions were covered and which
     were skipped (e.g., UI-only components).
max_turns: 60

Wire this into GitHub Actions so the webhook fires on PR events:

# .github/workflows/claude-test-gen.yml
name: Claude Test Generation

on:
  pull_request:
    branches: [main]

jobs:
  trigger-routine:
    runs-on: ubuntu-latest
    steps:
      - name: Fire Claude Routine webhook
        run: |
          curl -X POST https://api.anthropic.com/v1/routines/pr-test-generation/trigger \
            -H "x-api-key: ${{ secrets.ANTHROPIC_API_KEY }}" \
            -H "content-type: application/json" \
            -d '${{ toJson(github.event) }}'

This pairs well with automated AI code review in GitHub pull requests — run code review and test generation as parallel Routines on the same PR event.

Running refactoring tasks on a cron with Managed Agents

Longer-horizon tasks — like gradually migrating a codebase from callbacks to async/await — benefit from Routines that spawn child agents through the Managed Agents API. A parent Routine can split the repo into modules, delegate each module to a sub-agent, collect results, and merge clean diffs.

# scripts/spawn_refactor_agents.py
import anthropic
import json

client = anthropic.Anthropic()

def trigger_routine_with_context(routine_name: str, context: dict):
    response = client.routines.trigger(
        name=routine_name,
        input=context,
    )
    return response.execution_id

# Identify modules from your project manifest
modules = json.load(open("modules.json"))

for module in modules:
    exec_id = trigger_routine_with_context(
        "async-migration-agent",
        {"module_path": module["path"], "file_patterns": module["patterns"]},
    )
    print(f"Spawned agent for {module['path']} → execution {exec_id}")

The parent Routine’s instructions tell it to wait for all child executions to complete, then run a final integration test pass before raising a consolidated PR. You can monitor execution state through the same API:

claude routines status --execution-id <exec_id>

For teams already using MCP servers for local development workflows, you can expose internal tools — database introspection, internal APIs — to Routines via the same MCP protocol, so the agent has the same tool surface locally and in CI.

Integrating Routines into an existing CI/CD pipeline

Most teams will not replace their CI/CD pipeline with Routines — they will add Routines as a layer that sits alongside it. A practical pattern is to call Routines from a post-deploy hook to run smoke-test generation or dependency audits against the freshly deployed environment.

# .github/workflows/post-deploy.yml
name: Post-deploy Routines

on:
  workflow_run:
    workflows: ["Deploy to Production"]
    types: [completed]

jobs:
  run-routines:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - name: Trigger smoke-test Routine
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          DEPLOY_URL: ${{ secrets.PRODUCTION_URL }}
        run: |
          curl -X POST https://api.anthropic.com/v1/routines/post-deploy-smoke/trigger \
            -H "x-api-key: $ANTHROPIC_API_KEY" \
            -H "content-type: application/json" \
            -d "{\"environment\": \"production\", \"base_url\": \"$DEPLOY_URL\"}"

      - name: Poll until complete
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          claude routines wait --name post-deploy-smoke --timeout 300

If you want background on the prompt patterns that make these instructions reliable, the prompt engineering fundamentals guide covers exactly what makes agent instructions robust versus fragile.

Key takeaways

  • Routines = declarative agent triggers. Define instructions once, attach a schedule or webhook trigger, and Claude handles reasoning through edge cases that a plain shell script would fail on silently.
  • Pair Routines with Managed Agents for complex tasks. Tasks like whole-codebase refactoring benefit from a parent Routine that orchestrates multiple specialised child agents rather than one monolithic prompt.
  • Keep max_turns honest. Set it to roughly 1.5× the turns you expect — too low and the agent bails early, too high and a runaway agent burns API budget. Start conservative, then tune based on execution logs.