GitHub Copilot usage-based billing: what agents cost now

Table of Contents

GitHub Copilot’s usage-based billing went live on June 1, 2026, retiring flat Premium Request Units in favour of AI Credits — token-metered billing where 1 credit = $0.01 USD. If you’re running Copilot coding agents in production, not just inline completions, this change hits differently. I’ve been running a Copilot agent that turns Jira tickets into pull requests on a large legacy PHP codebase — so I spent the first few weeks of June watching the meter instead of the output.

Here’s what I actually found.

What changed on June 1, 2026

The old model was legible. You got a monthly bucket of Premium Request Units (PRUs) and each action — one chat message, one agent invocation — drew one PRU regardless of how much work it actually did. Predictable, but economically dishonest: a 200-token chat reply and a 20-file autonomous agent run cost the same.

Starting June 1, every Copilot plan now includes a monthly allotment of GitHub AI Credits, with usage calculated based on token consumption — including input, output, and cached tokens — using the listed API rates for each model.

Each token is priced based on the model used, and the total is converted into AI credits, where 1 AI credit = $0.01 USD. The cost of an interaction depends on two things: the model and the number of tokens consumed.

Plan prices didn’t move, but the included value shifted:

Pro is $10/month (includes $15 in credits), Pro+ is $39/month (includes $70), and Max is $100/month (includes $200).

Code completions and Next Edit Suggestions remain included in all plans and do not consume AI Credits. So if autocomplete is most of your usage, your bill barely changes. If you run agents, you need to do the math now.

One thing that quietly stings: the safety-net fallback to cheaper models when quota runs out is gone — agentic sessions now stop when credits run out.

What one agent run actually costs

I built a Copilot agent that reads Jira tickets and opens PRs on a legacy PHP monorepo. A single full run — read the ticket, investigate the relevant files, implement the fix, write the PR description — goes roughly like this when using a frontier model (GPT-4o or Claude Sonnet-class):

Phase	Rough token count	Notes
System prompt + AGENTS.md	~3,000 input	Loaded on every run
Ticket content + context files	~15,000–40,000 input	Depends on scope
Model reasoning + file edits	~2,000–6,000 output	The actual work
Tool calls (file reads, searches)	~8,000–20,000 input	Agents call tools in loops
Total per run	~28,000–70,000 tokens	Wide range based on diff size

At GPT-4o rates (roughly $2.50/M input, $10/M output as priced through Copilot), a mid-range run of ~40,000 input tokens + 4,000 output tokens comes out to:

Input:  40,000 tokens × $2.50 / 1,000,000 = $0.10  → 10 credits
Output:  4,000 tokens × $10.00 / 1,000,000 = $0.04  →  4 credits
Total: ~14 credits per run (~$0.14)

A contained, well-scoped ticket. Now scale that across a two-week sprint: 30 tickets, mixed sizes.

Conservative (small tickets):   30 × 10 credits = 300 credits ($3.00)
Realistic (mixed sprint):       30 × 20 credits = 600 credits ($6.00)
Heavy sprint (large diffs):     30 × 50 credits = 1,500 credits ($15.00)

A Pro plan’s $15 credit bundle covers a normal sprint by itself — barely. A Pro+ developer doing heavy agentic work through the month can comfortably stay inside $70. But the moment an agent goes into a long autonomous loop on a large diff, individual runs can spike to 50–100+ credits each.

A complex agentic session working across a large codebase will consume significantly more usage than a quick question in chat, because agentic features like agent mode and Copilot cloud agent involve multiple model calls within a single task.

Where the new model quietly stings

Three patterns that hurt more than they should:

1. Chatty agents with no iteration cap. An agent that loops — reads a file, generates a plan, reads more files, revises the plan, reads more files — burns input tokens on every iteration. I’ve seen a single “investigate this bug” run touch 12 tool calls before writing a line of code. That’s 12 sets of context being assembled and sent.

2. A fat AGENTS.md. Every agent run loads your AGENTS.md as part of the system context. I originally had mine at 4,200 tokens — detailed coding standards, full file tree, example patterns. Reducing context from 100,000 to 20,000 tokens cuts input costs by 80%. The principle scales down: I trimmed AGENTS.md to 900 tokens of true must-know rules and the per-run cost dropped noticeably.

3. Using a frontier model for triage work. Using GPT-4o or a Claude Sonnet-class model to read a ticket and decide it’s a one-line CSS fix is expensive overkill. The model spread inside Copilot is roughly 40x between the cheapest and most capable options. Route simple lookups and summaries through a lightweight model; reserve the frontier model for implementation and review.

The output-token spread is approximately 40x across the model menu. That’s not a rounding error — that’s the difference between a $3/month habit and a $120/month one.

Practical levers to control spend

These are the ones that actually moved the meter for me, not just in theory:

Scope the agent with tight do/don’t instructions. In my agent’s system prompt, I explicitly list directories it shouldn’t read unless the ticket references them. “Do not read vendor/, storage/, or any file outside the module named in the ticket” cuts exploratory tool calls by 60% on focused bugs.

Keep AGENTS.md ruthlessly lean. Your AGENTS.md (or equivalent context file) is loaded on every single run. Treat each line as costing money, because it does. Put only what the agent must know: naming conventions, the PR template, a few gotchas specific to your codebase. Link to extended docs in comments — don’t inline them.

Set a run length cap. Copilot’s agent mode has configurable iteration limits. I cap mine at 15 tool calls. If a fix needs more than 15 tool calls to investigate, the ticket is probably too vague to automate anyway — I’d rather it stop and surface a summary than burn 200 credits going in circles.

Use completions for the cheap stuff. Inline completions don’t consume AI Credits at all. For repetitive boilerplate, method stubs, and test scaffolding on well-typed code, completions handle 80% of the keystrokes at zero metered cost. Agent mode is for the genuinely hard, multi-file work.

Check the billing dashboard before the bill arrives. GitHub’s billing overview now shows per-model token consumption. Export usage data before the first bill arrives — GitHub’s billing dashboard shows per-model token consumption so you can catch a spike mid-month, not after it closes.

How it compares to calling Claude or OpenAI directly

Here’s the honest comparison for the same “read ticket → investigate → implement → write PR description” task:

Copilot (Pro+, frontier model): ~20 credits per mid-range run. At 30 runs/sprint that’s 600 credits = $6.00. The $70 included in Pro+ comfortably covers a solo developer doing this daily.

Claude API directly: Using Claude Sonnet via the API with prompt caching (which Anthropic supports and which is hugely effective on repeated system prompts), a similar run costs roughly $0.08–$0.15 all-in. If you’ve already built the tooling — which I did when I was building AI agents with Python and Claude — the per-run cost is comparable or slightly cheaper, with full control over model selection and context handling. The overhead is that you maintain the integration yourself.

OpenCode pointed at a cheaper model: I switched a chunk of my routine work to OpenCode after moving from Cursor, routing it at Gemini Flash or a locally-running Qwen model via Ollama. For code investigation and summarisation tasks, running a local model via Ollama costs nothing per token. Quality drops on complex multi-file rewrites, but for ticket triage and test generation the output is perfectly usable.

The honest calculus: Copilot Pro+ is a reasonable deal for a solo developer doing 20–40 agentic runs per month. The $70 credit bundle holds. It breaks down for team leads reviewing large diffs with every PR — that’s where automated AI code review in GitHub Actions hit differently under the new model, since each review pass now has a real per-token cost.

For teams with 5+ developers all running agent mode actively, model the actual token consumption from your billing dashboard before approving any budget. The community has already seen bills swing from $29 to $750 at the heavy end. Reports of costs jumping from $29 to $750 per month and from $50 to $3,000 are spreading across Reddit, X, and GitHub’s own discussion threads.

When usage-based billing is actually a win

It’s fairer for light users. Under the old PRU model, someone who ran 10 agent sessions a month subsidised someone who ran 300. Credits align cost with consumption.

Usage-based billing is a win if:

You’re a solo developer or a small team doing focused, well-scoped agentic work
Most of your Copilot usage is inline completions (which remain free)
You’re disciplined about model choice — lightweight for triage, frontier for implementation

It stings if:

You run long autonomous sessions against large, poorly-documented codebases
Your AGENTS.md or system prompts are verbose and loaded on every call
You’re at an organisation where multiple devs run agents all day without a spending cap

The important habit, regardless of where you land: instrument your runs. Know what a normal ticket costs in credits before you have 200 of them on a bill. GitHub’s billing dashboard gives you per-model breakdown — use it in the first week of the new cycle, not the last.

Key takeaways:

1 AI Credit = $0.01. A well-scoped agent run on a frontier model costs 10–50 credits. A runaway agentic loop on a large codebase can cost 10x that.
Inline completions and Next Edit Suggestions are still free. The meter only runs on chat, agent mode, and code review.
Trim context aggressively: lean AGENTS.md, scoped file access, and hard iteration caps are the three levers with the biggest return.
Pro+ at $70/month of included credits is a reasonable solo-developer budget for moderate agentic use. Model your team’s actual token consumption before any billing cycle closes.