Reliable structured JSON output from OpenAI in Python

Getting raw text back from an LLM and then hoping a regex holds it together is a bad strategy. OpenAI’s structured outputs feature lets you enforce a strict JSON schema at the API level — the model is constrained to respond only in the shape you define. This guide shows you how to wire that up with Pydantic v2 in Python, handle real-world edge cases, and know when you need something more.

Setting up OpenAI structured outputs with Pydantic v2

Install the required packages first:

pip install openai pydantic

As of openai Python SDK v1.40.0 and later, structured outputs are supported via response_format with the json_schema type, or through the cleaner client.beta.chat.completions.parse() helper, which generates the schema automatically from your Pydantic model.

Here is a minimal example parsing a product listing into a typed object:

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str = Field(description="Product category, e.g. electronics, clothing")

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # structured outputs require this model or later
    messages=[
        {"role": "system", "content": "Extract product details from the text."},
        {"role": "user", "content": "Sony WH-1000XM5 headphones, $279.99, currently available, electronics"},
    ],
    response_format=Product,
)

product = completion.choices[0].message.parsed
print(product.name)    # Sony WH-1000XM5
print(product.price)   # 279.99
print(product.in_stock) # True

The .parsed attribute gives you a fully validated Pydantic model instance — no json.loads, no .model_validate() call needed. The SDK generates the JSON schema from your model, sends it to the API, and deserializes the response back into the model automatically.
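
To make that round trip concrete, here is a rough sketch of the two halves done by hand with Pydantic alone: generating a JSON schema from the model, and validating a JSON string back into it. It assumes a hard-coded sample response rather than a live API call, and the payload name "product" is an arbitrary choice. The SDK's own strict-mode conversion does more than this single additionalProperties tweak, so treat it as an illustration rather than a replacement for parse().

```python
import json
from pydantic import BaseModel, Field


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str = Field(description="Product category, e.g. electronics, clothing")


# Step 1: Pydantic generates a JSON schema from the model definition.
schema = Product.model_json_schema()

# Strict mode expects objects to forbid extra keys. For this flat model one
# top-level flag is enough; nested models would need it on every object.
schema["additionalProperties"] = False

# Shape of the response_format argument when passing a schema by hand.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "product", "strict": True, "schema": schema},
}

# Step 2: a JSON string like the one the API would return (hard-coded sample).
raw = (
    '{"name": "Sony WH-1000XM5", "price": 279.99, '
    '"in_stock": true, "category": "electronics"}'
)

# Validate and deserialize in one call.
product = Product.model_validate_json(raw)

print(json.dumps(response_format, indent=2))
print(product)
```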

Handling nested objects, enums, and optional fields

Real extraction tasks are rarely flat. Here is a more complete schema for parsing an invoice:

from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field

class Currency(str, Enum):
    USD = "USD"
    EUR = "EUR"
    GBP = "GBP"
    INR = "INR"

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Invoice(BaseModel):
    invoice_number: str
    vendor_name: str
    issue_date: str = Field(description="ISO 8601 date string, e.g. 2026-05-16")
    due_date: Optional[str] = None
    currency: Currency
    line_items: list[LineItem]
    subtotal: float
    tax_rate: Optional[float] = Field(default=None, description="Tax percentage, e.g. 18.0 for 18%")
    total_amount: float
    notes: Optional[str] = None

A few things worth noting here:

  • Optional[str] = None fields become nullable in the generated schema, so the model can emit null when information is absent instead of being forced to invent a value
  • Enum fields constrain the output to exactly the values you list; the model will not invent "DOLLARS" or "dollars"
  • Nested list[LineItem] works without any extra configuration — the SDK recursively generates the schema
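
These behaviours can be checked locally without calling the API. The sketch below uses a trimmed-down version of the invoice schema (the MiniInvoice name and the sample payloads are made up for illustration) and shows Pydantic accepting null for an optional field, building nested line items, and rejecting a currency outside the enum. It demonstrates the validation side only; the API-level constraint is what keeps the model from emitting such values in the first place.

```python
from enum import Enum
from typing import Optional
from pydantic import BaseModel, ValidationError


class Currency(str, Enum):
    USD = "USD"
    EUR = "EUR"


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float


class MiniInvoice(BaseModel):  # hypothetical, trimmed from Invoice above
    invoice_number: str
    due_date: Optional[str] = None
    currency: Currency
    line_items: list[LineItem]


good = {
    "invoice_number": "INV-001",
    "due_date": None,  # absent information arrives as null
    "currency": "EUR",
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 5.0, "total": 10.0}
    ],
}
invoice = MiniInvoice.model_validate(good)
print(invoice.due_date)        # None
print(invoice.currency.value)  # EUR

# A value outside the enum fails validation.
bad = {**good, "currency": "DOLLARS"}
try:
    MiniInvoice.model_validate(bad)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(exc.errors()[0]["loc"])  # ('currency',)
```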

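One limitation to plan around is union types. The root of the schema must be a single object rather than a union, and a nested Union of arbitrary members gives the model no explicit signal for choosing a branch. For nested unions, use a discriminated union, built with Annotated and a discriminator field: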

from typing import Annotated, Literal, Optional, Union
from pydantic import BaseModel, Field

class PhysicalItem(BaseModel):
    type: Literal["physical"]
    weight_kg: float
    shipping_required: bool

class DigitalItem(BaseModel):
    type: Literal["digital"]
    download_url: str
    license_key: Optional[str] = None

class OrderItem(BaseModel):
    name: str
    price: float
    item: Annotated[Union[PhysicalItem, DigitalItem], Field(discriminator="type")]

This works because the discriminator gives the model an unambiguous way to pick the branch.
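
A quick local check of the same idea, assuming hand-written sample payloads in place of model output: Pydantic reads the type field and instantiates the matching branch, and a payload with an unknown tag is rejected.

```python
from typing import Annotated, Literal, Optional, Union
from pydantic import BaseModel, Field, ValidationError


class PhysicalItem(BaseModel):
    type: Literal["physical"]
    weight_kg: float
    shipping_required: bool


class DigitalItem(BaseModel):
    type: Literal["digital"]
    download_url: str
    license_key: Optional[str] = None


class OrderItem(BaseModel):
    name: str
    price: float
    item: Annotated[Union[PhysicalItem, DigitalItem], Field(discriminator="type")]


# The "type" tag selects the DigitalItem branch.
ebook = OrderItem.model_validate(
    {
        "name": "Python ebook",
        "price": 19.0,
        "item": {"type": "digital", "download_url": "https://example.com/book.pdf"},
    }
)
print(type(ebook.item).__name__)  # DigitalItem

# A tag that matches no branch is a validation error.
try:
    OrderItem.model_validate(
        {"name": "Mystery", "price": 1.0, "item": {"type": "rental"}}
    )
    unknown_tag_rejected = False
except ValidationError:
    unknown_tag_rejected = True
print(unknown_tag_rejected)  # True
```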

A real extraction use case: parsing resumes

Here is a practical end-to-end example that extracts structured data from a resume text blob — the kind of thing you might run on scraped PDFs converted to text:

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional

client = OpenAI()

class WorkExperience(BaseModel):
    company: str
    title: str
    start_date: str
    end_date: Optional[str] = Field(default=None, description="Null if current role")
    responsibilities: list[str]

class Resume(BaseModel):
    full_name: str
    email: str
    phone: Optional[str] = None
    location: Optional[str] = None
    skills: list[str]
    work_experience: list[WorkExperience]
    education: list[str]
    years_of_experience: float = Field(description="Total years, calculated from work history")

def parse_resume(raw_text: str) -> Resume:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a resume parser. Extract all available information. "
                    "For missing fields, use null. Do not infer or hallucinate data."
                ),
            },
            {"role": "user", "content": raw_text},
        ],
        response_format=Resume,
    )
    return completion.choices[0].message.parsed

with open("candidate.txt", encoding="utf-8") as f:
    resume = parse_resume(f.read())
print(f"{resume.full_name}: {resume.years_of_experience} years experience")
for job in resume.work_experience:
    print(f"  {job.title} at {job.company} ({job.start_date} to {job.end_date or 'present'})")

If you are building a full pipeline on top of this, the LLM-powered REST API with FastAPI guide covers wrapping extraction logic like this into a proper API endpoint.
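
Because years_of_experience is computed by the model rather than copied from the source text, a deterministic cross-check can be useful. The sketch below is one way to do it, under the assumption that start_date and end_date come back as YYYY-MM strings. The Resume schema does not enforce a date format, so that would have to be requested in the field descriptions. The helper names are hypothetical, and overlapping roles are simply summed.

```python
from datetime import date
from typing import Optional


def months_between(start: str, end: Optional[str], today: date) -> int:
    """Whole months from start to end, both 'YYYY-MM'; None means a current role."""
    start_year, start_month = (int(part) for part in start.split("-"))
    if end is None:
        end_year, end_month = today.year, today.month
    else:
        end_year, end_month = (int(part) for part in end.split("-"))
    # Clamp at zero so reversed dates do not produce negative durations.
    return max(0, (end_year - start_year) * 12 + (end_month - start_month))


def estimate_years(jobs: list[tuple[str, Optional[str]]], today: date) -> float:
    """Sum role durations in years, rounded to one decimal (overlaps count twice)."""
    total_months = sum(months_between(start, end, today) for start, end in jobs)
    return round(total_months / 12, 1)


# Two roles: 30 months, then 42 months up to the fixed reference date.
jobs = [("2018-01", "2020-07"), ("2020-08", None)]
print(estimate_years(jobs, today=date(2024, 2, 1)))  # 6.0
```

Comparing this estimate against resume.years_of_experience and flagging large gaps for review is usually more practical than rejecting the extraction outright.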

Handling validation errors, partial outputs, and refusals

The API can return a refusal instead of content when the model decides it cannot comply with the request. Always check for this:

choice = completion.choices[0].message

if choice.refusal:
    print(f"Model refused: {choice.refusal}")
elif choice.parsed:
    process(choice.parsed)
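else:
    # Neither a refusal nor a parsed object: inspect finish_reason to see why.
    # Truncation (finish_reason "length") normally surfaces earlier, as an exception from parse().
    print(f"No parsed output. Finish reason: {completion.choices[0].finish_reason}")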

For partial outputs due to token limits, increase max_tokens or chunk your input. The SDK catches a truncated JSON response before it reaches .parsed: parse() raises an exception (LengthFinishReasonError in current SDK versions) rather than returning a half-formed object.
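
For the chunking option, a minimal sketch follows. It splits on blank lines and packs paragraphs into chunks under a character budget, which is only a rough stand-in for a token limit. The function name and the budget values are illustrative assumptions, and a single paragraph longer than the budget is kept whole rather than split.

```python
def chunk_paragraphs(text: str, max_chars: int = 4000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        # +2 accounts for the blank line used to rejoin paragraphs.
        added = len(paragraph) + (2 if current else 0)
        if current and current_len + added > max_chars:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(paragraph)
        current.append(paragraph)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "alpha " * 5 + "\n\n" + "beta " * 5 + "\n\n" + "gamma " * 5
for chunk in chunk_paragraphs(sample, max_chars=60):
    print(repr(chunk))
```

Each chunk can then go through its own parse() call. How the per-chunk results are merged depends on the schema: list fields concatenate naturally, while scalar fields need a rule for which chunk wins.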

When you need retry-on-validation-error logic — for example, when your post-processing applies additional business rules the schema cannot express — wrap the call:

import time
from pydantic import ValidationError

def parse_with_retry(raw_text: str, max_retries: int = 3) -> Resume:
    for attempt in range(max_retries):
        try:
            result = parse_resume(raw_text)
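            # additional business validation the schema cannot express;
            # raise explicitly, since assert statements are stripped under python -O
            if result.years_of_experience < 0:
                raise ValueError("years_of_experience must be non-negative")
            return result
        except (ValidationError, ValueError) as e: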
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    raise RuntimeError("Exhausted retries")

For more complex multi-step extraction — where a first pass extracts raw entities and a second pass normalises or classifies them — the patterns described in the RAG pipeline from scratch guide apply: keep each step focused and validate at boundaries.

Structured outputs vs. function calling vs. the instructor library

Older function calling approach: Before structured outputs, the pattern was to define a function with a JSON schema in the tools parameter and parse tool_calls[0].function.arguments. It worked but required manual json.loads and model_validate, and the model was not strictly constrained — it could still produce invalid JSON in edge cases.

instructor library: instructor wraps the OpenAI client and adds automatic retry-on-validation-error, partial streaming support, and multi-provider compatibility. If you need those features out of the box, it is worth the dependency. The trade-off is an extra abstraction layer.

Native structured outputs: The right default for greenfield Python projects targeting GPT-4o or later. Zero extra dependencies, guaranteed schema conformance at the API level, and .parsed gives you the Pydantic object directly.
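
For comparison, the manual step that the older function calling path required looks roughly like this. The sketch uses a stand-in message object built with SimpleNamespace instead of a real API response, so the attribute layout (tool_calls[0].function.arguments) mirrors the description above while the data is invented. The helper names are hypothetical.

```python
import json
from types import SimpleNamespace
from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool


def product_from_tool_call(message) -> Product:
    """Decode the first tool call's arguments and validate them by hand."""
    arguments = message.tool_calls[0].function.arguments  # a JSON string
    data = json.loads(arguments)         # may raise json.JSONDecodeError
    return Product.model_validate(data)  # may raise ValidationError


def fake_message(arguments: str) -> SimpleNamespace:
    """Stand-in for an assistant message; not a real SDK object."""
    call = SimpleNamespace(function=SimpleNamespace(arguments=arguments))
    return SimpleNamespace(tool_calls=[call])


ok = fake_message('{"name": "Desk lamp", "price": 24.5, "in_stock": false}')
print(product_from_tool_call(ok))

# Without strict constraints, the caller had to handle both failure modes:
# syntactically broken JSON and JSON that does not match the model.
for broken in ('{"name": "Desk lamp", "price": ', '{"name": "Desk lamp"}'):
    try:
        product_from_tool_call(fake_message(broken))
    except (json.JSONDecodeError, ValidationError) as exc:
        print(type(exc).__name__)
```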

When structured outputs are not enough:

  • Your schema needs to express cross-field constraints (for example, invoice.subtotal * (1 + invoice.tax_rate / 100) == invoice.total_amount)
  • You are post-processing the output with business rules that cannot be encoded in JSON Schema
  • You need streaming partial objects (use instructor or stream + parse manually)
  • You are hitting a model that does not support structured outputs (older GPT-3.5 endpoints, local models via Ollama — see how to run Llama 3 locally with Ollama for that setup)

In those cases, layer on top: use structured outputs for the initial extraction, then validate with Pydantic validators or a second LLM call. The two-pass pattern — extract first, verify or enrich second — is the most reliable approach for production pipelines.
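
As one illustration of that second pass, a cross-field rule like the subtotal check can be written as a Pydantic model validator. The sketch uses a cut-down model (InvoiceTotals, a hypothetical name) and an assumed rounding tolerance of 0.01, and is meant to run on the extracted data after the API call.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError, model_validator


class InvoiceTotals(BaseModel):  # hypothetical subset of the Invoice fields
    subtotal: float
    tax_rate: Optional[float] = None  # percentage, e.g. 18.0 for 18%
    total_amount: float

    @model_validator(mode="after")
    def check_total(self) -> "InvoiceTotals":
        rate = self.tax_rate or 0.0
        expected = self.subtotal * (1 + rate / 100)
        # The 0.01 tolerance is an assumption to absorb rounding in the source.
        if abs(expected - self.total_amount) > 0.01:
            raise ValueError(
                f"total_amount {self.total_amount} does not match "
                f"subtotal plus tax ({expected:.2f})"
            )
        return self


consistent = InvoiceTotals(subtotal=100.0, tax_rate=18.0, total_amount=118.0)
print(consistent.total_amount)  # 118.0

# An inconsistent total is reported as a ValidationError.
try:
    InvoiceTotals(subtotal=100.0, tax_rate=18.0, total_amount=120.0)
    mismatch_caught = False
except ValidationError:
    mismatch_caught = True
print(mismatch_caught)  # True
```

Keeping the rule in a separate checking model leaves the extraction schema simple, and a failed check can feed a retry loop or a manual review queue.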

If you are interested in how function calling works on the Anthropic side, the Claude tool use practical guide covers the equivalent pattern with Claude’s tool definitions.


Key takeaways:

  • Use client.beta.chat.completions.parse() with a Pydantic v2 model — it handles schema generation and deserialization automatically, and requires GPT-4o (August 2024) or later
  • Always check choice.refusal before using .parsed, and be ready for the exception parse() raises on truncated output; on a refusal .parsed is None, which fails silently if you skip the check
  • For constraints that JSON Schema cannot express, structured outputs handle the shape of the data but not its consistency; add a validation layer or retry loop for the rest