Reliable structured JSON output from OpenAI in Python
Getting raw text back from an LLM and then hoping a regex holds it together is a bad strategy. OpenAI’s structured outputs feature lets you enforce a strict JSON schema at the API level — the model is constrained to respond only in the shape you define. This guide shows you how to wire that up with Pydantic v2 in Python, handle real-world edge cases, and know when you need something more.
Setting up OpenAI structured outputs with Pydantic v2
Install the required packages first:
```shell
pip install openai pydantic
```
Structured outputs arrived in the openai Python SDK in version 1.40.0 (August 2024). You can use them through response_format with the json_schema type, or through the cleaner client.beta.chat.completions.parse() method, which generates the schema automatically from your Pydantic model.
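For comparison, the lower-level json_schema path expects a response_format dict like the one built below. This is a minimal sketch that assumes a flat model with no nested models or $defs; build_response_format is a hypothetical helper, not part of the SDK. It applies the two strict-mode rules that matter most here: every property listed in "required", and "additionalProperties" set to false.

```python
import json

from pydantic import BaseModel, Field


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str = Field(description="Product category, e.g. electronics, clothing")


def build_response_format(model: type[BaseModel], name: str) -> dict:
    """Hypothetical helper: json_schema response_format for a *flat* Pydantic model.

    Strict mode expects every property in "required" and
    "additionalProperties": false. Nested models and $defs are not handled
    here; parse() takes care of those for you.
    """
    schema = model.model_json_schema()
    schema["required"] = list(schema["properties"].keys())
    schema["additionalProperties"] = False
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


response_format = build_response_format(Product, "product")
print(json.dumps(response_format, indent=2))

# Usage sketch with the lower-level call (needs an API key, so left commented):
# completion = client.chat.completions.create(
#     model="gpt-4o-2024-08-06", messages=messages, response_format=response_format)
# product = Product.model_validate_json(completion.choices[0].message.content)
```

With this path you own both halves: building the schema and deserializing message.content yourself, which is exactly what parse() automates.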
Here is a minimal example parsing a product listing into a typed object:
```python
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str = Field(description="Product category, e.g. electronics, clothing")

completion = client.beta.chat.completions.parse(
    # Structured outputs need gpt-4o-2024-08-06+ (gpt-4o-mini-2024-07-18+ also works)
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract product details from the text."},
        {"role": "user", "content": "Sony WH-1000XM5 headphones, $279.99, currently available, electronics"},
    ],
    response_format=Product,
)

product = completion.choices[0].message.parsed
print(product.name)      # Sony WH-1000XM5
print(product.price)     # 279.99
print(product.in_stock)  # True
```
When the call succeeds, the .parsed attribute gives you a fully validated Pydantic model instance: no json.loads, no .model_validate() call needed. The SDK generates the JSON schema from your model, sends it to the API, and deserializes the response back into the model automatically. If the model refuses, .parsed is None; the error-handling section below covers that case.
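The deserialization half of that is plain Pydantic. The sketch below shows roughly the equivalent manual step with model_validate_json on a raw JSON string like the one found in message.content; the SDK's internals may differ in detail, and the JSON string here is a made-up example.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str = Field(description="Product category, e.g. electronics, clothing")


# Example of a raw JSON string as it might appear in message.content
raw_content = '{"name": "Sony WH-1000XM5", "price": 279.99, "in_stock": true, "category": "electronics"}'

# Roughly the step that .parsed saves you from writing yourself
product = Product.model_validate_json(raw_content)
print(product.price)  # 279.99

# The same call raises if the JSON does not match the model
try:
    Product.model_validate_json('{"name": "Sony WH-1000XM5"}')
except ValidationError as exc:
    print(f"{exc.error_count()} fields missing")  # 3 fields missing
```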
Handling nested objects, enums, and optional fields
Real extraction tasks are rarely flat. Here is a more complete schema for parsing an invoice:
```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field

class Currency(str, Enum):
    USD = "USD"
    EUR = "EUR"
    GBP = "GBP"
    INR = "INR"

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Invoice(BaseModel):
    invoice_number: str
    vendor_name: str
    issue_date: str = Field(description="ISO 8601 date string, e.g. 2026-05-16")
    due_date: Optional[str] = None
    currency: Currency
    line_items: list[LineItem]
    subtotal: float
    tax_rate: Optional[float] = Field(default=None, description="Tax percentage, e.g. 18.0 for 18%")
    total_amount: float
    notes: Optional[str] = None
```
A few things worth noting here:
- Optional[str] = None fields become nullable in the generated schema, so the model can emit null when information is absent instead of being forced to invent a value
- Enum fields constrain the output to exactly the values you list; the model will not invent "DOLLARS" or "dollars"
- Nested list[LineItem] works without any extra configuration; the SDK recursively generates the schema
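You can see these behaviours without an API call by inspecting the schema Pydantic generates and validating sample JSON locally. This sketch uses a trimmed-down version of the Invoice model above, and the sample invoice values are made up; the SDK further adjusts this schema for strict mode before sending it.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError


class Currency(str, Enum):
    USD = "USD"
    EUR = "EUR"
    GBP = "GBP"
    INR = "INR"


class Invoice(BaseModel):
    # Trimmed-down version of the Invoice model above
    invoice_number: str
    currency: Currency
    due_date: Optional[str] = None


schema = Invoice.model_json_schema()
# The enum becomes a $defs entry listing exactly the allowed values
print(schema["$defs"]["Currency"]["enum"])  # ['USD', 'EUR', 'GBP', 'INR']
# Optional[str] becomes a nullable string
print(schema["properties"]["due_date"]["anyOf"])  # [{'type': 'string'}, {'type': 'null'}]

# Local validation enforces the same constraints on any JSON you feed it
ok = Invoice.model_validate_json('{"invoice_number": "INV-1", "currency": "EUR", "due_date": null}')
print(ok.currency is Currency.EUR)  # True

try:
    Invoice.model_validate_json('{"invoice_number": "INV-2", "currency": "DOLLARS"}')
except ValidationError:
    print("rejected: DOLLARS is not a Currency")
```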
The one hard limitation: OpenAI structured outputs do not support Union types with arbitrary members. For discriminated unions, you need to use Annotated with a discriminator field:
```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, Field

class PhysicalItem(BaseModel):
    type: Literal["physical"]
    weight_kg: float
    shipping_required: bool

class DigitalItem(BaseModel):
    type: Literal["digital"]
    download_url: str
    license_key: Optional[str] = None

class OrderItem(BaseModel):
    name: str
    price: float
    # The "type" field tells Pydantic (and the model) which branch applies
    item: Annotated[Union[PhysicalItem, DigitalItem], Field(discriminator="type")]
```
This works because the discriminator gives the model an unambiguous way to pick the branch.
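The same models can be exercised locally before wiring them into parse(), which is a cheap way to confirm the discriminator picks the branch you expect. The order JSON and URL below are made-up examples.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, Field


class PhysicalItem(BaseModel):
    type: Literal["physical"]
    weight_kg: float
    shipping_required: bool


class DigitalItem(BaseModel):
    type: Literal["digital"]
    download_url: str
    license_key: Optional[str] = None


class OrderItem(BaseModel):
    name: str
    price: float
    item: Annotated[Union[PhysicalItem, DigitalItem], Field(discriminator="type")]


# Pydantic reads the "type" tag and validates against that branch only
raw = '{"name": "E-book", "price": 9.99, "item": {"type": "digital", "download_url": "https://example.com/book.epub"}}'
order = OrderItem.model_validate_json(raw)
print(type(order.item).__name__)  # DigitalItem
```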
A real extraction use case: parsing resumes
Here is a practical end-to-end example that extracts structured data from a resume text blob — the kind of thing you might run on scraped PDFs converted to text:
```python
from typing import Optional

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class WorkExperience(BaseModel):
    company: str
    title: str
    start_date: str
    end_date: Optional[str] = Field(default=None, description="Null if current role")
    responsibilities: list[str]

class Resume(BaseModel):
    full_name: str
    # Nullable, so the prompt's "use null for missing fields" is actually possible
    email: Optional[str] = None
    phone: Optional[str] = None
    location: Optional[str] = None
    skills: list[str]
    work_experience: list[WorkExperience]
    education: list[str]
    years_of_experience: float = Field(description="Total years, calculated from work history")

def parse_resume(raw_text: str) -> Optional[Resume]:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a resume parser. Extract all available information. "
                    "For missing fields, use null. Do not infer or hallucinate data."
                ),
            },
            {"role": "user", "content": raw_text},
        ],
        response_format=Resume,
    )
    # .parsed is None if the model refused (see the next section)
    return completion.choices[0].message.parsed

with open("candidate.txt", encoding="utf-8") as f:
    resume = parse_resume(f.read())

print(f"{resume.full_name}: {resume.years_of_experience} years experience")
for job in resume.work_experience:
    print(f"  {job.title} at {job.company} ({job.start_date} – {job.end_date or 'present'})")
```
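Asking the model to compute years_of_experience is an arithmetic task, so it is worth cross-checking against the extracted dates. The sketch below assumes start_date and end_date come back as "YYYY-MM" strings, which the Resume schema does not enforce (you would have to request that format, for example in the Field description); months_between and estimate_years are hypothetical helpers, and Job is a stand-in with the same date fields as WorkExperience.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


def months_between(start: str, end: Optional[str], today: date) -> int:
    """Whole months between two "YYYY-MM" strings; end=None means a current role."""
    start_year, start_month = map(int, start.split("-")[:2])
    if end is None:
        end_year, end_month = today.year, today.month
    else:
        end_year, end_month = map(int, end.split("-")[:2])
    return max(0, (end_year - start_year) * 12 + (end_month - start_month))


def estimate_years(jobs, today: date) -> float:
    """Sum job durations; overlapping roles are double-counted in this sketch."""
    total = sum(months_between(j.start_date, j.end_date, today) for j in jobs)
    return round(total / 12, 1)


@dataclass
class Job:  # stand-in with the same date fields as WorkExperience
    start_date: str
    end_date: Optional[str] = None


jobs = [Job("2018-01", "2020-01"), Job("2020-01")]
print(estimate_years(jobs, today=date(2023, 1, 1)))  # 5.0
```

Because estimate_years only reads start_date and end_date, it accepts resume.work_experience directly; a large gap between its result and resume.years_of_experience is a signal to flag the record for review.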
If you are building a full pipeline on top of this, the LLM-powered REST API with FastAPI guide covers wrapping extraction logic like this into a proper API endpoint.
Handling validation errors, partial outputs, and refusals
The API can return a refusal instead of content when the model decides it cannot comply with the request. Always check for this:
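```python
import openai

# client and Product as in the first example; messages is your list of chat messages
try:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06", messages=messages, response_format=Product
    )
except openai.LengthFinishReasonError:
    # finish_reason == "length": parse() raises rather than returning a truncated object
    print("Output hit the token limit; raise max_tokens or shorten the input")
else:
    message = completion.choices[0].message
    if message.refusal:
        print(f"Model refused: {message.refusal}")  # .parsed is None in this case
    else:
        process(message.parsed)  # process() stands in for your own handler
```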
For partial outputs due to token limits, increase max_tokens (or max_completion_tokens, which newer models expect) or chunk your input. With parse(), a response cut off by the token limit never reaches .parsed: the SDK raises openai.LengthFinishReasonError instead of handing you a half-formed object.
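For the chunking option, here is a minimal sketch that packs blank-line-separated paragraphs into chunks under a character budget. Characters are only a rough proxy for tokens, and chunk_text and its 8000-character default are assumptions to tune for your model; you would then call parse() once per chunk and merge the per-chunk results yourself.

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    Characters stand in for tokens here; a paragraph longer than max_chars
    is hard-split at the character limit.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split oversized paragraphs so no chunk exceeds the budget
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # current is non-empty here, so no empty chunk is emitted
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 10 + "\n\n" + "b" * 10
print(chunk_text(sample, max_chars=15))  # ['aaaaaaaaaa', 'bbbbbbbbbb']
```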
When you need retry-on-validation-error logic — for example, when your post-processing applies additional business rules the schema cannot express — wrap the call:
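```python
import time

from pydantic import ValidationError

class BusinessRuleError(ValueError):
    """Illustrative: parsed output breaks a rule the JSON schema cannot express."""

def parse_with_retry(raw_text: str, max_retries: int = 3) -> Resume:
    for attempt in range(max_retries):
        try:
            result = parse_resume(raw_text)
            # Explicit checks rather than assert, which python -O strips out
            if result is None:
                raise BusinessRuleError("no parsed output (the model may have refused)")
            if result.years_of_experience < 0:
                raise BusinessRuleError("years_of_experience must be non-negative")
            return result
        except (ValidationError, BusinessRuleError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("Exhausted retries")
```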
For more complex multi-step extraction — where a first pass extracts raw entities and a second pass normalises or classifies them — the patterns described in the RAG pipeline from scratch guide apply: keep each step focused and validate at boundaries.
Structured outputs vs. function calling vs. the instructor library
Older function calling approach: Before structured outputs, the pattern was to define a function with a JSON schema in the tools parameter and parse tool_calls[0].function.arguments. It worked but required manual json.loads and model_validate, and the model was not strictly constrained — it could still produce invalid JSON in edge cases.
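The manual work that approach involved looked roughly like the sketch below. record_product is a hypothetical tool name and the arguments string is a made-up example; with a real client you would pass tools=tools (often with tool_choice forcing that function) to client.chat.completions.create() and read completion.choices[0].message.tool_calls[0].function.arguments. Current API versions also accept "strict": True inside a function definition, which this legacy sketch deliberately omits.

```python
import json

from pydantic import BaseModel


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str


# The older pattern: describe the schema as a tool the model should "call"
tools = [
    {
        "type": "function",
        "function": {
            "name": "record_product",  # hypothetical tool name
            "description": "Record extracted product details.",
            "parameters": Product.model_json_schema(),
        },
    }
]


def parse_tool_arguments(arguments: str) -> Product:
    """Manual step: tool_calls[0].function.arguments arrives as a JSON string."""
    data = json.loads(arguments)  # may raise json.JSONDecodeError on malformed output
    return Product.model_validate(data)  # may raise pydantic.ValidationError


# Example arguments string, as it would appear on a tool call
args = '{"name": "Sony WH-1000XM5", "price": 279.99, "in_stock": true, "category": "electronics"}'
product = parse_tool_arguments(args)
print(product.category)  # electronics
```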
instructor library: instructor wraps the OpenAI client and adds automatic retry-on-validation-error, partial streaming support, and multi-provider compatibility. If you need those features out of the box, it is worth the dependency. The trade-off is an extra abstraction layer.
Native structured outputs: The right default for greenfield Python projects targeting GPT-4o or later. Zero extra dependencies, guaranteed schema conformance at the API level, and .parsed gives you the Pydantic object directly.
When structured outputs are not enough:
- Your schema needs to express cross-field constraints (invoice.subtotal + tax == invoice.total)
- You are post-processing the output with business rules that cannot be encoded in JSON Schema
- You need streaming partial objects (use instructor or stream + parse manually)
- You are targeting a model or server that does not support structured outputs (older GPT-3.5 endpoints, or local runtimes that do not enforce JSON Schema; see how to run Llama 3 locally with Ollama for a local setup)
In those cases, layer on top: use structured outputs for the initial extraction, then validate with Pydantic validators or a second LLM call. The two-pass pattern (extract first, then verify or enrich) is a dependable default for production pipelines.
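For the cross-field case (subtotal plus tax equals total), a Pydantic model_validator is one way to add the check. This is a sketch on a hypothetical, trimmed-down InvoiceTotals model, and the one-cent tolerance is an assumption. Validators are not part of the JSON schema sent to the API; if you put one on the model passed as response_format, it should run locally when the SDK builds .parsed, and a failure surfaces as a ValidationError that a retry loop can catch.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class InvoiceTotals(BaseModel):
    # Hypothetical subset of the Invoice model's numeric fields
    subtotal: float
    tax_rate: Optional[float] = None  # percentage, e.g. 18.0 for 18%
    total_amount: float

    @model_validator(mode="after")
    def check_total(self) -> "InvoiceTotals":
        rate = self.tax_rate or 0.0
        expected = self.subtotal * (1 + rate / 100)
        # Allow a one-cent tolerance for rounding on the source document
        if abs(expected - self.total_amount) > 0.01:
            raise ValueError(
                f"total_amount {self.total_amount} != subtotal with tax ({expected:.2f})"
            )
        return self


print(InvoiceTotals(subtotal=100.0, tax_rate=18.0, total_amount=118.0).total_amount)  # 118.0

try:
    InvoiceTotals(subtotal=100.0, tax_rate=18.0, total_amount=120.0)
except ValidationError as exc:
    print("rejected:", exc.errors()[0]["msg"])
```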
If you are interested in how function calling works on the Anthropic side, the Claude tool use practical guide covers the equivalent pattern with Claude’s tool definitions.
Key takeaways:
- Use client.beta.chat.completions.parse() with a Pydantic v2 model; it handles schema generation and deserialization automatically, and requires a model with structured outputs support (gpt-4o-2024-08-06 or later, or gpt-4o-mini-2024-07-18 or later)
- Always check message.refusal before using .parsed, and handle openai.LengthFinishReasonError around the call: a refusal leaves .parsed as None, and a truncated response raises instead of returning data
- Structured outputs guarantee the shape of the response, not the correctness of its values; for constraints that JSON Schema cannot express, add a validation layer or retry loop