LangChain series: Understanding Structured Output

Structured Output
What is Structured Output?
Models can be requested to provide responses in a specific schema. This response, generated in a predefined structure, is called structured output.
LangChain supports multiple schema types and methods for enforcing structured output, making it easier to integrate LLM responses directly into applications.
This is useful because it ensures the output is predictable, machine-readable, and can be directly used in pipelines, APIs, or databases.
Methods to Impose Structured Output:
Pydentic
TypeDict
DataClass
Pydantic
Pydantic models provide the richest feature set, including:
Field validation
Descriptions
Type enforcement
Nested structures
Example:
from pydantic import BaseModel, Field
class Movie(BaseModel):
title: str = Field(description="This is the title of the movie")
director: str = Field(description="This is the director of the movie")
rating: float = Field(description="This is the rating of the movie")
year: int = Field(description="This is the year the movie was released")
model_with_structure = model.with_structured_output(Movie)
model_with_structure.invoke("Provide me details of movie Dhurandhar")
Output
Movie(title="Dhurandhar", director="Aditya Dhar", rating=9.0, year=2025)
Explanation
A Pydantic model (Movie) is defined with strict types.
Each field includes a description, helping the LLM understand what to generate.
with_structured_output(Movie) tells LangChain to force the response into this schema.
When invoke() is called, the LLM generates output that matches the structure.
The result is returned as a Python object, not just plain text.
This makes it easy to directly use values like:
result.title
result.rating
Message output alongside parsed structure
model_with_structure = model.with_structure_output(Movie, include_raw=True)
Output
{
"raw": "Dhurandhar is a 2025 film directed by Aditya Dhar with a rating of 9.0.",
"parsed": Movie(title="Dhurandhar", director="Aditya Dhar", rating=9.0, year=2025)
}
Explanation
include_raw=Truereturns both:The raw LLM response (text)
The parsed structured object
This is useful when:
You want to debug model responses
You need both human-readable and machine-readable formats
You want fallback handling if parsing fails
Nested Structure
Nested structure means using one structured schema inside another schema.
This allows you to represent complex, real-world data such as:
Lists of objects
Hierarchical relationships
Multi-level responses
Example
from pydantic import BaseModel, Field
class Actor(BaseModel):
name: str
role: str
class Movie(BaseModel):
title: str
cast: list[Actor]
genre: list[str]
budget: float
model_with_structure = model.with_structured_output(Movie)
result = model_with_structure.invoke(
"Provide movie details for Dhurandhar including cast, genre and budget"
)
Output
Movie(
title="Dhurandhar",
cast=[
Actor(name="Ranveer Singh", role="Lead"),
Actor(name="Yami Gautam", role="Supporting")
],
genre=["Action", "Drama"],
budget=150000000.0
)
Explanation
Actoris a separate schema used insideMovie.castis a list ofActorobjects.The LLM understands relationships and generates structured nested data.
This is extremely useful for building:
APIs
Knowledge graphs
Structured datasets
TypeDict
TypedDict provides a simpler alternative using Python’s built-in typing.
It is ideal when:
You don’t need runtime validation
You want lightweight structure
Performance is preferred over strict validation
Example
from typing_extensions import TypedDict, Annotated
class Movie(TypedDict):
title: Annotated[str, "title of the movie"]
director: Annotated[str, "director of the movie"]
rating: Annotated[float, "rating of the movie"]
model_with_typed_dict = model.with_structured_output(Movie)
result = model_with_typed_dict.invoke("Provide details of movie Dhurandhar")
Output
{
"title": "Dhurandhar",
"director": "Aditya Dhar",
"rating": 9.0
}
Explanation
TypedDictdefines the expected structure using Python typing.Annotatedprovides additional context to guide the LLM.Unlike Pydantic:
No validation is performed
Output is a plain dictionary, not an object
This makes it faster but less strict.
Note on profile with Structured Output
When using structured output:
model_with_structure.profile
Output
AttributeError: 'RunnableSequence' object has no attribute 'profile'
This happens because:
with_structured_output()wraps the model inside a RunnableSequenceThe original model methods (like
.profile) are not directly accessible
Without Structured Output
model.profile
Example Output (GPT Model)
{
"model_name": "gpt-5",
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"supports_streaming": True,
"supports_function_calling": True,
"supports_structured_output": True
}
Different models (like GPT, Groq, Gemini) may return slightly different fields, but the concept remains the same.
Dataclasses
A dataclass is a Python class designed to store data. It is simpler than Pydantic and does not include validation.
Example
from dataclasses import dataclass
from langchain.agents import create_agent
@dataclass
class Movie:
title: str
year: int
rating: float
agent = create_agent(
model="gpt-5",
response_format=Movie
)
result = agent.invoke("Provide info about Dhurandhar")
Output
result["structured_output"]
Movie(title="Dhurandhar", year=2025, rating=9.0)
Full result
{
"output": "Dhurandhar is a 2025 movie with a rating of 9.0.",
"structured_output": Movie(title="Dhurandhar", year=2025, rating=9.0)
}
Explanation
A dataclass (
Movie) defines the structure.response_format=Movietells the agent to return structured data.The agent returns:
output→ raw textstructured_output→ parsed dataclass object
Key Difference
| Field | Meaning |
|---|---|
output |
Natural language response |
structured_output |
Clean, typed data |
Final Thoughts
Structured output is one of the most powerful features in LangChain.
Use Pydantic → when you need validation and strict structure
Use TypedDict → for lightweight and fast responses
Use Dataclass → for simple structured data with agents
This allows you to move from text-based AI to data-driven AI systems seamlessly.


