LangChain series: Understanding Structured Output

Structured Output

What is Structured Output?

Models can be requested to provide responses in a specific schema. This response, generated in a predefined structure, is called structured output.

LangChain supports multiple schema types and methods for enforcing structured output, making it easier to integrate LLM responses directly into applications.

This is useful because it ensures the output is predictable, machine-readable, and can be directly used in pipelines, APIs, or databases.

Methods to Impose Structured Output:

Pydentic
TypeDict
DataClass

Pydantic

Pydantic models provide the richest feature set, including:

Field validation
Descriptions
Type enforcement
Nested structures

Example:

from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="This is the title of the movie")
    director: str = Field(description="This is the director of the movie")
    rating: float = Field(description="This is the rating of the movie")
    year: int = Field(description="This is the year the movie was released")

model_with_structure = model.with_structured_output(Movie)

model_with_structure.invoke("Provide me details of movie Dhurandhar")

Output

Movie(title="Dhurandhar", director="Aditya Dhar", rating=9.0, year=2025)

Explanation

A Pydantic model (Movie) is defined with strict types.

Each field includes a description, helping the LLM understand what to generate.

with_structured_output(Movie) tells LangChain to force the response into this schema.

When invoke() is called, the LLM generates output that matches the structure.

The result is returned as a Python object, not just plain text.

This makes it easy to directly use values like:

result.title
result.rating

Message output alongside parsed structure

model_with_structure = model.with_structure_output(Movie, include_raw=True)

Output

{
  "raw": "Dhurandhar is a 2025 film directed by Aditya Dhar with a rating of 9.0.",
  "parsed": Movie(title="Dhurandhar", director="Aditya Dhar", rating=9.0, year=2025)
}

Explanation

include_raw=True returns both:
- The raw LLM response (text)
- The parsed structured object

This is useful when:

You want to debug model responses
You need both human-readable and machine-readable formats
You want fallback handling if parsing fails

Nested Structure

Nested structure means using one structured schema inside another schema.

This allows you to represent complex, real-world data such as:

Lists of objects
Hierarchical relationships
Multi-level responses

Example

from pydantic import BaseModel, Field

class Actor(BaseModel):
    name: str
    role: str

class Movie(BaseModel):
    title: str
    cast: list[Actor]
    genre: list[str]
    budget: float

model_with_structure = model.with_structured_output(Movie)

result = model_with_structure.invoke(
    "Provide movie details for Dhurandhar including cast, genre and budget"
)

Output

Movie(
  title="Dhurandhar",
  cast=[
    Actor(name="Ranveer Singh", role="Lead"),
    Actor(name="Yami Gautam", role="Supporting")
  ],
  genre=["Action", "Drama"],
  budget=150000000.0
)

Explanation

Actor is a separate schema used inside Movie.
cast is a list of Actor objects.
The LLM understands relationships and generates structured nested data.
This is extremely useful for building:
- APIs
- Knowledge graphs
- Structured datasets

TypeDict

TypedDict provides a simpler alternative using Python’s built-in typing.

It is ideal when:

You don’t need runtime validation
You want lightweight structure
Performance is preferred over strict validation

Example

from typing_extensions import TypedDict, Annotated

class Movie(TypedDict):
    title: Annotated[str, "title of the movie"]
    director: Annotated[str, "director of the movie"]
    rating: Annotated[float, "rating of the movie"]

model_with_typed_dict = model.with_structured_output(Movie)

result = model_with_typed_dict.invoke("Provide details of movie Dhurandhar")

Output

{
  "title": "Dhurandhar",
  "director": "Aditya Dhar",
  "rating": 9.0
}

Explanation

TypedDict defines the expected structure using Python typing.
Annotated provides additional context to guide the LLM.
Unlike Pydantic:
- No validation is performed
- Output is a plain dictionary, not an object
This makes it faster but less strict.

Note on `profile` with Structured Output

When using structured output:

model_with_structure.profile

Output

AttributeError: 'RunnableSequence' object has no attribute 'profile'

This happens because:

with_structured_output() wraps the model inside a RunnableSequence
The original model methods (like .profile) are not directly accessible

Without Structured Output

model.profile

Example Output (GPT Model)

{
  "model_name": "gpt-5",
  "max_input_tokens": 128000,
  "max_output_tokens": 4096,
  "supports_streaming": True,
  "supports_function_calling": True,
  "supports_structured_output": True
}

Different models (like GPT, Groq, Gemini) may return slightly different fields, but the concept remains the same.

Dataclasses

A dataclass is a Python class designed to store data. It is simpler than Pydantic and does not include validation.

Example

from dataclasses import dataclass
from langchain.agents import create_agent

@dataclass
class Movie:
    title: str
    year: int
    rating: float

agent = create_agent(
    model="gpt-5",
    response_format=Movie
)

result = agent.invoke("Provide info about Dhurandhar")

Output

`result["structured_output"]`

Movie(title="Dhurandhar", year=2025, rating=9.0)

Full `result`

{
  "output": "Dhurandhar is a 2025 movie with a rating of 9.0.",
  "structured_output": Movie(title="Dhurandhar", year=2025, rating=9.0)
}

Explanation

A dataclass (Movie) defines the structure.
response_format=Movie tells the agent to return structured data.
The agent returns:
- output → raw text
- structured_output → parsed dataclass object

Key Difference

Field	Meaning
`output`	Natural language response
`structured_output`	Clean, typed data

Final Thoughts

Structured output is one of the most powerful features in LangChain.

Use Pydantic → when you need validation and strict structure
Use TypedDict → for lightweight and fast responses
Use Dataclass → for simple structured data with agents

This allows you to move from text-based AI to data-driven AI systems seamlessly.

LangChain series: Understanding Structured Output

Structured Output

What is Structured Output?

Pydantic

Message output alongside parsed structure

Nested Structure

TypeDict

Note on `profile` with Structured Output

Dataclasses

Example

`result["structured_output"]`

Full `result`

Explanation

Key Difference

Final Thoughts

Comments

Getting Started with LangChain

LangChain Basics: LLM's and Agents

More from this blog

LangChain Series: Understanding Messages

Building with LangChain: Models, Streaming, Batch Processing, and Tools

LangChain Basics: LLM's and Agents

Command Palette

Structured Output

What is Structured Output?

Pydantic

Message output alongside parsed structure

Nested Structure

TypeDict

Note on profile with Structured Output

Dataclasses

Example

result["structured_output"]

Full result

Explanation

Key Difference

Final Thoughts

Comments

Getting Started with LangChain

LangChain Basics: LLM's and Agents

More from this blog

Note on `profile` with Structured Output

`result["structured_output"]`

Full `result`