Skip to main content

Command Palette

Search for a command to run...

LangChain series: Understanding Structured Output

Updated
5 min read
LangChain series: Understanding Structured Output

Structured Output

What is Structured Output?

Models can be requested to provide responses in a specific schema. This response, generated in a predefined structure, is called structured output.

LangChain supports multiple schema types and methods for enforcing structured output, making it easier to integrate LLM responses directly into applications.

This is useful because it ensures the output is predictable, machine-readable, and can be directly used in pipelines, APIs, or databases.

Methods to Impose Structured Output:

  1. Pydentic

  2. TypeDict

  3. DataClass

Pydantic

Pydantic models provide the richest feature set, including:

  • Field validation

  • Descriptions

  • Type enforcement

  • Nested structures

Example:

from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="This is the title of the movie")
    director: str = Field(description="This is the director of the movie")
    rating: float = Field(description="This is the rating of the movie")
    year: int = Field(description="This is the year the movie was released")

model_with_structure = model.with_structured_output(Movie)

model_with_structure.invoke("Provide me details of movie Dhurandhar")

Output

Movie(title="Dhurandhar", director="Aditya Dhar", rating=9.0, year=2025)    

Explanation

A Pydantic model (Movie) is defined with strict types.

Each field includes a description, helping the LLM understand what to generate.

with_structured_output(Movie) tells LangChain to force the response into this schema.

When invoke() is called, the LLM generates output that matches the structure.

The result is returned as a Python object, not just plain text.

This makes it easy to directly use values like:

result.title
result.rating

Message output alongside parsed structure

model_with_structure = model.with_structure_output(Movie, include_raw=True)

Output

{
  "raw": "Dhurandhar is a 2025 film directed by Aditya Dhar with a rating of 9.0.",
  "parsed": Movie(title="Dhurandhar", director="Aditya Dhar", rating=9.0, year=2025)
}

Explanation

  • include_raw=True returns both:

    • The raw LLM response (text)

    • The parsed structured object

This is useful when:

  • You want to debug model responses

  • You need both human-readable and machine-readable formats

  • You want fallback handling if parsing fails

Nested Structure

Nested structure means using one structured schema inside another schema.

This allows you to represent complex, real-world data such as:

  • Lists of objects

  • Hierarchical relationships

  • Multi-level responses

Example

from pydantic import BaseModel, Field

class Actor(BaseModel):
    name: str
    role: str

class Movie(BaseModel):
    title: str
    cast: list[Actor]
    genre: list[str]
    budget: float

model_with_structure = model.with_structured_output(Movie)

result = model_with_structure.invoke(
    "Provide movie details for Dhurandhar including cast, genre and budget"
)  

Output

Movie(
  title="Dhurandhar",
  cast=[
    Actor(name="Ranveer Singh", role="Lead"),
    Actor(name="Yami Gautam", role="Supporting")
  ],
  genre=["Action", "Drama"],
  budget=150000000.0
)

Explanation

  • Actor is a separate schema used inside Movie.

  • cast is a list of Actor objects.

  • The LLM understands relationships and generates structured nested data.

  • This is extremely useful for building:

    • APIs

    • Knowledge graphs

    • Structured datasets

TypeDict

TypedDict provides a simpler alternative using Python’s built-in typing.

It is ideal when:

  • You don’t need runtime validation

  • You want lightweight structure

  • Performance is preferred over strict validation

Example

from typing_extensions import TypedDict, Annotated

class Movie(TypedDict):
    title: Annotated[str, "title of the movie"]
    director: Annotated[str, "director of the movie"]
    rating: Annotated[float, "rating of the movie"]

model_with_typed_dict = model.with_structured_output(Movie)

result = model_with_typed_dict.invoke("Provide details of movie Dhurandhar")

Output

{
  "title": "Dhurandhar",
  "director": "Aditya Dhar",
  "rating": 9.0
}

Explanation

  • TypedDict defines the expected structure using Python typing.

  • Annotated provides additional context to guide the LLM.

  • Unlike Pydantic:

    • No validation is performed

    • Output is a plain dictionary, not an object

  • This makes it faster but less strict.

Note on profile with Structured Output

When using structured output:

model_with_structure.profile

Output

AttributeError: 'RunnableSequence' object has no attribute 'profile'

This happens because:

  • with_structured_output() wraps the model inside a RunnableSequence

  • The original model methods (like .profile) are not directly accessible

Without Structured Output

model.profile

Example Output (GPT Model)

{
  "model_name": "gpt-5",
  "max_input_tokens": 128000,
  "max_output_tokens": 4096,
  "supports_streaming": True,
  "supports_function_calling": True,
  "supports_structured_output": True
}

Different models (like GPT, Groq, Gemini) may return slightly different fields, but the concept remains the same.

Dataclasses

A dataclass is a Python class designed to store data. It is simpler than Pydantic and does not include validation.

Example

from dataclasses import dataclass
from langchain.agents import create_agent

@dataclass
class Movie:
    title: str
    year: int
    rating: float

agent = create_agent(
    model="gpt-5",
    response_format=Movie
)

result = agent.invoke("Provide info about Dhurandhar")

Output

result["structured_output"]

Movie(title="Dhurandhar", year=2025, rating=9.0)

Full result

{
  "output": "Dhurandhar is a 2025 movie with a rating of 9.0.",
  "structured_output": Movie(title="Dhurandhar", year=2025, rating=9.0)
}

Explanation

  • A dataclass (Movie) defines the structure.

  • response_format=Movie tells the agent to return structured data.

  • The agent returns:

    • output → raw text

    • structured_output → parsed dataclass object

Key Difference

Field Meaning
output Natural language response
structured_output Clean, typed data

Final Thoughts

Structured output is one of the most powerful features in LangChain.

  • Use Pydantic → when you need validation and strict structure

  • Use TypedDict → for lightweight and fast responses

  • Use Dataclass → for simple structured data with agents

This allows you to move from text-based AI to data-driven AI systems seamlessly.