Building with LangChain: Models, Streaming, Batch Processing, and Tools

Model Integration in LangChain
There are several ways to integrate models into a system using LangChain. The method you choose depends on flexibility, provider requirements, and project complexity.
1. Creating an Agent with a Specific Model
You can specify which model to use when creating an agent.
from langchain.agents import create_agent
agent = create_agent(model="model_name")
Here, the agent is explicitly configured to use the specified model.
2. Using the Chat Model Initialization Function
from langchain.chat_models import init_chat_model
model = init_chat_model("gpt-5")
init_chat_model() is a LangChain utility function used to initialize chat models easily.
It provides a simple, unified interface.
It works across different model providers.
It is suitable when you want flexibility without provider-specific configuration.
3. Using a Specific Provider Model
If you want to use a specific provider (for example, OpenAI), you can directly import its integration package:
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-5")
This method is used when:
You are working exclusively with one provider.
You need provider-specific configurations.
Similarly, other providers such as Gemini or Groq have their own LangChain integration libraries.
Streaming
What is Streaming?
Streaming is the process of displaying the model's output while it is still being generated.
Most modern models support streaming. It improves user experience because:
The user does not need to wait for the full response.
Output appears progressively.
It is especially useful for long responses.
Basic Implementation
model.stream("message")
This returns a generator object. It does not automatically print the output.
To properly stream the response:
for chunk in model.stream("message"):
    print(chunk.text, end="", flush=True)
Now the output appears token by token (or chunk by chunk) while it is being generated.
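In practice you often want to display chunks as they arrive and also keep the complete reply for later use. The sketch below shows that pattern with plain strings. Here fake_stream and stream_and_collect are hypothetical helpers, not LangChain functions; fake_stream only stands in for model.stream() so the example runs without a model or an API key.

```python
from typing import Iterable, Iterator


def fake_stream(text: str, chunk_size: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for model.stream(): yields the reply piece by piece."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def stream_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the assembled reply."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # show partial output immediately
        parts.append(chunk)               # keep the piece for later use
    print()  # end the line once the stream is finished
    return "".join(parts)


full_reply = stream_and_collect(fake_stream("Streaming shows output early."))
```

With a real model, the same helper could presumably be fed the text of each chunk, for example stream_and_collect(chunk.text for chunk in model.stream("message")).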
Batch Processing
What is Batch?
Batching is the process of sending multiple independent requests to a model at the same time.
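Instead of processing them sequentially, model.batch() runs the calls in parallel on the client side. This:
Improves throughput
Reduces the total time needed to finish all requests (the latency of each individual request is unchanged)
Does not reduce cost by itself; some providers offer separate, discounted batch APIs, which are distinct from model.batch()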
Basic Example
responses = model.batch([
    "This is batch 1",
    "This is batch 2",
    "This is batch 3",
])
for response in responses:
    print(response)
This sends three independent prompts to the model in parallel and, once all of them have completed, prints the responses one by one in the same order as the inputs.
Controlling Concurrency
You can limit the number of parallel requests using configuration:
responses = model.batch(
    ["Batch 1", "Batch 2"],
    config={"max_concurrency": 5},
)
The max_concurrency parameter defines the maximum number of parallel calls allowed at a time.
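To make the effect of a concurrency cap concrete, here is a minimal sketch using only the standard library. It is not LangChain's implementation: run_batch and fake_model_call are hypothetical names, and a thread pool with a sleeping function stands in for real model calls. It illustrates two properties: no more than max_concurrency calls are in flight at once, and results come back in input order.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_batch(call: Callable[[str], str], prompts: List[str], max_concurrency: int) -> List[str]:
    """Run call() on every prompt with at most max_concurrency calls in flight.

    Results are returned in the same order as the prompts.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(call, prompts))


# Hypothetical slow model call that records how many calls overlap.
_lock = threading.Lock()
_active = 0
peak_active = 0


def fake_model_call(prompt: str) -> str:
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.05)  # simulate network latency
    with _lock:
        _active -= 1
    return f"reply to {prompt}"


prompts = [f"Batch {i}" for i in range(1, 7)]
replies = run_batch(fake_model_call, prompts, max_concurrency=2)
```

A lower cap is mainly useful for staying under a provider's rate limits; a higher cap finishes the whole batch sooner as long as the provider accepts the load.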
Real-World Example
ChatGPT uses streaming during normal conversations. When you send a message, the response appears gradually; this is streaming.
As conversations become longer and the context size increases, managing responses becomes more complex. Sometimes you may observe that part of a response appears, pauses, and then continues. This behavior can be related to internal request handling and optimization strategies such as batching and parallel processing.
As context grows, computation becomes heavier. Systems may internally optimize how responses are generated and delivered to maintain performance.
Tools in LangChain
Tools allow a model to call external functions.
Step 1: Import the Tool Decorator
from langchain.tools import tool
Step 2: Define a Tool
@tool
def get_weather(location: str) -> str:
    """Get weather information for a location."""
    return f"It's rainy in {location}."
The @tool decorator converts the function into a tool that the model can use. The function name, type hints, and docstring describe the tool to the model, so keep them accurate.
Step 3: Bind the Tool to the Model
model_with_tools = model.bind_tools([get_weather])
The model can now request a call to the tool when it decides one is needed.
Step 4: Invoke the Model
response = model_with_tools.invoke("What is the weather in Maharashtra?")
print(response)
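If the model decides that the query requires the tool, the returned message contains a tool call request in response.tool_calls instead of a final answer. The tool itself is not executed at this step; your code (or an agent) must run it and pass the result back to the model.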
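A tool call request only names a tool and its arguments; something still has to execute it. The sketch below shows one way to dispatch such requests to plain Python functions. It assumes each request is a dict with "name", "args", and "id" keys, which is the shape to expect in response.tool_calls but is treated as an assumption here. run_tool_calls, the registry dict, and the hand-written example request are all hypothetical, and get_weather is a plain-function stand-in for the decorated tool.

```python
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> str:
    """Plain-function stand-in for the get_weather tool."""
    return f"It's rainy in {location}."


def run_tool_calls(
    tool_calls: List[Dict[str, Any]],
    registry: Dict[str, Callable[..., str]],
) -> List[Dict[str, str]]:
    """Execute each requested tool and pair the result with its call id."""
    results = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            # Report unknown tools instead of raising, so the caller can relay it
            content = f"Unknown tool: {call['name']}"
        else:
            content = func(**call["args"])
        results.append({"tool_call_id": call["id"], "content": content})
    return results


# Hand-written example of what a tool call request might look like.
example_calls = [
    {"name": "get_weather", "args": {"location": "Maharashtra"}, "id": "call_1"},
]
tool_results = run_tool_calls(example_calls, {"get_weather": get_weather})
```

In a full round trip, each result would then be sent back to the model as a tool message tied to the matching call id, and the model would be invoked again to write the final answer.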
Example with a Proper Functioning Tool (Real Weather API)
Below is an example of integrating a real weather API using the requests library.
import requests
from langchain.tools import tool
from langchain_openai import ChatOpenAI
# Step 1: Define Tool
@tool
def get_weather(location: str) -> str:
    """Fetch real weather data for a location."""
    api_key = "YOUR_API_KEY"  # replace with your OpenWeatherMap API key
    url = "https://api.openweathermap.org/data/2.5/weather"
    # Pass query parameters separately so the location is URL-encoded
    params = {"q": location, "appid": api_key, "units": "metric"}
    response = requests.get(url, params=params, timeout=10)
    # Check the status before parsing; error bodies lack the fields read below
    if response.status_code != 200:
        return "Unable to fetch weather data."
    data = response.json()
    temp = data["main"]["temp"]
    description = data["weather"][0]["description"]
    return f"The temperature in {location} is {temp}°C with {description}."
# Step 2: Initialize Model
model = ChatOpenAI(model="gpt-5")
# Step 3: Bind Tool
model_with_tools = model.bind_tools([get_weather])
# Step 4: Invoke
response = model_with_tools.invoke("What is the weather in Mumbai?")
# Prints the model's reply message, including any tool call requests
print(response)
How This Works
The model analyzes the user query.
It detects that weather data is required.
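It responds with a request to call the get_weather tool; the request appears in response.tool_calls.
The example above stops at this point, because bind_tools() does not execute the tool.
To complete the flow, your code or an agent runs get_weather, which fetches real-time data from the API, and passes the result back to the model as a tool message.
The model then formats and returns the final response.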
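The get_weather tool above does the HTTP request and the formatting in one function, which makes it hard to check without network access. One option is to move the formatting into a small pure function, sketched here. format_weather and the sample payload are hypothetical; the payload is hand-written and trimmed to the two fields the tool reads, not a recorded API response.

```python
from typing import Any, Dict


def format_weather(location: str, status_code: int, data: Dict[str, Any]) -> str:
    """Turn a weather API reply into the sentence the tool returns."""
    if status_code != 200:
        return "Unable to fetch weather data."
    try:
        temp = data["main"]["temp"]
        description = data["weather"][0]["description"]
    except (KeyError, IndexError, TypeError):
        # The payload did not have the expected shape
        return "Unable to fetch weather data."
    return f"The temperature in {location} is {temp}°C with {description}."


# Hand-written sample payload, trimmed to the fields the tool reads.
sample = {"main": {"temp": 29.5}, "weather": [{"description": "light rain"}]}
message = format_weather("Mumbai", 200, sample)
```

The tool body would then reduce to the request itself followed by a call such as format_weather(location, response.status_code, data).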
Summary
Models can be integrated using agents, generic initialization, or provider-specific libraries.
Streaming improves user experience by displaying output in real time.
Batching processes multiple independent requests in parallel.
Tools enable models to interact with external systems.
Real-world applications combine these techniques for scalable and efficient systems.


