How to Develop Apps Using LangChain and LLMs
LangChain is the glue that connects LLMs to your data. We explain Chains, Prompts, and Agents, and how to build your first app.
TLDR: LangChain is a framework that simplifies building LLM applications. It provides abstractions for Chains (linking steps), Memory (remembering chat history), and Agents (using tools) — turning raw API calls into composable building blocks, from retrieval-augmented Q&A to multi-step autonomous agents.
📖 Lego Bricks for LLM Apps
Before we explain how LangChain works, here is what it looks like in practice. This five-line chain translates text to French — prompt template, LLM call, and output parsing wired together:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
chain = ChatPromptTemplate.from_template("Translate to French: {text}") | ChatOpenAI(model="gpt-4o") | StrOutputParser()
print(chain.invoke({"text": "Hello, how are you?"}))
# → "Bonjour, comment allez-vous ?"
That | pipe — connecting a prompt template, an LLM, and an output parser — is LangChain's core abstraction. You will understand every part of that line by the end of this guide.
Building with the raw OpenAI API means writing the same boilerplate endlessly: formatting prompts, managing conversation history, parsing outputs, calling tools when needed.
LangChain is the Lego set — pre-assembled pieces (prompt templates, memory stores, output parsers, tool wrappers) that snap together so you can focus on logic rather than plumbing.
| Raw API | LangChain |
|---|---|
| Manual string formatting | ChatPromptTemplate |
| Manual history appending | ConversationBufferMemory |
| Manual tool calling logic | AgentExecutor |
| Manual output parsing | StrOutputParser, JsonOutputParser |
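To make the left column concrete, here is a minimal stdlib-only sketch of the manual bookkeeping the raw-API rows imply. The names (`format_prompt`, `send_turn`, the stand-in model) are invented for illustration — this is not real SDK code:

```python
# A stdlib-only sketch of raw-API boilerplate: manual prompt formatting
# and manual history appending, with a stand-in for the model call.
def format_prompt(template, **kwargs):
    # Manual string formatting -- what ChatPromptTemplate automates.
    return template.format(**kwargs)

history = [{"role": "system", "content": "You are a helpful assistant."}]

def send_turn(user_text, fake_model):
    # Manual history appending -- what ConversationBufferMemory automates.
    history.append({"role": "user", "content": user_text})
    reply = fake_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

# Stand-in for a model call: it just reports how many messages it received.
reply = send_turn(
    format_prompt("Translate to French: {text}", text="Hello!"),
    lambda msgs: f"(model saw {len(msgs)} messages)",
)
send_turn("And to Spanish?", lambda msgs: f"(model saw {len(msgs)} messages)")
print(reply)          # → (model saw 2 messages)
print(len(history))   # → 5: one system, two user, two assistant messages
```

Every application ends up re-implementing some version of these two functions; LangChain's value is that they are written once, tested, and composable.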
🔍 Core Concepts: What Makes LangChain Different
Raw LLM APIs hand you a hammer and leave you to build the house. Every call is stateless — the model forgets everything the moment you hang up. You must manually format prompt strings, append conversation history to each request, parse the model's text output into structured data, and wire up tool calls yourself. For a one-off script that's fine; for a production chatbot or document Q&A system it becomes hundreds of lines of brittle glue.
LangChain solves this through three architectural layers:
| Layer | What it does |
|---|---|
| Core | Abstract base classes: Runnable, BasePromptTemplate, BaseChatMemory, BaseTool |
| Community | 100+ pre-built integrations: OpenAI, Anthropic, Chroma, FAISS, Wikipedia, SQL, and more |
| LangSmith | Hosted tracing and evaluation — records every prompt, response, tool call, and token cost |
The glue holding Core together is LCEL (LangChain Expression Language). The | pipe operator creates a lazy, inspectable pipeline:
chain = prompt | model | parser # nothing runs yet
chain.invoke({"text": "hello"}) # pipeline executes here
Every component — prompt template, chat model, output parser, retriever — implements the same Runnable protocol: .invoke() for a single call, .stream() for token-by-token output, and .batch() for parallel requests. This uniform interface means you can swap any piece without rewriting the pipeline.
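The uniform interface is easiest to see in a toy re-implementation. This stdlib-only sketch (a `MiniRunnable` class invented here, not LangChain's actual `Runnable`) shows how a shared invoke/batch protocol plus `__or__` yields lazy, swappable composition:

```python
# A minimal sketch of the Runnable idea: every step exposes the same
# invoke/batch interface, and `|` composes steps into a new runnable.
class MiniRunnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def batch(self, xs):
        return [self.invoke(x) for x in xs]

    def __or__(self, other):
        # Composition returns another MiniRunnable -- same interface,
        # so composed chains can themselves be composed further.
        return MiniRunnable(lambda x: other.invoke(self.invoke(x)))

prompt = MiniRunnable(lambda d: f"Translate to French: {d['text']}")
fake_llm = MiniRunnable(lambda p: {"content": p.upper()})   # stand-in for the model
parser = MiniRunnable(lambda msg: msg["content"])

chain = prompt | fake_llm | parser          # nothing runs yet (lazy composition)
print(chain.invoke({"text": "hello"}))      # → TRANSLATE TO FRENCH: HELLO
print(chain.batch([{"text": "a"}, {"text": "b"}]))
```

Because `fake_llm` obeys the same interface as everything else, swapping it for a real model changes nothing upstream or downstream — which is exactly the property the Runnable protocol gives real LangChain pipelines.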
🔢 The Three Core Abstractions
A. Chains — Linking Steps
A Chain connects: User Input → Prompt Template → LLM → Output Parser.
The | operator in LCEL (LangChain Expression Language) pipes the output of one step into the next:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Translate to French: {text}")
chain = prompt | model | StrOutputParser()
result = chain.invoke({"text": "Hello, how are you?"})
# "Bonjour, comment allez-vous ?"
Chains are composable — the output of chain can be piped into another chain.
B. Memory — State Across Turns
LLMs are stateless: each API call starts fresh. LangChain's Memory objects inject conversation history into the next prompt automatically.
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=model, memory=memory)
conversation.predict(input="My name is Alice.")
conversation.predict(input="What is my name?")
# "Your name is Alice."
| Memory Type | Keeps | Best For |
|---|---|---|
| ConversationBufferMemory | Full history | Short sessions |
| ConversationSummaryMemory | LLM-generated summary | Long sessions |
| ConversationBufferWindowMemory | Last N turns | Chatbots with context limit |
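The mechanical difference between the buffer and window strategies is simple enough to sketch in plain Python. These `BufferMemory`/`WindowMemory` classes are invented stand-ins for illustration, not the real LangChain classes:

```python
# Stdlib-only sketch: a buffer keeps every turn; a window keeps the last k.
class BufferMemory:
    def __init__(self):
        self.turns = []                      # full (user, ai) history

    def save(self, user, ai):
        self.turns.append((user, ai))

    def context(self):
        # What gets injected into the next prompt.
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

class WindowMemory(BufferMemory):
    def __init__(self, k):
        super().__init__()
        self.k = k                           # keep only the last k turns

    def context(self):
        recent = self.turns[-self.k:]
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in recent)

buf, win = BufferMemory(), WindowMemory(k=2)
for mem in (buf, win):
    mem.save("My name is Alice.", "Nice to meet you, Alice!")
    mem.save("I like robotics.", "Robotics is fascinating.")
    mem.save("What's new?", "Not much!")

print(len(buf.context().splitlines()))   # → 6: all three turns survive
print(len(win.context().splitlines()))   # → 4: only the last two turns
```

Note the trade-off the table describes: with `k=2`, Alice's name has already fallen out of the window — which is why window memory suits context-limited chatbots but not sessions where early facts must persist.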
C. Agents — LLMs That Use Tools
An Agent is an LLM that can decide which tools to call based on the user's question.
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain import hub
tools = [WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # agent prompt with the required agent_scratchpad placeholder
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"input": "What is the boiling point of mercury?"})
# Agent calls Wikipedia → reads result → returns answer
The Agent loop:
flowchart TD
Q["User Question"] --> LLM["LLM: Choose Action"]
LLM -->|calls tool| Tool["Tool (Wikipedia, Calculator, DB)"]
Tool --> Observation["Observation (result)"]
Observation --> LLM
LLM -->|has enough info| Answer["Final Answer"]
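The loop in the diagram above can be sketched in plain Python. Here a hypothetical `fake_llm` function stands in for the model's "choose action" step (the real AgentExecutor does this via LLM tool-calling), and the Wikipedia tool returns a canned result:

```python
# Stdlib-only sketch of the agent loop: choose action → call tool → observe → repeat.
def wikipedia(query):
    """Canned result standing in for a real Wikipedia lookup."""
    return "Mercury boils at 356.7 °C."

TOOLS = {"wikipedia": wikipedia}

def fake_llm(question, observations):
    # "LLM: Choose Action" -- consult a tool first, then answer from the observation.
    if not observations:
        return {"action": "wikipedia", "input": question}
    return {"action": "final", "answer": observations[-1]}

def run_agent(question, max_iterations=5):
    observations = []
    for _ in range(max_iterations):          # bounded loop, like AgentExecutor
        step = fake_llm(question, observations)
        if step["action"] == "final":
            return step["answer"]
        observations.append(TOOLS[step["action"]](step["input"]))
    return "Stopped: max_iterations reached."

answer = run_agent("What is the boiling point of mercury?")
print(answer)   # → Mercury boils at 356.7 °C.
```

The essential point: the model never executes anything itself — it only emits action requests, and the loop feeds tool observations back until the model decides it can answer.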
⚙️ Building a RAG Pipeline with LangChain
Retrieval-Augmented Generation (RAG) is the most common real-world LangChain pattern: load documents → embed them → retrieve relevant chunks → answer with context.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
# 1. Load and split documents
loader = TextLoader("my_docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=500, # ~500 tokens: focused enough for precise retrieval, large enough to preserve sentence context
chunk_overlap=50, # 10% overlap so sentences split across boundaries appear in both adjacent chunks
)
chunks = splitter.split_documents(docs)
# 2. Embed and store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
# 3. Build the QA chain
qa = RetrievalQA.from_chain_type(
llm=model,
retriever=vectorstore.as_retriever(search_kwargs={"k": 4}) # k=4: retrieve 4 chunks — balances context richness vs token budget
)
qa.invoke("What is the refund policy?")
flowchart LR
Q["User Question"]
Embed["Embed Question"]
VDB["Vector Store\n(Chroma/FAISS)"]
Chunks["Top-K Chunks"]
LLM["LLM + Context"]
A["Answer"]
Q --> Embed --> VDB --> Chunks --> LLM --> A
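To make the `chunk_size`/`chunk_overlap` parameters concrete, here is a stdlib-only sketch of fixed-size chunking with overlap. The real RecursiveCharacterTextSplitter is smarter — it prefers paragraph and sentence boundaries before falling back to raw characters — so treat this as the underlying sliding-window idea only:

```python
# Stdlib-only sketch of sliding-window chunking with overlap.
def split_with_overlap(text, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap        # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "abcdefghijklmnopqrstuvwxyz" * 10     # 260-character stand-in document
chunks = split_with_overlap(doc, chunk_size=100, chunk_overlap=10)

print(len(chunks))                           # → 3
print(chunks[0][-10:] == chunks[1][:10])     # → True: the overlap region is shared
```

The shared overlap region is why a sentence cut at a chunk boundary still appears intact in at least one chunk — the property the `chunk_overlap=50` comment in the pipeline above relies on.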
🧠 Deep Dive: LangSmith Observability for LLM Chains
In production, you need to debug why a chain produced a wrong answer. LangSmith (LangChain's tracing backend) records every step:
- Which prompt was sent.
- What the LLM returned.
- Which tool was called and with what arguments.
- Total latency and token cost per step.
Enable tracing:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_key"
All chain invocations are now automatically traced.
⚖️ Trade-offs & Failure Modes: LangChain
| Benefit | Risk |
|---|---|
| Rapid prototyping with composable building blocks | Adds abstraction layers that can obscure errors |
| Built-in integrations (100+ LLMs, vector stores, tools) | Version churn — API changes frequently |
| Memory management out of the box | Token cost grows if memory strategy is not tuned |
| Tracing via LangSmith | Production overhead if not carefully sampled |
When to skip LangChain: If your use case is a single LLM call with a fixed prompt, the raw API (OpenAI SDK) is simpler and more debuggable. LangChain pays off when you have multi-step chains, conditional tool use, or complex memory strategies.
📊 Decision Guide: LangChain Application Architecture
A multi-turn agent application wires together every abstraction from the sections above. User input arrives, Memory retrieves prior conversation turns and injects them into the ChatPromptTemplate, the filled prompt is sent to the LLM, and the LLM either calls a Tool or produces a final answer that flows through an Output Parser.
flowchart TD
Input["User Input"]
Memory["Memory\n(ConversationBufferMemory)"]
Template["ChatPromptTemplate\n(system + history + user)"]
LLM["LLM\n(ChatOpenAI)"]
Parser["Output Parser\n(StrOutputParser / JsonOutputParser)"]
Tools["Tools\n(search, calculator, DB)"]
Agent{"Agent Decision:\nuse tool or answer?"}
Answer["Final Answer"]
Input --> Template
Memory --> Template
Template --> LLM
LLM --> Agent
Agent -->|needs tool| Tools
Tools --> LLM
Agent -->|has answer| Parser
Parser --> Answer
The loop between LLM → Agent → Tools → LLM may iterate several times before the agent decides it has enough information to produce a final answer. AgentExecutor enforces a max_iterations limit to prevent runaway loops, and handle_parsing_errors=True lets the agent recover from malformed tool-call outputs without crashing the entire pipeline.
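The two guardrails named above can be sketched without any LangChain at all. This stdlib-only sketch (the `flaky_llm` stand-in and its JSON action format are invented for illustration) shows an iteration cap plus the `handle_parsing_errors=True` behaviour of feeding a parse failure back to the model:

```python
import json

def flaky_llm(history):
    # Stand-in model: first reply is malformed JSON, the retry is valid.
    if any("error" in h for h in history):
        return '{"action": "final", "answer": "42"}'
    return '{"action": "final", "answer": 42'   # missing closing brace

def run(max_iterations=10):
    history = []
    for _ in range(max_iterations):              # iteration cap prevents runaway loops
        raw = flaky_llm(history)
        try:
            step = json.loads(raw)               # parse the model's action output
        except json.JSONDecodeError as e:
            # handle_parsing_errors=True behaviour: report the error back
            # to the model and retry instead of crashing the pipeline.
            history.append(f"error: could not parse your output ({e})")
            continue
        if step["action"] == "final":
            return step["answer"]
    raise RuntimeError("max_iterations exceeded")

print(run())   # prints 42 after recovering from one parsing error
```

Without the `try/except` the first malformed reply would kill the run; without the loop bound, a model that never produces valid output would retry forever.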
🌍 Real-World Applications of LangChain
LangChain's composable architecture maps cleanly onto a wide range of production use cases. The table below shows which components carry the load in each scenario and what to watch for in production:
| Application Type | LangChain Components Used | Production Consideration |
|---|---|---|
| Chat with documents (RAG) | TextLoader, RecursiveCharacterTextSplitter, OpenAIEmbeddings, Chroma, RetrievalQA | Chunk size and overlap tuning — too large wastes tokens; too small loses context |
| Customer service bot | ConversationChain, ConversationBufferWindowMemory, AgentExecutor | Memory window size vs. token budget; escalation path when agent confidence is low |
| Code generation assistant | ChatPromptTemplate (system: "You are an expert Python developer"), StrOutputParser | Output validation — pipe results through a linter or test runner before showing to user |
| SQL generator | SQLDatabaseChain, SQLDatabase, custom prompt with schema | Always run queries in read-only mode; validate SQL before execution |
| Research assistant agent | AgentExecutor, WikipediaQueryRun, ArxivQueryRun, ConversationSummaryMemory | Long sessions accumulate cost — use ConversationSummaryMemory to compress history |
| Content moderation pipeline | Sequential LCEL chain: classifier → reviewer → decision parser | Add confidence threshold check; route low-confidence results to human review queue |
When NOT to use LangChain: If your entire application is a single, fixed-prompt LLM call with no memory and no tool use, the raw OpenAI (or Anthropic) SDK is simpler, more transparent, and easier to debug. LangChain's abstractions earn their overhead only when you have multi-step pipelines, state management, or conditional tool use.
🧪 Practical Exercises
Work through these three exercises in order — each one builds on the previous.
Exercise 1 — Build an LCEL Translation Pipeline
Create a chain that translates text to a target language, then scale it with .batch():
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template(
"Translate the following to {language}: {text}"
)
chain = prompt | model | StrOutputParser()
# Single call
print(chain.invoke({"language": "French", "text": "Good morning!"}))
# Batch — 5 sentences in parallel
sentences = [{"language": "Spanish", "text": s} for s in
["Hello", "Thank you", "Goodbye", "How are you?", "See you later"]]
results = chain.batch(sentences)
print(results)
Exercise 2 — Add Memory to a Conversation
Wrap a model in ConversationChain with ConversationBufferMemory and verify it remembers a name across three turns:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
memory = ConversationBufferMemory()
conv = ConversationChain(llm=model, memory=memory)
conv.predict(input="My name is Alice.")
conv.predict(input="I work at a robotics startup.")
response = conv.predict(input="What is my name and where do I work?")
print(response) # Should mention both Alice and the robotics startup
Exercise 3 — Build a Tool-Using Agent
Give an agent a Calculator and a Wikipedia tool, then observe which tool it selects for different question types:
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.tools import tool
from langchain import hub
@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))  # fine for a local exercise; never eval() untrusted input in production
tools = [calculator, WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # agent prompt with agent_scratchpad
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=5)
executor.invoke({"input": "What is 847 * 293?"}) # uses calculator
executor.invoke({"input": "Who invented the telephone?"}) # uses Wikipedia
📚 Key Lessons from Building with LangChain
- LCEL chains are lazy — design for it. The | expression only builds the pipeline; nothing executes until .invoke(), .stream(), or .batch() is called. This enables streaming tokens to a UI and parallel batch processing without any code changes.
- Choose memory type by session length. ConversationBufferMemory is simple and accurate for short sessions (< ~10 turns). For long conversations, switch to ConversationSummaryMemory — it compresses history with an LLM call, keeping token usage bounded at the cost of some fidelity.
- LangSmith is non-negotiable in production. When a chain produces a wrong answer, you can't debug it from the final output alone. LangSmith records every intermediate prompt, LLM response, and tool call — without it you're flying blind.
- Always set AgentExecutor safety limits. Unconstrained agents can loop indefinitely on ambiguous inputs, burning tokens and money. Set max_iterations (e.g., 10) and handle_parsing_errors=True to recover from malformed tool outputs gracefully.
- Prefer LCEL over legacy chain classes for new code. LLMChain and ConversationChain are in maintenance mode. LCEL chains (built with |) are the future-proof API — they support streaming, batching, async, and composition natively, and they integrate directly with LangSmith tracing.
🎯 What to Study Next
- RAG with LangChain and ChromaDB
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought
📌 TLDR: Summary & Key Takeaways
- Chains (LCEL): Pipe prompt | model | parser into lazy, composable pipelines — the core building block.
- Memory: Inject conversation history automatically. Choose the right memory type for session length.
- Agents: LLMs that call tools in a loop until they have enough information to answer.
- RAG: Load → chunk → embed → retrieve → answer. The most common production pattern.
- LangSmith: Trace every chain step for debugging and cost analysis.
📝 Practice Quiz
What does the | operator do in LangChain Expression Language (LCEL)?
- A) It is a bitwise OR operation between two values.
- B) It chains the output of one component to the input of the next — building a composable pipeline.
- C) It runs two chains in parallel and returns both results.
- D) It merges two prompt templates into one.
Correct Answer: B — The LCEL pipe operator connects Runnables (templates, models, parsers) into a sequential pipeline where each step's output feeds the next.
An LLM chatbot loses context after a few turns. Which LangChain component solves this?
- A) OutputParser.
- B) Memory (e.g., ConversationBufferMemory) — it injects conversation history into each prompt automatically.
- C) AgentExecutor.
- D) RetrievalQA.
Correct Answer: B — Memory components store and inject prior turns so the LLM sees the conversation context without the developer manually tracking it.
When should you prefer the raw OpenAI SDK over LangChain?
- A) Always — LangChain is too slow for production.
- B) For simple single-call applications where LangChain's abstractions add complexity without benefit.
- C) Only when deploying to AWS Lambda.
- D) Only when using GPT-4 models.
Correct Answer: B — LangChain's value comes from multi-step chains, memory, and tool use. For a single prompt → response call, the raw SDK is simpler and more transparent.
🛠️ LangChain and LangGraph: From LCEL Chains to Stateful Multi-Step Agents
LangChain (the framework introduced throughout this post) provides the LCEL | pipe syntax, ChatPromptTemplate, RunnablePassthrough, and built-in memory/retrieval primitives. LangGraph is LangChain's extension for stateful, cyclical agent graphs — it models agent loops as explicit nodes and edges, replacing the opaque AgentExecutor with a transparent state machine you can inspect and debug.
How they solve the problem in this post: The snippet below shows three patterns: (1) an LCEL chain with RunnablePassthrough passing context alongside transformed values, (2) a legacy LLMChain for comparison, and (3) a minimal LangGraph agent that loops a tool call until the LLM decides it is done.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# ─── Pattern 1: LCEL chain with RunnablePassthrough ──────────────────────────
# RunnablePassthrough forwards the original input alongside a transformed field
prompt = ChatPromptTemplate.from_template(
"Summarise this in one sentence: {text}\nThen answer: {question}"
)
chain = (
{"text": RunnablePassthrough(), "question": lambda _: "What is the main topic?"}
| prompt
| llm
| StrOutputParser()
)
result = chain.invoke("The Transformer architecture replaced RNNs for NLP tasks in 2017.")
print(result)
# → "The Transformer architecture revolutionised NLP by replacing RNNs. Main topic: Transformers."
# ─── Pattern 2: Legacy LLMChain (still supported, but LCEL preferred) ────────
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
legacy_chain = LLMChain(
llm=llm,
prompt=PromptTemplate.from_template("Translate to Spanish: {text}")
)
print(legacy_chain.run("Hello, world!")) # → "¡Hola, mundo!"
# ─── Pattern 3: Minimal LangGraph stateful agent with a tool loop ─────────────
# pip install langgraph
from langgraph.graph import StateGraph, END
from langchain_core.tools import tool
from typing import TypedDict, Annotated
import operator
@tool
def word_count(text: str) -> str:
    """Count words in the provided text."""
    return str(len(text.split()))
# Agent state: accumulates messages across turns
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
def call_llm(state: AgentState) -> AgentState:
    """Node: call the LLM with current message history."""
    llm_with_tools = llm.bind_tools([word_count])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}
def should_continue(state: AgentState) -> str:
    """Edge: if the LLM called a tool, route to the tool node; else finish."""
    last = state["messages"][-1]
    return "tools" if last.tool_calls else END
def call_tools(state: AgentState) -> AgentState:
    """Node: execute any tool calls the LLM requested."""
    from langchain_core.messages import ToolMessage
    last = state["messages"][-1]
    results = []
    for call in last.tool_calls:
        output = word_count.invoke(call["args"])
        results.append(ToolMessage(content=output, tool_call_id=call["id"]))
    return {"messages": results}
# Build the graph
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", call_tools)
graph.set_entry_point("llm")
graph.add_conditional_edges("llm", should_continue)
graph.add_edge("tools", "llm") # loop back after tool execution
app = graph.compile()
# Run the agent
from langchain_core.messages import HumanMessage
output = app.invoke({"messages": [HumanMessage("How many words in: 'The quick brown fox'?")]})
print(output["messages"][-1].content)
# → "The phrase 'The quick brown fox' contains 4 words."
RunnablePassthrough is the key LCEL primitive for injecting context that bypasses transformation — essential for RAG pipelines where you want both the retrieved context AND the original query flowing forward simultaneously. LangGraph's explicit node/edge model gives you full observability over each loop iteration — something AgentExecutor hid entirely.
For a full deep-dive on LangGraph stateful agents and multi-tool orchestration, a dedicated follow-up post is planned.
🔗 Related Posts
- RAG with LangChain and ChromaDB
- Mastering Prompt Templates with LangChain
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought

Written by
Abstract Algorithms
@abstractalgorithms