How to Develop Apps Using LangChain and LLMs
LangChain is the glue that connects LLMs to your data. We explain Chains, Prompts, and Agents, and how to build your first app.
TLDR: LangChain is a framework that simplifies building LLM applications. It provides abstractions for Chains (linking steps), Memory (remembering chat history), and Agents (using tools) — turning raw API calls into composable building blocks, from retrieval-augmented Q&A to multi-step autonomous agents.
📖 Lego Bricks for LLM Apps
Before we explain how LangChain works, here is what it looks like in practice. This five-line chain translates text to French — prompt template, LLM call, and output parsing wired together:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
chain = ChatPromptTemplate.from_template("Translate to French: {text}") | ChatOpenAI(model="gpt-4o") | StrOutputParser()
print(chain.invoke({"text": "Hello, how are you?"}))
# → "Bonjour, comment allez-vous ?"
That | pipe — connecting a prompt template, an LLM, and an output parser — is LangChain's core abstraction. You will understand every part of that line by the end of this guide.
Building with the raw OpenAI API means writing the same boilerplate endlessly: formatting prompts, managing conversation history, parsing outputs, calling tools when needed.
LangChain is the Lego set — pre-assembled pieces (prompt templates, memory stores, output parsers, tool wrappers) that snap together so you can focus on logic rather than plumbing.
| Raw API | LangChain |
|---|---|
| Manual string formatting | ChatPromptTemplate |
| Manual history appending | ConversationBufferMemory |
| Manual tool calling logic | AgentExecutor |
| Manual output parsing | StrOutputParser, JsonOutputParser |
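To make the left column concrete, here is a minimal stdlib-only sketch of the manual bookkeeping the raw-API rows imply. The names (`format_prompt`, `send_turn`, the stand-in model) are invented for illustration — this is not real SDK code:

```python
# A stdlib-only sketch of raw-API boilerplate: manual prompt formatting
# and manual history appending, with a stand-in for the model call.
def format_prompt(template, **kwargs):
    # Manual string formatting -- what ChatPromptTemplate automates.
    return template.format(**kwargs)

history = [{"role": "system", "content": "You are a helpful assistant."}]

def send_turn(user_text, fake_model):
    # Manual history appending -- what ConversationBufferMemory automates.
    history.append({"role": "user", "content": user_text})
    reply = fake_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

# Stand-in for a model call: it just reports how many messages it received.
reply = send_turn(
    format_prompt("Translate to French: {text}", text="Hello!"),
    lambda msgs: f"(model saw {len(msgs)} messages)",
)
send_turn("And to Spanish?", lambda msgs: f"(model saw {len(msgs)} messages)")
print(reply)          # → (model saw 2 messages)
print(len(history))   # → 5: one system, two user, two assistant messages
```

Every application ends up re-implementing some version of these two functions; LangChain's value is that they are written once, tested, and composable.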
🔍 Core Concepts: What Makes LangChain Different
Raw LLM APIs hand you a hammer and leave you to build the house. Every call is stateless — the model forgets everything the moment you hang up. You must manually format prompt strings, append conversation history to each request, parse the model's text output into structured data, and wire up tool calls yourself. For a one-off script that's fine; for a production chatbot or document Q&A system it becomes hundreds of lines of brittle glue.
LangChain solves this through three architectural layers:
| Layer | What it does |
|---|---|
| Core | Abstract base classes: Runnable, BasePromptTemplate, BaseChatMemory, BaseTool |
| Community | 100+ pre-built integrations: OpenAI, Anthropic, Chroma, FAISS, Wikipedia, SQL, and more |
| LangSmith | Hosted tracing and evaluation — records every prompt, response, tool call, and token cost |
The glue holding Core together is LCEL (LangChain Expression Language). The | pipe operator creates a lazy, inspectable pipeline:
chain = prompt | model | parser # nothing runs yet
chain.invoke({"text": "hello"}) # pipeline executes here
Every component — prompt template, chat model, output parser, retriever — implements the same Runnable protocol: .invoke() for a single call, .stream() for token-by-token output, and .batch() for parallel requests. This uniform interface means you can swap any piece without rewriting the pipeline.
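The uniform interface is easiest to see in a toy re-implementation. This stdlib-only sketch (a `MiniRunnable` class invented here, not LangChain's actual `Runnable`) shows how a shared invoke/batch protocol plus `__or__` yields lazy, swappable composition:

```python
# A minimal sketch of the Runnable idea: every step exposes the same
# invoke/batch interface, and `|` composes steps into a new runnable.
class MiniRunnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def batch(self, xs):
        return [self.invoke(x) for x in xs]

    def __or__(self, other):
        # Composition returns another MiniRunnable -- same interface,
        # so composed chains can themselves be composed further.
        return MiniRunnable(lambda x: other.invoke(self.invoke(x)))

prompt = MiniRunnable(lambda d: f"Translate to French: {d['text']}")
fake_llm = MiniRunnable(lambda p: {"content": p.upper()})   # stand-in for the model
parser = MiniRunnable(lambda msg: msg["content"])

chain = prompt | fake_llm | parser          # nothing runs yet (lazy composition)
print(chain.invoke({"text": "hello"}))      # → TRANSLATE TO FRENCH: HELLO
print(chain.batch([{"text": "a"}, {"text": "b"}]))
```

Because `fake_llm` obeys the same interface as everything else, swapping it for a real model changes nothing upstream or downstream — which is exactly the property the Runnable protocol gives real LangChain pipelines.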
🔢 The Three Core Abstractions
A. Chains — Linking Steps
A Chain connects: User Input → Prompt Template → LLM → Output Parser.
The | operator in LCEL (LangChain Expression Language) pipes the output of one step into the next:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Translate to French: {text}")
chain = prompt | model | StrOutputParser()
result = chain.invoke({"text": "Hello, how are you?"})
# "Bonjour, comment allez-vous ?"
Chains are composable — the output of chain can be piped into another chain.
B. Memory — State Across Turns
LLMs are stateless: each API call starts fresh. LangChain's Memory objects inject conversation history into the next prompt automatically.
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=model, memory=memory)
conversation.predict(input="My name is Alice.")
conversation.predict(input="What is my name?")
# "Your name is Alice."
| Memory Type | Keeps | Best For |
|---|---|---|
| ConversationBufferMemory | Full history | Short sessions |
| ConversationSummaryMemory | LLM-generated summary | Long sessions |
| ConversationBufferWindowMemory | Last N turns | Chatbots with context limit |
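The mechanical difference between the buffer and window strategies is simple enough to sketch in plain Python. These `BufferMemory`/`WindowMemory` classes are invented stand-ins for illustration, not the real LangChain classes:

```python
# Stdlib-only sketch: a buffer keeps every turn; a window keeps the last k.
class BufferMemory:
    def __init__(self):
        self.turns = []                      # full (user, ai) history

    def save(self, user, ai):
        self.turns.append((user, ai))

    def context(self):
        # What gets injected into the next prompt.
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

class WindowMemory(BufferMemory):
    def __init__(self, k):
        super().__init__()
        self.k = k                           # keep only the last k turns

    def context(self):
        recent = self.turns[-self.k:]
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in recent)

buf, win = BufferMemory(), WindowMemory(k=2)
for mem in (buf, win):
    mem.save("My name is Alice.", "Nice to meet you, Alice!")
    mem.save("I like robotics.", "Robotics is fascinating.")
    mem.save("What's new?", "Not much!")

print(len(buf.context().splitlines()))   # → 6: all three turns survive
print(len(win.context().splitlines()))   # → 4: only the last two turns
```

Note the trade-off the table describes: with `k=2`, Alice's name has already fallen out of the window — which is why window memory suits context-limited chatbots but not sessions where early facts must persist.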
C. Agents — LLMs That Use Tools
An Agent is an LLM that can decide which tools to call based on the user's question.
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain import hub
tools = [WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # agent prompt with the required agent_scratchpad placeholder
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"input": "What is the boiling point of mercury?"})
# Agent calls Wikipedia → reads result → returns answer
The Agent loop:
flowchart TD
Q["User Question"] --> LLM["LLM: Choose Action"]
LLM -->|calls tool| Tool["Tool (Wikipedia, Calculator, DB)"]
Tool --> Observation["Observation (result)"]
Observation --> LLM
LLM -->|has enough info| Answer["Final Answer"]
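The loop in the diagram above can be sketched in plain Python. Here a hypothetical `fake_llm` function stands in for the model's "choose action" step (the real AgentExecutor does this via LLM tool-calling), and the Wikipedia tool returns a canned result:

```python
# Stdlib-only sketch of the agent loop: choose action → call tool → observe → repeat.
def wikipedia(query):
    """Canned result standing in for a real Wikipedia lookup."""
    return "Mercury boils at 356.7 °C."

TOOLS = {"wikipedia": wikipedia}

def fake_llm(question, observations):
    # "LLM: Choose Action" -- consult a tool first, then answer from the observation.
    if not observations:
        return {"action": "wikipedia", "input": question}
    return {"action": "final", "answer": observations[-1]}

def run_agent(question, max_iterations=5):
    observations = []
    for _ in range(max_iterations):          # bounded loop, like AgentExecutor
        step = fake_llm(question, observations)
        if step["action"] == "final":
            return step["answer"]
        observations.append(TOOLS[step["action"]](step["input"]))
    return "Stopped: max_iterations reached."

answer = run_agent("What is the boiling point of mercury?")
print(answer)   # → Mercury boils at 356.7 °C.
```

The essential point: the model never executes anything itself — it only emits action requests, and the loop feeds tool observations back until the model decides it can answer.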
⚙️ Building a RAG Pipeline with LangChain
Retrieval-Augmented Generation (RAG) is the most common real-world LangChain pattern: load documents → embed them → retrieve relevant chunks → answer with context.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
# 1. Load and split documents
loader = TextLoader("my_docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=500, # ~500 tokens: focused enough for precise retrieval, large enough to preserve sentence context
chunk_overlap=50, # 10% overlap so sentences split across boundaries appear in both adjacent chunks
)
chunks = splitter.split_documents(docs)
# 2. Embed and store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
# 3. Build the QA chain
qa = RetrievalQA.from_chain_type(
llm=model,
retriever=vectorstore.as_retriever(search_kwargs={"k": 4}) # k=4: retrieve 4 chunks — balances context richness vs token budget
)
qa.invoke("What is the refund policy?")
flowchart LR
Q["User Question"]
Embed["Embed Question"]
VDB["Vector Store\n(Chroma/FAISS)"]
Chunks["Top-K Chunks"]
LLM["LLM + Context"]
A["Answer"]
Q --> Embed --> VDB --> Chunks --> LLM --> A
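To make the `chunk_size`/`chunk_overlap` parameters concrete, here is a stdlib-only sketch of fixed-size chunking with overlap. The real RecursiveCharacterTextSplitter is smarter — it prefers paragraph and sentence boundaries before falling back to raw characters — so treat this as the underlying sliding-window idea only:

```python
# Stdlib-only sketch of sliding-window chunking with overlap.
def split_with_overlap(text, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap        # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "abcdefghijklmnopqrstuvwxyz" * 10     # 260-character stand-in document
chunks = split_with_overlap(doc, chunk_size=100, chunk_overlap=10)

print(len(chunks))                           # → 3
print(chunks[0][-10:] == chunks[1][:10])     # → True: the overlap region is shared
```

The shared overlap region is why a sentence cut at a chunk boundary still appears intact in at least one chunk — the property the `chunk_overlap=50` comment in the pipeline above relies on.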
🧠 Deep Dive: LangSmith Observability for LLM Chains
In production, you need to debug why a chain produced a wrong answer. LangSmith (LangChain's tracing backend) records every step:
- Which prompt was sent.
- What the LLM returned.
- Which tool was called and with what arguments.
- Total latency and token cost per step.
Enable tracing:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_key"
All chain invocations are now automatically traced.
⚖️ Trade-offs & Failure Modes: LangChain
| Benefit | Risk |
|---|---|
| Rapid prototyping with composable building blocks | Adds abstraction layers that can obscure errors |
| Built-in integrations (100+ LLMs, vector stores, tools) | Version churn — API changes frequently |
| Memory management out of the box | Token cost grows if memory strategy is not tuned |
| Tracing via LangSmith | Production overhead if not carefully sampled |
When to skip LangChain: If your use case is a single LLM call with a fixed prompt, the raw API (OpenAI SDK) is simpler and more debuggable. LangChain pays off when you have multi-step chains, conditional tool use, or complex memory strategies.
📊 Decision Guide: LangChain Application Architecture
A multi-turn agent application wires together every abstraction from the sections above. User input arrives, Memory retrieves prior conversation turns and injects them into the ChatPromptTemplate, the filled prompt is sent to the LLM, and the LLM either calls a Tool or produces a final answer that flows through an Output Parser.
flowchart TD
Input["User Input"]
Memory["Memory\n(ConversationBufferMemory)"]
Template["ChatPromptTemplate\n(system + history + user)"]
LLM["LLM\n(ChatOpenAI)"]
Parser["Output Parser\n(StrOutputParser / JsonOutputParser)"]
Tools["Tools\n(search, calculator, DB)"]
Agent{"Agent Decision:\nuse tool or answer?"}
Answer["Final Answer"]
Input --> Template
Memory --> Template
Template --> LLM
LLM --> Agent
Agent -->|needs tool| Tools
Tools --> LLM
Agent -->|has answer| Parser
Parser --> Answer
The loop between LLM → Agent → Tools → LLM may iterate several times before the agent decides it has enough information to produce a final answer. AgentExecutor enforces a max_iterations limit to prevent runaway loops, and handle_parsing_errors=True lets the agent recover from malformed tool-call outputs without crashing the entire pipeline.
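The two guardrails named above can be sketched without any LangChain at all. This stdlib-only sketch (the `flaky_llm` stand-in and its JSON action format are invented for illustration) shows an iteration cap plus the `handle_parsing_errors=True` behaviour of feeding a parse failure back to the model:

```python
import json

def flaky_llm(history):
    # Stand-in model: first reply is malformed JSON, the retry is valid.
    if any("error" in h for h in history):
        return '{"action": "final", "answer": "42"}'
    return '{"action": "final", "answer": 42'   # missing closing brace

def run(max_iterations=10):
    history = []
    for _ in range(max_iterations):              # iteration cap prevents runaway loops
        raw = flaky_llm(history)
        try:
            step = json.loads(raw)               # parse the model's action output
        except json.JSONDecodeError as e:
            # handle_parsing_errors=True behaviour: report the error back
            # to the model and retry instead of crashing the pipeline.
            history.append(f"error: could not parse your output ({e})")
            continue
        if step["action"] == "final":
            return step["answer"]
    raise RuntimeError("max_iterations exceeded")

print(run())   # prints 42 after recovering from one parsing error
```

Without the `try/except` the first malformed reply would kill the run; without the loop bound, a model that never produces valid output would retry forever.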
🌍 Real-World Applications of LangChain
LangChain's composable architecture maps cleanly onto a wide range of production use cases. The table below shows which components carry the load in each scenario and what to watch for in production:
| Application Type | LangChain Components Used | Production Consideration |
|---|---|---|
| Chat with documents (RAG) | TextLoader, RecursiveCharacterTextSplitter, OpenAIEmbeddings, Chroma, RetrievalQA | Chunk size and overlap tuning — too large wastes tokens; too small loses context |
| Customer service bot | ConversationChain, ConversationBufferWindowMemory, AgentExecutor | Memory window size vs. token budget; escalation path when agent confidence is low |
| Code generation assistant | ChatPromptTemplate (system: "You are an expert Python developer"), StrOutputParser | Output validation — pipe results through a linter or test runner before showing to user |
| SQL generator | SQLDatabaseChain, SQLDatabase, custom prompt with schema | Always run queries in read-only mode; validate SQL before execution |
| Research assistant agent | AgentExecutor, WikipediaQueryRun, ArxivQueryRun, ConversationSummaryMemory | Long sessions accumulate cost — use ConversationSummaryMemory to compress history |
| Content moderation pipeline | Sequential LCEL chain: classifier → reviewer → decision parser | Add confidence threshold check; route low-confidence results to human review queue |
When NOT to use LangChain: If your entire application is a single, fixed-prompt LLM call with no memory and no tool use, the raw OpenAI (or Anthropic) SDK is simpler, more transparent, and easier to debug. LangChain's abstractions earn their overhead only when you have multi-step pipelines, state management, or conditional tool use.
🧪 Practical Exercises
Work through these three exercises in order — each one builds on the previous.
Exercise 1 — Build an LCEL Translation Pipeline
Create a chain that translates text to a target language, then scale it with .batch():
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template(
"Translate the following to {language}: {text}"
)
chain = prompt | model | StrOutputParser()
# Single call
print(chain.invoke({"language": "French", "text": "Good morning!"}))
# Batch — 5 sentences in parallel
sentences = [{"language": "Spanish", "text": s} for s in
["Hello", "Thank you", "Goodbye", "How are you?", "See you later"]]
results = chain.batch(sentences)
print(results)
Exercise 2 — Add Memory to a Conversation
Wrap a model in ConversationChain with ConversationBufferMemory and verify it remembers a name across three turns:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
memory = ConversationBufferMemory()
conv = ConversationChain(llm=model, memory=memory)
conv.predict(input="My name is Alice.")
conv.predict(input="I work at a robotics startup.")
response = conv.predict(input="What is my name and where do I work?")
print(response) # Should mention both Alice and the robotics startup
Exercise 3 — Build a Tool-Using Agent
Give an agent a Calculator and a Wikipedia tool, then observe which tool it selects for different question types:
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.tools import tool
from langchain import hub
@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))  # fine for a local exercise; never eval() untrusted input in production
tools = [calculator, WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # agent prompt with agent_scratchpad
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=5)
executor.invoke({"input": "What is 847 * 293?"}) # uses calculator
executor.invoke({"input": "Who invented the telephone?"}) # uses Wikipedia
📚 Key Lessons from Building with LangChain
- LCEL chains are lazy — design for it. The | expression only builds the pipeline; nothing executes until .invoke(), .stream(), or .batch() is called. This enables streaming tokens to a UI and parallel batch processing without any code changes.
- Choose memory type by session length. ConversationBufferMemory is simple and accurate for short sessions (< ~10 turns). For long conversations, switch to ConversationSummaryMemory — it compresses history with an LLM call, keeping token usage bounded at the cost of some fidelity.
- LangSmith is non-negotiable in production. When a chain produces a wrong answer, you can't debug it from the final output alone. LangSmith records every intermediate prompt, LLM response, and tool call — without it you're flying blind.
- Always set AgentExecutor safety limits. Unconstrained agents can loop indefinitely on ambiguous inputs, burning tokens and money. Set max_iterations (e.g., 10) and handle_parsing_errors=True to recover from malformed tool outputs gracefully.
- Prefer LCEL over legacy chain classes for new code. LLMChain and ConversationChain are in maintenance mode. LCEL chains (built with |) are the future-proof API — they support streaming, batching, async, and composition natively, and they integrate directly with LangSmith tracing.
🎯 What to Study Next
- RAG with LangChain and ChromaDB
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought
📌 TLDR: Summary & Key Takeaways
- Chains (LCEL): Pipe prompt | model | parser into lazy, composable pipelines — the core building block.
- Memory: Inject conversation history automatically. Choose the right memory type for session length.
- Agents: LLMs that call tools in a loop until they have enough information to answer.
- RAG: Load → chunk → embed → retrieve → answer. The most common production pattern.
- LangSmith: Trace every chain step for debugging and cost analysis.
📝 Practice Quiz
What does the | operator do in LangChain Expression Language (LCEL)?
- A) It is a bitwise OR operation between two values.
- B) It chains the output of one component to the input of the next — building a composable pipeline.
- C) It runs two chains in parallel and returns both results.
- D) It merges two prompt templates into one.
Correct Answer: B — The LCEL pipe operator connects Runnables (templates, models, parsers) into a sequential pipeline where each step's output feeds the next.
An LLM chatbot loses context after a few turns. Which LangChain component solves this?
- A) OutputParser.
- B) Memory (e.g., ConversationBufferMemory) — it injects conversation history into each prompt automatically.
- C) AgentExecutor.
- D) RetrievalQA.
Correct Answer: B — Memory components store and inject prior turns so the LLM sees the conversation context without the developer manually tracking it.
When should you prefer the raw OpenAI SDK over LangChain?
- A) Always — LangChain is too slow for production.
- B) For simple single-call applications where LangChain's abstractions add complexity without benefit.
- C) Only when deploying to AWS Lambda.
- D) Only when using GPT-4 models.
Correct Answer: B — LangChain's value comes from multi-step chains, memory, and tool use. For a single prompt → response call, the raw SDK is simpler and more transparent.
🛠️ LangChain and LangGraph: From LCEL Chains to Stateful Multi-Step Agents
LangChain (the framework introduced throughout this post) provides the LCEL | pipe syntax, ChatPromptTemplate, RunnablePassthrough, and built-in memory/retrieval primitives. LangGraph is LangChain's extension for stateful, cyclical agent graphs — it models agent loops as explicit nodes and edges, replacing the opaque AgentExecutor with a transparent state machine you can inspect and debug.
How they solve the problem in this post: The snippet below shows three patterns: (1) an LCEL chain with RunnablePassthrough passing context alongside transformed values, (2) a legacy LLMChain for comparison, and (3) a minimal LangGraph agent that loops a tool call until the LLM decides it is done.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# ─── Pattern 1: LCEL chain with RunnablePassthrough ──────────────────────────
# RunnablePassthrough forwards the original input alongside a transformed field
prompt = ChatPromptTemplate.from_template(
"Summarise this in one sentence: {text}\nThen answer: {question}"
)
chain = (
{"text": RunnablePassthrough(), "question": lambda _: "What is the main topic?"}
| prompt
| llm
| StrOutputParser()
)
result = chain.invoke("The Transformer architecture replaced RNNs for NLP tasks in 2017.")
print(result)
# → "The Transformer architecture revolutionised NLP by replacing RNNs. Main topic: Transformers."
# ─── Pattern 2: Legacy LLMChain (still supported, but LCEL preferred) ────────
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
legacy_chain = LLMChain(
llm=llm,
prompt=PromptTemplate.from_template("Translate to Spanish: {text}")
)
print(legacy_chain.run("Hello, world!")) # → "¡Hola, mundo!"
# ─── Pattern 3: Minimal LangGraph stateful agent with a tool loop ─────────────
# pip install langgraph
from langgraph.graph import StateGraph, END
from langchain_core.tools import tool
from typing import TypedDict, Annotated
import operator
@tool
def word_count(text: str) -> str:
    """Count words in the provided text."""
    return str(len(text.split()))
# Agent state: accumulates messages across turns
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
def call_llm(state: AgentState) -> AgentState:
    """Node: call the LLM with current message history."""
    llm_with_tools = llm.bind_tools([word_count])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}
def should_continue(state: AgentState) -> str:
    """Edge: if the LLM called a tool, route to the tool node; else finish."""
    last = state["messages"][-1]
    return "tools" if last.tool_calls else END
def call_tools(state: AgentState) -> AgentState:
    """Node: execute any tool calls the LLM requested."""
    from langchain_core.messages import ToolMessage
    last = state["messages"][-1]
    results = []
    for call in last.tool_calls:
        output = word_count.invoke(call["args"])
        results.append(ToolMessage(content=output, tool_call_id=call["id"]))
    return {"messages": results}
# Build the graph
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", call_tools)
graph.set_entry_point("llm")
graph.add_conditional_edges("llm", should_continue)
graph.add_edge("tools", "llm") # loop back after tool execution
app = graph.compile()
# Run the agent
from langchain_core.messages import HumanMessage
output = app.invoke({"messages": [HumanMessage("How many words in: 'The quick brown fox'?")]})
print(output["messages"][-1].content)
# → "The phrase 'The quick brown fox' contains 4 words."
RunnablePassthrough is the key LCEL primitive for injecting context that bypasses transformation — essential for RAG pipelines where you want both the retrieved context AND the original query flowing forward simultaneously. LangGraph's explicit node/edge model gives you full observability over each loop iteration — something AgentExecutor hid entirely.
For a full deep-dive on LangGraph stateful agents and multi-tool orchestration, a dedicated follow-up post is planned.
🔗 Related Posts
- RAG with LangChain and ChromaDB
- Mastering Prompt Templates with LangChain
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought

Written by
Abstract Algorithms
@abstractalgorithms