Building Multi-Agent Systems with LangGraph

The first time I hit LangGraph's recursion limit in production, it was 2 PM, we had a chargeback dispute that needed to be filed by end of day, and our agent was stuck in a loop trying to fetch evidence that didn't exist.
25 steps. Crash. GraphRecursionError: Recursion limit of 25 reached without hitting a stop condition.
I stared at it for a minute. Then I raised the limit to 50 and redeployed.
(Narrator: This did not fix the problem.)
That was my real introduction to LangGraph in production. We use it at Final Round AI to automate chargeback disputes — an LLM reads transaction data, collects evidence, writes a dispute response, and files it. Multi-step, stateful, branching, sometimes loops back to collect more evidence when the first pass comes up short. Exactly the kind of thing chains can't handle.

Why Chains Aren't Enough
Most AI workflows start with a chain. Prompt A → prompt B → prompt C → done. It works great for transformations — summarize this, classify that, extract the other thing. The moment your workflow needs to make decisions, chains start showing their limits.
No branching. No cycles. No state beyond whatever you're threading through manually. No checkpointing — if step 7 of a 10-step pipeline fails, you restart from step 1.
The chargeback workflow branches based on the dispute category. It loops back if evidence is weak. It routes to a human reviewer for edge cases. It needs to pause mid-execution, wait for a human to approve a draft, then resume and file. That's not a chain. That's a graph — and LangGraph is the library that actually lets you build one without losing your mind.
The Mental Model
LangGraph builds on a genuinely simple idea: your workflow is a directed graph. Nodes are functions. Edges say where to go next.
The state schema is the backbone. You define it once, and every node reads from it and writes back to it:
```python
from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class ChargebackState(TypedDict):
    messages: Annotated[list, add_messages]
    case_id: str
    dispute_category: str
    evidence: list[str]
    draft_response: str
    decision: str

graph = StateGraph(ChargebackState)
```

The Annotated[list, add_messages] on messages is not decoration — it's a reducer. Without it, every node that touches messages would overwrite the entire list. With add_messages, they append. This is how LangGraph lets multiple nodes contribute to a shared conversation history without trampling each other.
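To make the reducer behavior concrete, here's a LangGraph-free sketch of the same update semantics. Note that apply_update and append are illustrative stand-ins I'm using for explanation, not LangGraph APIs:

```python
# Illustrative stand-in for LangGraph's state-update semantics: keys with
# a reducer are merged, keys without one are overwritten by the last writer.
def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](merged.get(key, []), value)
        else:
            merged[key] = value  # plain overwrite: last writer wins
    return merged

def append(old: list, new: list) -> list:
    return [*old, *new]

# With an append reducer on "messages", every node's contribution survives:
state = {"messages": ["user: dispute case-123"]}
state = apply_update(state, {"messages": ["collector: found 2 docs"]}, {"messages": append})
state = apply_update(state, {"messages": ["writer: draft ready"]}, {"messages": append})
print(state["messages"])  # all three messages, in order

# Without a reducer, the second write wipes the history:
overwritten = apply_update({"messages": ["user: hi"]}, {"messages": ["writer: draft"]}, {})
print(overwritten["messages"])  # only the last write remains
```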
Nodes are just Python functions. They receive the full state and return a partial update (only the keys they're changing):
```python
def collect_evidence(state: ChargebackState) -> dict:
    evidence = fetch_transaction_records(state["case_id"])
    dispute_docs = pull_dispute_documents(state["case_id"])
    return {"evidence": [*evidence, *dispute_docs]}
```

Conditional edges are how you handle branching. A router function inspects state and returns the name of the next node:
```python
def route_after_evidence(state: ChargebackState) -> str:
    if len(state["evidence"]) < 2:
        return "collect_more"    # loop back for another pass
    if state["dispute_category"] == "fraud":
        return "fraud_handler"   # specialized path
    return "write_dispute"       # standard path

graph.add_conditional_edges("collect_evidence", route_after_evidence)
```

That's the whole model. You add nodes, wire edges, compile, invoke. The graph runtime handles scheduling — nodes that can run in parallel run in parallel, nodes that depend on each other run in sequence.
The Supervisor Pattern
For the chargeback agent, we use the supervisor pattern: one orchestrating LLM that decides what to do next, and a set of specialized worker nodes that do the actual work.
```python
from typing import Literal
from pydantic import BaseModel
from langchain_core.messages import SystemMessage
from langgraph.graph import MessagesState, END
from langgraph.types import Command

class Router(BaseModel):
    next: Literal["evidence_collector", "dispute_writer", "human_review", "FINISH"]

SUPERVISOR_PROMPT = """You are orchestrating a chargeback dispute workflow.
Workers available: evidence_collector, dispute_writer, human_review.
Based on conversation history, decide who should act next.
Reply FINISH when the dispute is ready to file."""

def supervisor(state: MessagesState) -> Command[Literal["evidence_collector", "dispute_writer", "human_review", "__end__"]]:
    response = llm.with_structured_output(Router).invoke(
        [SystemMessage(SUPERVISOR_PROMPT)] + state["messages"]
    )
    if response.next == "FINISH":
        return Command(goto=END)
    return Command(goto=response.next)
```

The Command(goto=...) API is the modern way to handle dynamic routing. No static add_conditional_edges needed — the supervisor just returns where to go next, and LangGraph handles the routing. Clean.
Worker nodes do their thing and route back to the supervisor:
```python
from langchain_core.messages import AIMessage

def evidence_collector(state: MessagesState) -> Command[Literal["supervisor"]]:
    evidence = run_evidence_pipeline(state)
    return Command(
        update={"messages": [AIMessage(content=f"Collected {len(evidence)} evidence items.")]},
        goto="supervisor"
    )

def dispute_writer(state: MessagesState) -> Command[Literal["supervisor"]]:
    draft = write_dispute_letter(state)
    return Command(
        update={"messages": [AIMessage(content=f"Draft ready:\n\n{draft}")]},
        goto="supervisor"
    )
```

The supervisor sees the result, decides what to do next, routes again. This continues until the dispute is written — or until something looks wrong and we punt to human_review.

Multi-Channel Support: Same Graph, Different Inputs
The other system we run on LangGraph is a multi-channel customer support agent. Messages come in from Email, Instagram, TikTok, and Chatwoot. Each has its own ingestion node. A classifier routes to the right specialist (billing, technical, refunds). Responses go back through the original channel.
The trick: every channel has different metadata, but the processing logic should be channel-agnostic. We normalize at ingestion:
```python
from langchain_core.messages import HumanMessage

class SupportState(TypedDict):
    messages: Annotated[list, add_messages]
    channel: str        # "email" | "instagram" | "tiktok" | "chatwoot"
    sender_id: str
    thread_id: str
    intent: str
    resolved: bool

def ingest_instagram(raw_event: dict) -> SupportState:
    message = raw_event["messaging"][0]
    return {
        "channel": "instagram",
        "sender_id": message["sender"]["id"],
        "thread_id": message["sender"]["id"],  # DM threads are keyed by sender
        "messages": [HumanMessage(content=message["message"]["text"])],
        "intent": "",
        "resolved": False
    }
```

After ingestion, every node downstream is channel-blind. The classifier classifies. The specialist responds. A final format_and_send node checks state["channel"] and fires the right API call. LangGraph's shared state makes this kind of multi-input, shared-processing, multi-output flow actually manageable.
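The dispatch at the end can be as simple as a lookup table keyed on channel. This is a hypothetical sketch: the send_* helpers stand in for the real channel APIs (email provider, Instagram Graph API, and so on), with a list recording outgoing calls so the sketch is observable:

```python
# Record outgoing calls so the sketch is testable without real APIs.
sent = []

def send_email(to: str, body: str) -> None:
    sent.append(("email", to, body))

def send_instagram(to: str, body: str) -> None:
    sent.append(("instagram", to, body))

# One dispatch table; adding a channel means adding one entry.
SENDERS = {"email": send_email, "instagram": send_instagram}

def format_and_send(state: dict) -> dict:
    reply = state["messages"][-1]  # the specialist's final answer
    SENDERS[state["channel"]](state["sender_id"], reply)
    return {"resolved": True}

update = format_and_send({
    "channel": "instagram",
    "sender_id": "user-42",
    "messages": ["Refund issued."],
    "resolved": False,
})
```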
The Part That Breaks in Production
Here's what the tutorials skip.
State explosion. Your TypedDict starts with 5 fields. Six months later it has 30. Every node can read every field. Accidental overwrites happen. And the nastiest part: if you rename a key in your schema, existing checkpointed threads won't load. Schema migration is painful. Only make additive changes in production.
The mitigation: scope what each node can see using input/output schemas.
```python
graph = StateGraph(FullState, input=InputSchema, output=OutputSchema)
```

You have to plan for this from the start. Adding it later is a rewrite.
The recursion limit. Default is 25 supersteps. GraphRecursionError tells you nothing about why you're looping — just that you are. The real fix is ensuring your router has a path to END. The quick fix:
```python
app.invoke(input, config={"recursion_limit": 100})
```

These are different fixes. Confusing them is how you end up back in that 2 PM production incident.
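One way to guarantee that path to END is a pass counter in state, so the loop router has a hard ceiling. The field names here are illustrative, and the node that loops back would also return an incremented passes count:

```python
# Illustrative guard: the router can only loop back a bounded number of
# times before it is forced onto a path that terminates.
MAX_PASSES = 3

def route_after_evidence(state: dict) -> str:
    if len(state["evidence"]) < 2 and state["passes"] < MAX_PASSES:
        return "collect_more"    # loop back, but only while passes remain
    return "write_dispute"       # terminating path, always reachable

print(route_after_evidence({"evidence": [], "passes": 1}))  # collect_more
print(route_after_evidence({"evidence": [], "passes": 3}))  # write_dispute
```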
Unbounded message lists. add_messages appends forever. A 50-turn case will bloat past context limits. Trim periodically:
```python
from langchain_core.messages import RemoveMessage

def trim_old_messages(state: ChargebackState) -> dict:
    old = state["messages"][:-10]  # keep the last 10
    return {"messages": [RemoveMessage(id=m.id) for m in old]}
```

Human-in-the-loop. When the agent is about to do something irreversible — file a dispute, issue a refund, send a message — you want a human checkpoint. LangGraph's interrupt() is exactly this:
```python
from langgraph.types import interrupt

def human_review_node(state: ChargebackState) -> dict:
    decision = interrupt({
        "question": "Approve this dispute response?",
        "draft": state["draft_response"],
        "case_id": state["case_id"]
    })
    if decision["approved"]:
        return {"decision": "approved"}
    return {"decision": "rejected", "reason": decision["reason"]}
```

Execution pauses here. You surface the interrupt payload to your UI. A human decides. You resume:
```python
from langgraph.types import Command

# Resume with the human's decision
app.invoke(
    Command(resume={"approved": True}),
    config={"configurable": {"thread_id": "case-abc-123"}}  # same thread
)
```

Without a checkpointer compiled into the graph, interrupt() does nothing useful. The checkpointer is what makes pause-and-resume possible. Don't skip it.
```python
import os
from langgraph.checkpoint.postgres import PostgresSaver

checkpointer = PostgresSaver.from_conn_string(os.environ["DATABASE_URL"])
app = graph.compile(checkpointer=checkpointer)
```

TIP
Use MemorySaver in development (it's in-process, no setup). Switch to PostgresSaver in production so threads survive restarts. The API is identical — just swap the checkpointer at compile time.
Debugging When Things Go Wrong
LangGraph's debugging story is... fine. Not great.
The most useful thing: stream state updates instead of invoking blind.
```python
for chunk in app.stream(inputs, config=config, stream_mode="updates"):
    print(chunk)  # shows which node ran and what it returned
```

This gives you a step-by-step trace without LangSmith. When something's looping, you'll see the same node appearing over and over and can figure out why your conditional edge isn't routing to END.
For production, LangSmith traces are worth the setup. You get a visual view of every step, input/output at each node, token usage, latency. Worth it when you're staring at a failed chargeback run trying to figure out what the evidence collector returned.
NOTE
The default recursion limit of 25 was chosen to catch infinite loops, not because 25 is a reasonable limit for complex workflows. Anything with more than a handful of nodes should probably set it to 50 or 100 explicitly.
Is LangGraph Worth It?
For simple workflows: probably not. A chain is fine. LangChain Expression Language (LCEL) will take you far.
For anything with real branching logic: yes. The graph model forces you to make control flow explicit. That's uncomfortable at first and correct in the long run.
For anything that needs pause-and-resume: definitely yes. Building this yourself is a project.
For anything that needs multiple specialized agents coordinating around shared state: LangGraph is genuinely the right tool. The alternatives are either much simpler (no state management) or much heavier (full orchestration platforms).
The thing I didn't expect when we first adopted it: when something breaks in production — and it will — you can trace exactly which node failed, what state it received, and what it returned. Compare that to debugging a prompt chain where everything is one giant string. The explicitness pays dividends.
Now if you'll excuse me, I need to go check why our evidence collector is trying to fetch records for case IDs that don't exist anymore. The recursion limit is set to 100. I'm sure it'll be fine.
(Narrator: It was not fine.)
