every agent framework, ranked

Every agent framework is a disagreement about how much to trust the model.

LangChain, LangGraph, n8n, Agno, and the broader agent landscape in 2026.

table of contents

  1. what "agent" actually means
  2. how we got here
  3. the core mental models
  4. the frameworks, one by one
  5. the protocol layer
  6. where things are right now
  7. how to actually choose

what "agent" actually means in this conversation

An agent is an LLM in a loop with tools, picking what to do next. The first version was ReAct (2022): think, pick a tool, see the result, repeat until done. Every framework since is a different answer to the same question: how much of that loop do you hard-code, how much do you let the model decide.
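
A minimal sketch of that loop, with the model replaced by a scripted stand-in. `fake_model`, `TOOLS`, and `run_agent` are made-up names for illustration, not part of any framework.

```python
from typing import Callable

# Toy tool registry; a real agent would register search, code execution, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: asks for one tool call, then finishes."""
    if not any(line.startswith("observation:") for line in history):
        return {"action": "add", "input": "2+3"}
    return {"action": "finish", "input": history[-1].removeprefix("observation: ")}

def run_agent(question: str, model=fake_model, max_steps: int = 5) -> str:
    history = [f"question: {question}"]
    for _ in range(max_steps):          # hard cap so a confused model cannot loop forever
        decision = model(history)       # think: the model picks the next step
        if decision["action"] == "finish":
            return decision["input"]
        result = TOOLS[decision["action"]](decision["input"])  # act
        history.append(f"observation: {result}")               # observe, then repeat
    return "gave up: step limit reached"

print(run_agent("What is 2 + 3?"))
```

Everything a framework adds sits around those few lines: who writes `history`, who caps `max_steps`, and who decides when "finish" is allowed.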

One axis explains the whole space: how much control you hand to the model. Rigid pipeline on one end (LLM is a string transformer at each node), one big "do the thing" call on the other (model plans, retries, finishes). Every framework sits somewhere on that line. As models get smarter, the line moves and the frameworks have to keep justifying themselves.

Figure: Control spectrum of agent architectures, from developer-driven flow (left) to model-driven flow (right): chains (sequential), graphs (with cycles), teams (role-based), dialogue (multi-agent), harness (LLM-driven).

how we got here

Before late 2022, "agent" meant RL agents in research labs. The current usage starts with ReAct (Yao et al., late 2022). AutoGPT and BabyAGI showed up in early 2023, neither great, both planting the meme that you could let GPT-4 loop on itself with tools and have it do real things. Early systems failed the same way: loops, lost plots, tokens burned chasing hallucinated subgoals.

LangChain showed up as the first widely-used Python library for chaining LLM calls, plugging in tools, doing retrieval. It became dominant because it was the shortest path from "I have an API key" to "I have a working RAG system." Senior engineers complained it hid prompts behind layers. By mid-2023 many concluded calling the API directly was simpler. The team rewrote LangChain in 2025 to be leaner.

Wave two (late 2023 - 2024): agents need state, not just chains. A chain is acyclic - no loops, no memory across runs. Real agents revisit steps, retry, wait for humans, persist history, hold typed state across many tool calls. LangGraph was the LangChain team's answer. CrewAI bet on "team of specialists" instead. AutoGen (Microsoft) bet on multi-agent conversation.

Wave three (2025 - 2026): Claude 3.5/4 and GPT-4o/5 got reliable enough at tool use that you could just hand them tools and let them loop. Anthropic, OpenAI, AWS, and LangChain each shipped a harness: Claude Agent SDK, Agents SDK (replacing Swarm), Strands Agents, deepagents. Meanwhile MCP (Anthropic, Nov 2024) became the closest thing the field has to a universal protocol.

Figure: Three generations of agent frameworks. Origins (2022 - 2023): ReAct paper, AutoGPT, BabyAGI, LangChain v0. Stateful agents (2024): LangGraph, CrewAI, AutoGen, n8n adds AI, MCP launched. Agent harnesses (2025 - 2026): OpenAI Agents SDK, Claude Agent SDK, Strands, Google ADK, Agno, deepagents.

the core mental models

Six metaphors cover the space.

Chain / pipeline. A DAG of LLM calls and ops. Data flows through, you specify the structure, the LLM is the smart bit at each step. Early LangChain. Fine for fixed pipelines, not really agentic.

Stateful graph / state machine. Chain with cycles, conditional edges, and a shared state object. The LLM decides which edge to take at certain nodes; you still define the topology. LangGraph. Dominant for production where you need to know what the agent did and why.

Role-based team. Agents are specialists with roles, goals, and tools. You give them tasks; the framework handles delegation, communication, aggregation. CrewAI. Fast for problems that fit the metaphor, painful when they don't.

Multi-agent conversation. Multiple agents share a dialogue, an orchestrator picks who speaks next. AutoGen / AG2. Good for emergent behavior from debate, refinement, specialization. Most research happens here.

Agent harness / tool-calling loop. Thin wrapper around the model: handles streaming, retries, tool dispatch, makes no decisions about what to do next. One LLM, curated tools, long-context memory, trusted to drive. Sub-agents on demand. Claude Agent SDK, OpenAI Agents SDK, deepagents, Strands. As models improve, less scaffolding earns its keep.

Visual workflow builder. Drag nodes on a canvas, wire them, the canvas is the program. AI agents are one node type next to HTTP, DB, conditionals. n8n, Make, Zapier. Most automations are mostly deterministic with AI sprinkled in; a canvas matches that.

These aren't exclusive. LangGraph runs inside n8n nodes. An OpenAI Agents SDK agent can be a sub-agent in a CrewAI crew.


one task, three frameworks

Same agent in three frameworks: answer a question by searching the web, then synthesize. LangGraph, Agno, Claude Agent SDK.

LangGraph: you draw the graph

from typing import TypedDict

from langchain_anthropic import ChatAnthropic
from langgraph.graph import StateGraph, END

class State(TypedDict, total=False):
    question: str
    search_results: str
    answer: str

llm = ChatAnthropic(model="claude-sonnet-4-20250514")

def web_search(query: str) -> str:
    # placeholder: swap in your real search tool
    return f"(no results: plug in a search tool for {query!r})"

def needs_search(question: str) -> bool:
    # placeholder routing rule: always search
    return True

def search(state: State) -> dict:
    # call your search tool, return only the keys you want to update
    results = web_search(state["question"])
    return {"search_results": results}

def synthesize(state: State) -> dict:
    context = state.get("search_results", "(no search results)")
    prompt = f"Answer based on: {context}\nQ: {state['question']}"
    answer = llm.invoke(prompt).content
    return {"answer": answer}

def should_search(state: State) -> str:
    # returns the name of the node to enter first
    return "search" if needs_search(state["question"]) else "synthesize"

graph = StateGraph(State)
graph.add_node("search", search)
graph.add_node("synthesize", synthesize)
graph.set_conditional_entry_point(should_search)
graph.add_edge("search", "synthesize")
graph.add_edge("synthesize", END)
agent = graph.compile()

result = agent.invoke({"question": "What is MCP?"})
print(result["answer"])

You defined the topology. You chose the edges. The LLM fills in the blanks at each node, but it never decides which node to visit next - that's your conditional function. If something goes wrong, you look at the state at each transition and see exactly where it derailed.

Agno: you hand it tools

from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.tools.duckduckgo import DuckDuckGoTools
 
agent = Agent(
    model=Claude(id="claude-sonnet-4-20250514"),
    tools=[DuckDuckGoTools()],
    instructions="Answer questions. Search the web if you need current info.",
    markdown=True,
)
 
agent.print_response("What is MCP?")

Six lines of agent configuration. No graph, no state class, no edges. The model decides whether to search. Agno wraps the tool-calling loop and handles the back-and-forth internally. You trade visibility for speed of development.

Claude Agent SDK: you trust the model

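import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Anthropic's server-side web search tool; custom or MCP-backed tools go in the same list.
tools = [{"type": "web_search_20250305", "name": "web_search", "max_uses": 3}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Answer questions. Search if needed. Think step by step.",
    tools=tools,
    messages=[{"role": "user", "content": "What is MCP?"}],
)

# Server-side tools run inside this one request:
# model calls search -> gets results -> synthesizes answer
# Note: this is the base anthropic client, used here for brevity. With
# client-side tools, create() stops at stop_reason == "tool_use" and you must
# run the tool and send back a tool_result; the Agent SDK automates that loop.
print("".join(block.text for block in response.content if block.type == "text"))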

The model does everything. You provide tools and a system prompt. The SDK runs the tool-calling loop until the model decides it's done. This is the harness philosophy: the model is the control flow.

The same task, three levels of developer control. LangGraph: you hard-code the flow. Agno: you configure the agent and let it loop. Claude SDK: you hand over the keys. Each is the right choice in different contexts. The question is always how much you trust the model and how much you need to debug.

the frameworks, one by one

LangChain

LangChain started as a chain library, became an ecosystem, and is currently rebranding as "the agent engineering platform." The original 2023 LangChain was a giant collection of integrations (every vector store, every LLM, every tool) wrapped in a set of abstractions for chaining LLM calls. It was, in retrospect, too opinionated. The team rewrote it in 2025 to be more streamlined, and the modern langchain package is a much leaner integration layer on top of langgraph's runtime.

The mistake people make about LangChain is treating it as a single thing. Think of it as three layers stacked. At the bottom is langchain-core, which gives you common abstractions for messages, models, tools, and runnables. In the middle is langchain itself, the integration layer with hundreds of model providers, vector stores, and tools. On top sit langgraph for actual agent control flow, langsmith for observability, and langserve for deployment. When people say "LangChain is bloated," they usually mean the integration layer. When people say "LangChain is the most powerful framework," they usually mean the whole stack including LangGraph and LangSmith. Both are true.

It's still the widest integration layer in the field: 100+ models, 50+ vector stores, 200+ APIs, all under one roof. The price is layers between you and the actual prompt. Most senior engineers I know skip the abstractions and hit the model API directly with a thin wrapper.


LangGraph

LangGraph is the part of the LangChain world that production teams actually use. Your agent is a directed graph: nodes are functions (LLM calls, tools, anything), edges are control flow (can be conditional on state), and a typed State object threads through. Cycles allowed. Checkpointing is built in, so you get time-travel debugging, human-in-the-loop pauses, and crash recovery.
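
To make the moving parts concrete, here is a toy graph runtime in plain Python. It is not LangGraph's API; `run_graph`, `NODES`, and `EDGES` are invented names. It only sketches the idea: nodes return partial state updates, a conditional edge creates a cycle, and a snapshot is saved after every transition.

```python
from typing import Callable, TypedDict

class State(TypedDict):
    draft: str
    attempts: int

def write(state: State) -> dict:
    # Stand-in for an LLM call that revises the draft.
    return {"draft": state["draft"] + "!", "attempts": state["attempts"] + 1}

def review(state: State) -> dict:
    return {}  # a real reviewer node might add feedback to the state

def route_after_review(state: State) -> str:
    # Conditional edge: loop back to "write" until the draft is long enough.
    return "END" if len(state["draft"]) >= 5 else "write"

NODES: dict[str, Callable[[State], dict]] = {"write": write, "review": review}
EDGES: dict[str, Callable[[State], str]] = {
    "write": lambda s: "review",
    "review": route_after_review,
}

def run_graph(state: State, entry: str = "write"):
    checkpoints = []                               # one saved snapshot per transition
    node = entry
    while node != "END":
        state = {**state, **NODES[node](state)}    # merge the node's partial update
        checkpoints.append((node, dict(state)))
        node = EDGES[node](state)
    return state, checkpoints

final, checkpoints = run_graph({"draft": "hi", "attempts": 0})
for name, snapshot in checkpoints:
    print(name, snapshot)
```

The `checkpoints` list is the whole debugging story in miniature: replay from any entry, or pause before a node and wait for a human.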

It wins because the control flow is explicit. When something breaks in prod, you look at the graph and the state at each transition. With a harness you're reading traces and guessing.

The cost is verbosity. More code than CrewAI or Agno for the same prototype, real learning curve, and as models improve you end up over-engineering structure the model could have figured out. LangChain's answer is deepagents - a harness API on top of LangGraph's runtime. Pick LangGraph if you can only learn one. Model-agnostic, open source.


n8n

n8n is the odd one in the list, because it is not an agent framework. It is a general-purpose workflow automation platform, like Zapier or Make, that has added AI agent nodes. The philosophy is the inverse of code-first frameworks: most real-world automations are deterministic plumbing with AI in a few key places, and a visual builder is the right abstraction for that reality.

Underneath, n8n is a TypeScript app and its AI agent nodes wrap LangChain.js. Drop an "AI Agent" node, wire a chat model, memory, and tools, and you're configuring a LangChain agent through a GUI. n8n's own internal AI Workflow Builder is a LangGraph multi-agent system. The visual layer sits on top of the code-first frameworks, it doesn't replace them.

Use n8n as a hybrid runtime. Deterministic nodes (HTTP, DB, scheduling, branching, error handling) for the boring plumbing, AI nodes for the parts that need a model. Heavy reasoning belongs inside a LangGraph or Agno agent that n8n calls as one node.


Agno

Agno (formerly Phidata) is the anti-LangChain pitch: pure Python, no graphs, no chains, just agents that run fast.

The Agent class wraps the whole loop in one object. Pass it a model, tools, memory, knowledge sources, storage. Call it. It loops and returns. Multi-agent comes via Teams (route, coordinate, collaborate modes) and Workflows for sequenced runs. Teams nest.

The interesting part is statelessness. Agno claims ~3μs agent instantiation and a tiny footprint by treating agents as throwaway session-scoped objects. State lives in storage (Postgres, SQLite, Mongo, vector stores), not the agent. Spin up, do the work, drop it. Different from LangGraph, where the agent is a stateful graph runtime.
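
A rough sketch of the "state lives in storage" idea using only the standard library. This is not Agno's API; `ThrowawayAgent` and `open_store` are illustrative names, and the "model" is a one-line stand-in.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS messages (session_id TEXT, role TEXT, content TEXT)")
    return db

class ThrowawayAgent:
    """Holds no conversation state itself; everything lives in the store."""

    def __init__(self, db: sqlite3.Connection, session_id: str):
        self.db, self.session_id = db, session_id

    def history(self) -> list:
        rows = self.db.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY rowid",
            (self.session_id,),
        )
        return rows.fetchall()

    def run(self, user_message: str) -> str:
        # Stand-in for a model call: reply with the turn number and the input.
        reply = f"turn {len(self.history()) // 2 + 1}: you said {user_message!r}"
        self.db.executemany(
            "INSERT INTO messages VALUES (?, ?, ?)",
            [(self.session_id, "user", user_message), (self.session_id, "assistant", reply)],
        )
        return reply

db = open_store()
first = ThrowawayAgent(db, "session-1").run("hello")   # agent object is dropped right after
second = ThrowawayAgent(db, "session-1").run("again")  # a fresh object picks up the history
print(first)
print(second)
```

Because nothing survives on the object, any worker can serve any session, which is where the stateless-scaling claim comes from.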

AgentOS exposes agents/teams/workflows as REST endpoints with OpenAPI, sessions, streaming, observability. Agno-Go is the same design in Go for real concurrency.

Pick Agno over LangGraph when you want pure-Python ergonomics, don't need a visual graph, and care about cold-start cost. Pick LangGraph when control flow is genuinely complex, you want it visible, or you need explicit checkpoint durability.


CrewAI

CrewAI codes the "team of specialists" metaphor directly. Define agents with a role ("Senior Researcher") and tools, define tasks, assemble a Crew, run it. The framework handles delegation, context passing, and aggregation.
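
The shape of that metaphor, sketched in plain Python rather than CrewAI's own classes. `Member`, `Task`, and `run_crew` are hypothetical names, and each "agent" is a string-formatting stand-in for an LLM call.

```python
from dataclasses import dataclass

@dataclass
class Member:
    role: str

    def work(self, task: str, context: str) -> str:
        # Stand-in for an LLM call conditioned on the role and prior context.
        suffix = f" using [{context}]" if context else ""
        return f"{self.role} did '{task}'{suffix}"

@dataclass
class Task:
    description: str
    owner: Member

def run_crew(tasks: list) -> str:
    context = ""
    for task in tasks:   # sequential process: each output becomes the next task's context
        context = task.owner.work(task.description, context)
    return context

researcher = Member("Senior Researcher")
writer = Member("Senior Writer")
result = run_crew([Task("find sources", researcher), Task("write summary", writer)])
print(result)
```

The framework's value is everything this sketch skips: delegation between members, retries, and deciding what context is worth passing along.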

Fast when your problem fits the metaphor (research, writing, analysis pipelines, content generation). Painful when it doesn't. Recent versions added MCP support. Good for prototyping business agents without learning graph theory.


AutoGen and AG2

AutoGen came out of Microsoft Research, always more research than production. v0.2 introduced agents debating in a multi-turn conversation. Microsoft's v0.4 rewrite made it event-driven and async-first, while AG2 is a community fork that continues the v0.2 line. The main abstraction is GroupChat: multiple agents share a conversation, a selector function picks who speaks next.

AutoGen's bet is that emergent intelligence comes from dialogue - roles specialize, conversation is the shared state, the selector is the control flow. AutoGen Studio gives you a low-code interface; serious users still write code.
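
A toy version of the group-chat pattern, written in plain Python instead of AutoGen's classes. `group_chat` and `select_speaker` are made-up names, and the agents are hard-coded functions standing in for model calls.

```python
from typing import Callable

Message = tuple[str, str]  # (speaker, text)

def coder(history: list) -> str:
    return "def add(a, b): return a + b"

def critic(history: list) -> str:
    # Approves once it has seen at least one proposal from the coder.
    return "APPROVED" if any(s == "coder" for s, _ in history) else "waiting for code"

AGENTS: dict[str, Callable[[list], str]] = {"coder": coder, "critic": critic}

def select_speaker(history: list) -> str:
    # The selector is the control flow: critic answers the coder, otherwise the coder speaks.
    return "critic" if history and history[-1][0] == "coder" else "coder"

def group_chat(task: str, max_turns: int = 6) -> list:
    history = [("user", task)]               # the conversation is the shared state
    for _ in range(max_turns):
        speaker = select_speaker(history)
        text = AGENTS[speaker](history)
        history.append((speaker, text))
        if text == "APPROVED":
            break
    return history

transcript = group_chat("write an add function")
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Swap `select_speaker` for a model call and you get the flexible (and harder to debug) version the research crowd likes.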

Use it for research-style multi-agent work, coder/critic loops, and anywhere you want maximum orchestration flexibility. Production needs more DIY infrastructure than LangGraph or Agno.


OpenAI Agents SDK

OpenAI's Agents SDK shipped March 2025 as the production replacement for Swarm. The main primitive is the handoff: agents transfer control explicitly, carrying context. Each agent declares instructions, model, tools, and the agents it can hand off to. The runtime does the routing.

Minimalist by design. Compared to LangGraph or CrewAI it gives you only agents, handoffs, tools, guardrails. The thesis is that good models turn small primitives into rich behavior. Tight integration with OpenAI models (other providers supported), tracing and observability out of the box.
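
A sketch of the handoff idea in plain Python, not the SDK's actual classes. `Agent.decide` is a keyword-matching stand-in for the model choosing a handoff, and all names here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str
    handoffs: list = field(default_factory=list)  # names this agent may transfer to

    def decide(self, message: str) -> str:
        # Stand-in for the model: hand off when a target's name appears in the message.
        for target in self.handoffs:
            if target in message.lower():
                return f"handoff:{target}"
        return f"[{self.name}] handled: {message}"

AGENTS = {
    "triage": Agent("triage", "Route the request.", handoffs=["billing", "support"]),
    "billing": Agent("billing", "Answer billing questions."),
    "support": Agent("support", "Fix technical problems."),
}

def run(message: str, start: str = "triage", max_hops: int = 3) -> str:
    current = AGENTS[start]
    for _ in range(max_hops):
        outcome = current.decide(message)
        if not outcome.startswith("handoff:"):
            return outcome
        current = AGENTS[outcome.split(":", 1)[1]]  # transfer control, keep the context
    return "too many handoffs"

answer = run("I have a billing question")
print(answer)
```

Note how little the runtime does: it follows the handoff and enforces a hop limit. Everything else is the model's call.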

Pick it if you're already on OpenAI. Skip it if you need cross-provider portability.


Claude Agent SDK and deepagents

Claude Agent SDK (Anthropic, 2025) and LangChain's deepagents are the clearest harness examples. Don't encode the control flow. Give the agent a strong system prompt, a curated tool set, a filesystem for scratch memory, and a planning structure. Run the LLM in a loop and let it drive. Sub-agents spawn on demand.

Works now because Claude 4 and GPT-5 are good enough at long-horizon planning, tool selection, and self-correction. The framework's job stopped being "prop the model up" and became "stay out of its way." Filesystem-as-memory is the clever trick - reads/writes instead of context stuffing, so the agent operates over arbitrarily long horizons.
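
A small sketch of filesystem-as-memory, assuming a hypothetical `Scratchpad` helper rather than any SDK's real interface. The point it illustrates: large intermediate results stay on disk, and only the file listing plus whatever the current step asks for enters the prompt.

```python
import tempfile
from pathlib import Path

class Scratchpad:
    """File-backed notes the agent reads and writes instead of carrying them in context."""

    def __init__(self, root: Path):
        self.root = root

    def write(self, name: str, text: str) -> None:
        (self.root / name).write_text(text, encoding="utf-8")

    def read(self, name: str) -> str:
        return (self.root / name).read_text(encoding="utf-8")

    def listing(self) -> list:
        return sorted(p.name for p in self.root.iterdir())

def build_context(pad: Scratchpad, wanted: list) -> str:
    # Only the file listing plus the files requested for this step enter the prompt.
    parts = ["files: " + ", ".join(pad.listing())]
    parts += [f"--- {name} ---\n{pad.read(name)}" for name in wanted]
    return "\n".join(parts)

with tempfile.TemporaryDirectory() as tmp:
    pad = Scratchpad(Path(tmp))
    pad.write("plan.md", "1. search\n2. summarize")
    pad.write("raw_results.txt", "x" * 100_000)   # large output stays on disk
    context = build_context(pad, wanted=["plan.md"])

print(context)
```

The 100 KB file costs the prompt one filename until the agent decides it needs the contents.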

Most "bullish on models" position in the field. Right answer when you don't know the control flow up front (open-ended research, code generation, complex debugging). Wrong answer when you do - hard-coding it is cheaper, easier to debug, more reliable.


the rest of the landscape

These frameworks matter but don't need the long-form treatment. Each occupies a clear niche:

Framework | Backer | Mental model | Sweet spot
--- | --- | --- | ---
Strands | AWS | Harness | Enterprise agents inside IAM/Bedrock/CloudWatch fabric
Google ADK | Google | Hierarchical tree | Gemini-native agents with A2A protocol and multimodal input
Pydantic AI | Pydantic team | Harness | Type-strict Python codebases with dependency injection
smolagents | HuggingFace | Code-as-action | Agents that write and execute Python instead of JSON tool calls
LlamaIndex | LlamaIndex | RAG + agents | Retrieval-heavy problems where getting the right context matters most
Semantic Kernel | Microsoft | Enterprise harness | .NET ecosystem, Azure-integrated deployments

Two things stand out. Smolagents' "agents think in code" approach is the interesting one: instead of producing JSON tool calls, the agent generates Python that runs in a sandbox. Compresses both input tokens (no schemas) and intermediate state (results flow through variables, not context). Anthropic has been writing about the same idea under "code execution with MCP." And LlamaIndex's retrieval primitives are still the best if retrieval is your bottleneck.


the protocol layer underneath all of this

The most important development, more important than any individual framework, is the emergence of protocols that sit beneath the frameworks. The two that matter are MCP and A2A.

Figure: The agent stack, top to bottom. Frameworks (control flow, orchestration): LangGraph, Agno, n8n, agent SDKs. Protocols (tool and agent integration): MCP, A2A. Models (foundation LLMs): Claude, GPT, Gemini, open weight.

MCP, the Model Context Protocol, was introduced by Anthropic in November 2024 to solve what they called the N x M integration problem: every model times every tool or data source equals a custom integration to write. MCP defines a standard JSON-RPC protocol between an MCP client (your AI application) and an MCP server (a tool or data source), so a tool implemented once works with any MCP-compatible client. The spec defines three server-side primitives (tools, resources, prompts) plus client-side features such as sampling and roots.
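
For a feel of what goes over the wire, here is a toy exchange built with the standard library. The `tools/list` and `tools/call` method names and the JSON-RPC 2.0 envelope follow the MCP spec; the `echo` tool, the `handle` function, and the omission of the initialization handshake and transport are simplifications for illustration.

```python
import json

def make_request(request_id: int, method: str, params: dict) -> str:
    # Every MCP message is a JSON-RPC 2.0 envelope.
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": method, "params": params})

def handle(raw: str) -> str:
    """Toy server with a single hypothetical tool, `echo`."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [{
            "name": "echo",
            "description": "Return the given text.",
            "inputSchema": {"type": "object", "properties": {"text": {"type": "string"}}},
        }]}
    elif req["method"] == "tools/call" and req["params"].get("name") == "echo":
        text = req["params"]["arguments"]["text"]
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

listing = json.loads(handle(make_request(1, "tools/list", {})))
call = json.loads(handle(make_request(2, "tools/call", {"name": "echo", "arguments": {"text": "hi"}})))
print(listing["result"]["tools"][0]["name"], "->", call["result"]["content"][0]["text"])
```

The client never needs to know what `echo` does internally, only its name and input schema. That is the whole trick behind "implement once, use anywhere."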

The adoption curve was unusually fast for a standard. By March 2025, OpenAI added MCP support to ChatGPT. By Q3 2025, Microsoft shipped MCP servers for GitHub, Azure, and Microsoft 365. By Q1 2026, Google added MCP to Gemini and Vertex AI. In December 2025, Anthropic donated MCP to the Linux Foundation's new Agentic AI Foundation, jointly governed with Block, OpenAI, AWS, Google, and Microsoft, which removed the last political reason for non-Anthropic players to resist it. The public registry now lists thousands of servers. SDK downloads cross 97 million per month. It is, at this point, the integration standard.

No more custom Slack/GitHub/Postgres/Drive integrations. Point your agent at the MCP server. Frameworks compete on control flow, observability, and DX; integration becomes commodity. Same path HTTP took for the web, JDBC for databases.

A2A (Agent-to-Agent) is the complement. MCP lets an agent talk to tools and data; A2A lets agents talk to other agents across framework boundaries. A LangGraph agent invokes a CrewAI agent through A2A's task interface. Earlier in adoption than MCP, trending the same way. Google's ADK ships with native support.

Related: "code execution with MCP" (Anthropic) and how smolagents has worked from day one. When you have hundreds or thousands of tools, dumping all the definitions into context is brutal on tokens. Better: expose tools as code on a virtual filesystem, give the agent a search-tools function and a sandbox, let it pull in only what it needs. The agent writes and runs a small script instead of producing JSON for every call. Compresses both the input (definitions) and the intermediate state (outputs flow through variables, not context). This is where the high end is heading.
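
A deliberately tiny sketch of both halves of that idea: discover tools by search instead of loading every definition, then let a script keep bulky results in variables. The catalog, `search_tools`, `pg_query`, and `run_script` are all hypothetical, and plain `exec` is used only for illustration; it is not a sandbox.

```python
TOOL_CATALOG = {
    "slack_post": "Post a message to a Slack channel.",
    "github_issue": "Open an issue in a GitHub repository.",
    "pg_query": "Run a read-only SQL query against Postgres.",
}

def search_tools(query: str, limit: int = 2) -> list:
    """Return names of tools whose description mentions any query word."""
    words = query.lower().split()
    hits = [name for name, desc in TOOL_CATALOG.items()
            if any(word in desc.lower() for word in words)]
    return hits[:limit]

def pg_query(sql: str) -> list:
    # Hypothetical tool body: pretend the query returned 1,000 rows.
    return [{"id": i} for i in range(1000)]

# What the model might write after discovering pg_query via search_tools.
AGENT_SCRIPT = """
rows = pg_query("SELECT id FROM users")
result = len(rows)
"""

def run_script(script: str) -> object:
    namespace = {"pg_query": pg_query}  # plain exec is NOT a sandbox; illustration only
    exec(script, namespace)
    return namespace["result"]          # only this small value goes back into context

needed = search_tools("sql query")
summary = run_script(AGENT_SCRIPT)
print(needed, summary)
```

The 1,000 rows never touch the prompt. The model sees one tool name going in and one integer coming out.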


where things actually are right now

The harness philosophy is winning. LangChain ships deepagents (harness API on top of LangGraph) while keeping the graph layer for when you need it. As models improve, more orchestration moves into the model. Top production frameworks today: LangGraph, Claude Agent SDK, OpenAI Agents SDK, Strands. CrewAI for role-based. Agno for pure-Python performance.

MCP won. Designing a new agent system? Assume MCP is the integration layer.

Most multi-agent hype was wrong. Most problems people thought needed multi-agent work better as one well-equipped agent with good tools and clear instructions. Multi-agent earns its keep for independent specializations (planner + executor, generator + critic) and parallel exploration. LangChain's own guidance: 80% of apps work better as a single agent.

Figure: Single agent vs multi-agent, the 80/20 rule. Single agent (80% of real problems): one agent with search, code, files, API, browser, and DB tools; simpler debugging, lower latency, cheaper. Multi-agent (when you actually need it): planner, executor, critic; parallel exploration, generator + critic pairs, different system prompts.

"Agent" is fragmenting into useful subcategories. Coding agents (Claude Code, Cursor, OpenHands, Aider), browser agents (Claude in Chrome, browser-use), workflow agents (n8n, Make), research agents (deepagents, Claude's research mode), customer-facing agents (support bots, phone agents). The general-purpose framework matters less when the agent is specialized. And across all categories, observability (LangSmith, Langfuse, Arize) and governance (guardrails, audit logs, human-in-the-loop) have become their own markets.


how to actually choose

You usually don't pick one. Real systems combine: a LangGraph control flow calling MCP tools, embedded inside an n8n workflow, observed by LangSmith or OpenTelemetry. These aren't competing for the same slot; they're filling different ones.

If you must pick one:

  • LangGraph - your control flow is known and you want it explicit.
  • Agno - pure-Python ergonomics, fast iteration, stateless scaling.
  • Claude Agent SDK / deepagents - the problem is open-ended and you trust the model.
  • CrewAI - team-of-specialists fits, you want to ship this week.
  • n8n - mostly deterministic plumbing with AI in a few spots, visual builder helps.
  • OpenAI Agents SDK - already on OpenAI, want the shortest path.
  • AutoGen - skip unless you need multi-agent dialogue or you're doing research.

Figure: Which agent framework should you pick? Decision flowchart. Know the control flow in advance? If yes: mostly deterministic plumbing plus some AI leads to n8n; otherwise, needing explicit state and checkpointing leads to LangGraph, else Agno. If no: an open-ended task (research, coding, debugging) splits by primary model provider into Claude SDK (Anthropic), OpenAI SDK (OpenAI), or Strands (multi); otherwise a team-of-specialists fit leads to CrewAI, a need for multi-agent dialogue leads to AutoGen, else Agno or Claude SDK.
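
The same decision path as a function, for anyone who prefers reading branches to reading diagrams. This is one reading of the flowchart, not an official rubric; `pick_framework` and its parameter names are invented here.

```python
def pick_framework(
    known_control_flow: bool,
    mostly_plumbing: bool = False,
    needs_checkpointing: bool = False,
    open_ended: bool = False,
    provider: str = "multi",     # "anthropic", "openai", or "multi"
    team_metaphor: bool = False,
    needs_dialogue: bool = False,
) -> str:
    if known_control_flow:
        if mostly_plumbing:
            return "n8n"
        return "LangGraph" if needs_checkpointing else "Agno"
    if open_ended:
        by_provider = {"anthropic": "Claude Agent SDK", "openai": "OpenAI Agents SDK"}
        return by_provider.get(provider, "Strands")
    if team_metaphor:
        return "CrewAI"
    if needs_dialogue:
        return "AutoGen"
    return "Agno or Claude Agent SDK"

print(pick_framework(known_control_flow=True, needs_checkpointing=True))
print(pick_framework(known_control_flow=False, open_ended=True, provider="anthropic"))
```

The first question carries most of the weight: if you already know the control flow, everything on the model-driven side is optional.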