
Overview

Spine Swarm uses a four-tier agent architecture. Each tier has a specific role, and the tiers work together to handle complex projects that no single agent could manage alone.

L1: The Orchestrator

L1 is the user-facing agent. It operates in the chat panel and is the only tier the user talks to directly. L1’s job is to understand what the user wants, plan the approach with the user, collect initial context, and dispatch work to L2. L1 is also responsible for presenting finished results back to the user and handling follow-up requests.

L1 does not operate on the canvas. It does not create blocks or make connections. It plans and dispatches. Because L1 is not coupled to canvas execution, it can dispatch multiple independent tasks to separate L2 agents that run in parallel. L1 also maintains a persistent understanding of the full project scope, which means follow-up requests (“Now do the same analysis for Europe”) do not require re-explaining the project.

L2: The Strategist

L2 receives a task from L1 and decomposes it into a structured execution plan. L2 identifies:
  • which subtasks are required and the dependencies between them
  • which subtasks can run in parallel and which must run sequentially
  • what kinds of blocks each subtask will need
  • which models to use for each block
  • what the expected outputs are
L2 agents can read the canvas, search across blocks and canvases, do web searches directly, ask the user for clarification via L1, and use connector tools (integrations) to access external services. L2 does not create blocks directly — it delegates to L2.5 persona agents. When the L2.5 agents complete, L2 reads their outputs and synthesizes the final response.
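An execution plan like the one described above can be pictured as a small data structure. This is only an illustrative sketch — the field names (`block_type`, `depends_on`, etc.) are assumptions, not Spine’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    block_type: str          # e.g. "browseruse", "prompt", "table" (illustrative)
    model: str               # model chosen for this block
    depends_on: list[str] = field(default_factory=list)

@dataclass
class ExecutionPlan:
    subtasks: list[Subtask]

    def parallel_roots(self) -> list[str]:
        # Subtasks with no dependencies can start immediately, in parallel.
        return [s.name for s in self.subtasks if not s.depends_on]

plan = ExecutionPlan([
    Subtask("research_us", "browseruse", "model-a"),
    Subtask("research_eu", "browseruse", "model-a"),
    Subtask("synthesis", "prompt", "model-b",
            depends_on=["research_us", "research_eu"]),
])
```

Here the two research subtasks are independent and run in parallel, while the synthesis subtask waits on both.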

L2.5: Persona Agents

L2.5 agents are specialized persona agents, each assigned a specific persona and goal by the L2 that created them. A persona agent might be a researcher, a competitive analyst, a financial modeler, a writer, or a reviewer. Each one operates on the canvas using blocks (L3 primitives) to do its work. Independent L2.5 agents run in parallel; dependent ones wait on the outputs they need before starting. L2.5 agents also validate each other’s work — reviewer personas check outputs before they are passed downstream, catching reasoning errors before they compound across a long execution chain.

Like L2, persona agents can search the canvas, read block content, do web searches directly, ask the user for clarification, and use connector tools. The key difference is that L2.5 agents also delegate block creation to L3 sub-agents. A persona agent might do a web search to find relevant URLs, then spin up an L3 agent to create a BrowserUse block for each URL, run them in parallel, read the results, and use those as input for the next step.
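The fan-out pattern at the end of that example — one L3 sub-agent per URL, all running concurrently — can be sketched with `asyncio`. The function names here are hypothetical stand-ins, not Spine’s actual tool interface:

```python
import asyncio

async def l3_create_and_run_browseruse(url: str) -> str:
    # Stands in for a real L3 call: create a BrowserUse block for the
    # URL, run it, and return its output.
    await asyncio.sleep(0)
    return f"content of {url}"

async def persona_research(urls: list[str]) -> list[str]:
    # One L3 sub-agent per URL, all dispatched in parallel.
    pages = await asyncio.gather(
        *(l3_create_and_run_browseruse(u) for u in urls)
    )
    # The collected results become input for the persona's next step.
    return list(pages)

pages = asyncio.run(persona_research(["https://a.example", "https://b.example"]))
```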

L3: The Block Operators

L3 is the lowest tier. L3 agents handle individual block operations: creating a block, running it, reading its output, editing it, connecting it to other blocks. Each L3 call does one thing. L3 agents are orchestrated by L2.5 persona agents. Specific L3 capabilities include:
  • Creating blocks of any type with full configuration
  • Running blocks and waiting for output
  • Reading block output
  • Editing existing blocks to refine or fix content
  • Resolving model names to exact IDs and looking up which models are available for a given block type
  • Reading canvas context and specific block content
  • Bulk expanding list items into individual blocks (take a list of 10 topics and automatically create a research block for each one)
L3 creates one block per call, but L2.5 can dispatch many L3 sub-agents in parallel.
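The bulk-expand capability from the list above — a list of topics in, one block per topic out — reduces to a simple loop. `create_block` is a hypothetical stand-in for the real L3 block-creation call:

```python
def create_block(block_type: str, content: str) -> dict:
    # Placeholder for the actual L3 block-creation call.
    return {"type": block_type, "content": content}

def bulk_expand(topics: list[str], block_type: str = "research") -> list[dict]:
    # L3 creates one block per call; bulk expansion loops over the list
    # so each topic gets its own block.
    return [create_block(block_type, topic) for topic in topics]

blocks = bulk_expand(["pricing", "churn", "competitors"])
```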

How the Tiers Work Together

A typical flow: the user describes a project to L1. L1 clarifies scope and dispatches it as a task. L2 decomposes the task into subtasks (research, analysis, synthesis, deliverable creation). L2.5 persona agents pick up each subtask and operate on the canvas via L3 actions. Independent personas run simultaneously. Downstream personas wait for upstream context. The canvas persists everything, so any agent at any tier can read any artifact that has been produced. L2 synthesizes the final output. L1 presents it to the user and handles follow-up.

Model Selection

Rather than relying on a single model, Spine dynamically selects the best model (or ensemble of models) from a pool of 300+ for each task. Agents pick the best model for each block and sometimes run the same block with multiple models to compare and synthesize outputs. The user does not configure any of this. This matters because no single model is best at everything. Some are better at reasoning, some at web search synthesis, some at creative writing, some at structured data extraction. Spine routes to the right tool for each step and ensembles multiple models to offset any single model’s biases or blind spots.
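A toy illustration of per-task routing and ensembling. The registry, model names, and task categories below are invented for the sketch — Spine’s real pool of 300+ models and its selection logic are not shown here:

```python
# Invented registry: task kind -> candidate models, best-first.
REGISTRY = {
    "reasoning": ["model-r1", "model-r2"],
    "web_search": ["model-w1"],
    "creative_writing": ["model-c1", "model-c2"],
}

def route(task_kind: str, ensemble: bool = False) -> list[str]:
    # Pick one model for the task kind, or several whose outputs are
    # compared and synthesized downstream.
    candidates = REGISTRY[task_kind]
    return list(candidates) if ensemble else candidates[:1]

chosen = route("reasoning", ensemble=True)
```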

Key Infrastructure

  • Dependency graph execution — Under the hood there is a full dependency graph (DAG) that determines execution order, parallelism, and concurrency automatically. Agents do not need to manage this — they just operate on blocks
  • Context propagation — Handles the wide variance in context sizes and formats across different block types. A web page block can feed into a prompt block can feed into a table block can feed into a slides block
  • Structured handoffs — Agents leave explicit, structured summaries designed to be consumed efficiently by the next agent. This is a protocol for maintaining context fidelity across long chains of agent work
  • BrowserUse at scale — Multiple browser sessions running in parallel, interacting with the real web (including web archives), feeding information back into the canvas
  • Folder synthesis (map-reduce) — Run a prompt against every file in a folder and aggregate the results. Scales to large document collections in a way that dumping everything into a single context window does not
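The dependency-graph execution described in the first bullet can be sketched with Python’s standard-library `graphlib`: blocks whose dependencies are all satisfied run together as a batch. The graph below (pages feeding prompts feeding a table) is illustrative only:

```python
from graphlib import TopologicalSorter

# Each block maps to the set of blocks it depends on.
graph = {
    "table": {"prompt_us", "prompt_eu"},  # table waits on both prompts
    "prompt_us": {"page_us"},
    "prompt_eu": {"page_eu"},
    "page_us": set(),
    "page_eu": set(),
}

def execution_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = list(ts.get_ready())  # everything runnable right now, in parallel
        batches.append(sorted(ready))
        ts.done(*ready)
    return batches

batches = execution_batches(graph)
```

The first batch contains both page blocks (no dependencies, fully parallel); the last contains only the table, which depends on everything upstream.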