AI Systems Deep Dive

Multi-Agent Architectures

What it takes to run a fleet in production, not just demo one.

June 11, 2026 · Christopher Grant · Nebari · 25 min + live Q&A
Nebari · AI Systems Deep Dive01
The gap

Spinning one up is solved. Running a fleet is the discipline.

One agent is an afternoon. The day after the demo you own ten of them, on real data, for real users, and somebody has to run them.

Three ways people build agents

The first two get you to a demo. Neither survives production.

The naive and black-box approaches both stop at the demo. The wall is everything after it: versions, permissions, traces. No builder does that part for you.

The reframe

A demo answers one question. A fleet answers four.

A demo proves the task can be done once. A fleet has to answer four questions per role: who it is, how it runs, what it can do, and what it did. Answer those and most of the architecture falls out on its own.

One example, all the way through

The EA: a daily snapshot every employee leans on

This one runs in production today. Same job for fifty people, each with private email, calendar, and files the others can never see.

Identity · what an agent is made of

The prompt is the task. The agent is everything around it.

The inbound prompt is one input among several: a task. Six parts decide what happens to it, and every one is config you own.

Identity · the role is a spec

The whole agent is a spec. Fat core, thin roles.

The role is a folder: skills, scripts, soul, seed memory. The yaml is the spec that wires it together. Diff it, version it, audit it, clone it.

roles/ea_tess/role.yaml
role: executive_assistant
autonomy_band: moderate
model_tiers:
  classify: cheap
  draft: workhorse
  judge: frontier
permissions:
  write_autonomous:
    [create_draft, label_or_file_email, ...]
  write_requires_approval:
    [send_email, schedule_or_move_meeting, ...]
budgets:
  max_daily_cost_usd: 25
  per_run_ceiling_usd: 1.50
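A minimal sketch of how a runtime might read the spec above. It assumes the YAML has already been loaded into a plain dict (shown inline, with the elided permission lists left out). The helper names `tier_for` and `within_budget`, and the `spent_today_usd` argument, are hypothetical and not part of the role file.

```python
# Hypothetical in-memory form of roles/ea_tess/role.yaml (permission lists omitted).
ROLE = {
    "role": "executive_assistant",
    "autonomy_band": "moderate",
    "model_tiers": {"classify": "cheap", "draft": "workhorse", "judge": "frontier"},
    "budgets": {"max_daily_cost_usd": 25, "per_run_ceiling_usd": 1.50},
}


def tier_for(role: dict, step: str) -> str:
    """Return the model tier the spec assigns to a step; unknown steps fail loudly."""
    try:
        return role["model_tiers"][step]
    except KeyError:
        raise ValueError(f"role {role['role']!r} defines no tier for step {step!r}")


def within_budget(role: dict, run_cost_usd: float, spent_today_usd: float) -> bool:
    """True only if this run respects both the per-run and the daily ceiling."""
    budgets = role["budgets"]
    return (
        run_cost_usd <= budgets["per_run_ceiling_usd"]
        and spent_today_usd + run_cost_usd <= budgets["max_daily_cost_usd"]
    )
```

Because both checks read only the spec, changing a ceiling is a one-line diff to the role file rather than a code change.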
Identity

Agents are identities. Treat one like an employee.

Give it a real account: its own email, its own scoped tokens, its own rows in the audit log. Firing it means revoking tokens, same as offboarding a person.
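As a sketch, the "employee" framing could look like the record below. The class name `AgentIdentity`, its fields, and the scope strings are illustrative assumptions; a real deployment would hold tokens in a secret store rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class AgentIdentity:
    """An agent account: its own address, its own scoped tokens, its own audit rows."""
    agent_id: str
    email: str
    tokens: dict[str, str] = field(default_factory=dict)  # scope -> token (hypothetical)
    audit_log: list[str] = field(default_factory=list)

    def grant(self, scope: str, token: str) -> None:
        """Issue one scoped token and record the grant."""
        self.tokens[scope] = token
        self.audit_log.append(f"grant {scope}")

    def offboard(self) -> None:
        """'Firing' the agent: revoke every token, keep the audit trail."""
        for scope in list(self.tokens):
            self.audit_log.append(f"revoke {scope}")
        self.tokens.clear()
```

After `offboard()` the identity still exists in the audit log, but it can no longer act on anything.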

The picture

One role. Many users. One runtime.

One role file, bound per user at run time to their tokens, data, and memory. Fifty employees is one codebase and fifty config rows.
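A rough sketch of per-user binding, assuming one config row per employee. `UserBinding`, `BoundRun`, `bind`, and the `vault://` and `mem/` strings are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserBinding:
    """One config row per employee: whose tokens, data, and memory a run may touch."""
    user_id: str
    token_ref: str   # pointer to this user's credentials, never the secret itself
    memory_ns: str   # namespace that isolates this user's memory


@dataclass(frozen=True)
class BoundRun:
    role: str
    binding: UserBinding


def bind(role: str, bindings: dict[str, UserBinding], user_id: str) -> BoundRun:
    """Attach the shared role to exactly one user's row at run time."""
    return BoundRun(role=role, binding=bindings[user_id])


# Fifty employees would be fifty rows; two are enough to show the shape.
BINDINGS = {
    "u1": UserBinding("u1", "vault://u1", "mem/u1"),
    "u2": UserBinding("u2", "vault://u2", "mem/u2"),
}
```

The role string is identical across runs; only the row changes, which is what keeps one user's data out of another user's run.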

Triggers · how the work arrives

Four ways in. One role, never forked.

A trigger is how the work arrives: a message, a cron row, a webhook, another agent. All four enqueue the same job shape against the same role.
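One possible shape for that shared job, sketched with an in-process queue. The `Job` fields follow the role × trigger × task framing; the names `Trigger`, `Job`, `QUEUE`, and `enqueue` are assumptions, and a production runtime would use a durable queue.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    MESSAGE = "message"
    CRON = "cron"
    WEBHOOK = "webhook"
    AGENT = "agent"


@dataclass(frozen=True)
class Job:
    """The single job shape every trigger produces."""
    role: str
    trigger: Trigger
    task: str


QUEUE: deque[Job] = deque()  # stand-in for a durable queue


def enqueue(role: str, trigger: Trigger, task: str) -> Job:
    """Every entry point calls this; none of them gets its own copy of the role."""
    job = Job(role, trigger, task)
    QUEUE.append(job)
    return job
```

The worker that drains the queue never needs to know which of the four ways a job came in, except to record it.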

Composition

More agents is not more better.

Planner, coder, tester: those could be skills inside one loop. A worker earns its own agent when it needs a fresh context window or its own credentials.

Coordination

The handoff is the architecture.

Make it a typed payload: the intent, the durable context B needs to act, a compressed summary of the rest. Everything else gets dropped on purpose.
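A typed handoff could be sketched as below. The three fields mirror the three parts named above; `make_handoff`, its `keep` argument, and the key-list "summary" are illustrative assumptions. A real system would likely produce the summary with a model or a template rather than a list of omitted keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    intent: str               # what B is being asked to do
    context: dict[str, str]   # durable facts B needs to act
    summary: str              # compressed digest of everything else


def make_handoff(intent: str, state: dict[str, str], keep: set[str],
                 max_summary: int = 200) -> Handoff:
    """Build the payload from A's state; anything not in `keep` is dropped on purpose."""
    context = {k: v for k, v in state.items() if k in keep}
    dropped = sorted(k for k in state if k not in keep)
    # Placeholder summary: only names what was left out, truncated to a fixed size.
    summary = ("omitted: " + ", ".join(dropped))[:max_summary] if dropped else ""
    return Handoff(intent, context, summary)
```

The point of the type is that B can be tested against `Handoff` alone, without replaying A's whole conversation.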

Tools & permissions

Two layers. The model never decides what it's allowed to do.

The binding is data: which tokens this instance holds. The policy is code: which actions run alone and which wait in the approval queue. Neither lives in the prompt.
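The two layers might be sketched like this. The action names come from the role spec shown earlier; the `authorize` function, the `Decision` enum, and the scope strings are assumptions for illustration.

```python
from enum import Enum


class Decision(Enum):
    RUN = "run"
    QUEUE_FOR_APPROVAL = "queue_for_approval"
    DENY = "deny"


# Policy sets, taken from the visible entries of the role spec.
WRITE_AUTONOMOUS = {"create_draft", "label_or_file_email"}
WRITE_REQUIRES_APPROVAL = {"send_email", "schedule_or_move_meeting"}


def authorize(action: str, required_scope: str, held_scopes: set[str]) -> Decision:
    """Decide outside the model whether a requested action may run."""
    # Layer 1, binding (data): no token for the scope means no action at all.
    if required_scope not in held_scopes:
        return Decision.DENY
    # Layer 2, policy (code): model output never changes these sets.
    if action in WRITE_AUTONOMOUS:
        return Decision.RUN
    if action in WRITE_REQUIRES_APPROVAL:
        return Decision.QUEUE_FOR_APPROVAL
    return Decision.DENY  # unknown actions are denied by default
```

The model can request `send_email` as often as it likes; the request still lands in the approval queue, because nothing in the prompt is consulted here.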

Observability

If you can't replay it, it's a slot machine.

Instrument the runtime once and every role inherits it. Each run writes one trace: inputs, model, tool calls, cost, outcome. Replay reruns a failed step with the same config and context.
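A minimal sketch of one trace per run, with replay. The field names follow the list above; the dict-backed `store`, the `record` and `replay` helpers, and the `step` callable signature are assumptions, not a specific tracing library.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class Trace:
    run_id: str
    inputs: dict
    model: str
    tool_calls: list
    cost_usd: float
    outcome: str


def record(trace: Trace, store: dict[str, str]) -> None:
    """Persist the trace as JSON so it can be inspected and replayed later."""
    store[trace.run_id] = json.dumps(asdict(trace), sort_keys=True)


def replay(run_id: str, store: dict[str, str],
           step: Callable[[dict, str], str]) -> str:
    """Rerun a step with exactly the inputs and model the original run used."""
    saved = json.loads(store[run_id])
    return step(saved["inputs"], saved["model"])
```

Replay only helps if the trace captures everything the step depended on; anything read from outside the trace makes the rerun a different run.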

The closer

Most tasks should never reach a model.

A deterministic router tries the cheap rungs first. A rule that works costs nothing. The loop only sees what nothing cheaper could handle.
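The ladder can be sketched as an ordered list of cheap handlers with the agent loop as the last resort. The specific rungs here (one fixed rule, one cache lookup) and all function names are made-up examples; the source does not say which rungs the production router uses.

```python
from typing import Callable, Optional

# A rung returns an answer, or None to pass the task down the ladder.
Rung = Callable[[str], Optional[str]]


def rule_rung(task: str) -> Optional[str]:
    """Cheapest rung: a fixed rule. The rule itself is a made-up example."""
    if task.lower().startswith("unsubscribe"):
        return "label:newsletters"
    return None


def cache_rung_factory(cache: dict[str, str]) -> Rung:
    """Second rung (assumed): reuse an answer already computed for this exact task."""
    return lambda task: cache.get(task)


def route(task: str, rungs: list[Rung],
          agent_loop: Callable[[str], str]) -> tuple[str, str]:
    """Try rungs in order; only tasks no rung handles reach the model."""
    for rung in rungs:
        answer = rung(task)
        if answer is not None:
            return ("rung", answer)
    return ("loop", agent_loop(task))
```

Because the order is fixed and each rung is a pure function of the task, the same task always takes the same path, which keeps routing itself replayable.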


The operating model is the product.

i · Identity

Who is it

Config on a shared runtime. Its own account and audit trail.

ii · Triggers

How it runs

Chat, schedule, hook, agent. Role × trigger × task.

iii · Tools & permissions

What it can do

Identity binds what it sees. Policy binds what it does.

iv · Observability

What it did

Every run recorded, or it's a slot machine.

One agent is a prompt. A fleet is an operating model. Get those four right and it scales. Get them wrong and you have fifty demos.

Your turn

Let's compare notes.

Questions are welcome. Stories are better.

01

What have you tried?

Single loop, swarm, a builder, your own runtime.

02

What's been working?

The thing you'd ship again tomorrow.

03

Where did it bite you?

The failure that taught you something.

04

What was new here?

Anything you'll go change this week.

Talk to me: Christopher Grant · cgrant@nebari.cc
The deck: multi-agent-architectures.pages.dev
Next session: AI Without the Hype · June 19
Replay: On the Luma event page