Quick Facts
- Category: AI & Machine Learning
- Published: 2026-05-03 23:00:53
Imagine a software team that never sleeps, never takes a coffee break, and can test, triage, and fix bugs across multiple platforms at once. That is what the Docker Coding Agent Sandboxes team has built: a virtual fleet of seven AI agent roles that work around the clock to help the team ship faster and more reliably. Built on the team's microVM-based sandbox technology, this fleet, known simply as "the Fleet", is a game-changer for AI-assisted development. In this article, we break down the seven most important things to know about the approach, from how the agents are designed to why running them locally first makes all the difference.
- What Is the Fleet?
- Claude Code Skills: The Secret Sauce
- Why Agents Need Judgment, Not Just Instructions
- Local First, CI Second: A Game-Changing Workflow
- The /cli-tester Role in Action
- CI as Another Runtime
- The Real-World Impact of the Fleet
1. What Is the Fleet?
The Fleet is a virtual team of seven distinct AI agent roles that work autonomously to support Docker's Coding Agent Sandboxes project (also known as "sbx"). The project provides secure, microVM-based isolation for running powerful coding agents such as Claude Code, Gemini, Codex, Docker Agent, and Kiro. Inside each sandbox, an agent gets full autonomy, with its own Docker daemon, network, and filesystem, without ever touching the host system. The Fleet goes a step further by running those agents in CI to test the product, triage issues, post release notes, and even fix bugs. It's like having a dedicated QA team, release manager, and support crew working around the clock.

2. Claude Code Skills: The Secret Sauce
Each agent in the Fleet is powered by a Claude Code skill. Skills aren't traditional scripts; they're Markdown files that define a persona, a set of responsibilities, and the tools the agent is allowed to use. Think of a skill as a role description: "You are the build engineer; here's what you know and how to make decisions." The distinction matters because agents need more than step-by-step commands; they need room to reason and adapt. When a test fails unexpectedly, a script just stops, whereas a role-based agent investigates, analyzes the failure, and decides on the best next action.
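To make the "role description" idea concrete, here is a minimal sketch of a skill file's anatomy. It assumes the common Claude Code layout, where a skill is a `SKILL.md` file with YAML frontmatter (`name`, `description`, and optionally an `allowed-tools` list) followed by free-form Markdown instructions. The /cli-tester text below is hypothetical and is not Docker's actual skill file. The parser is a stdlib-only illustration that handles only flat `key: value` frontmatter.

```python
from dataclasses import dataclass

# Hypothetical skill file; the persona and rules are illustrative only.
SKILL_MD = """\
---
name: cli-tester
description: Exploratory tester for the sbx CLI. Use when asked to test a build.
allowed-tools: Bash, Read, Grep
---
You are the exploratory tester for the sbx CLI.

## Responsibilities
- Build the binaries and exercise CLI commands the way a new user would.
- When something looks wrong, gather logs before concluding it is a bug.

## Judgment
- A single timeout is not a regression; retry once and note it.
- File an issue only when you have concrete reproduction steps.
"""


@dataclass
class Skill:
    name: str
    description: str
    allowed_tools: list[str]
    instructions: str  # free-form Markdown the agent reasons over


def parse_skill(text: str) -> Skill:
    """Split a SKILL.md-style file into frontmatter fields and the Markdown body."""
    _, frontmatter, body = text.split("---\n", 2)
    fields = {}
    for line in frontmatter.splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    tools = [t.strip() for t in fields.get("allowed-tools", "").split(",") if t.strip()]
    return Skill(fields["name"], fields["description"], tools, body.strip())


skill = parse_skill(SKILL_MD)
print(skill.name, skill.allowed_tools)
```

Only the frontmatter is machine-readable. The body is prose that the model interprets, and the "judgment" described in this article lives there.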
3. Why Agents Need Judgment, Not Just Instructions
The design philosophy behind the Fleet emphasizes judgment over rigid automation. Traditional test scripts are brittle: they assume perfect conditions and break when something unexpected occurs. The Fleet's agents, by contrast, are designed to handle uncertainty. For example, if a build fails because of a flaky network, an agent can retry that step or try an alternative approach. This flexibility comes from the skill file, which supplies context and decision-making guidance rather than a fixed sequence of commands. The result is a system that can recover from routine surprises, which makes it far more resilient than a classic CI pipeline built from fixed steps.
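The retry-versus-stop decision described above can be sketched in a few lines. This is a minimal illustration, not how the Fleet's agents work internally. Here the agent's judgment is modeled as an explicit, hypothetical classifier that treats network-style errors as transient (retry with exponential backoff) and anything else as a real failure to investigate. `StepResult`, `classify`, `run_with_judgment`, and the keyword list are all names invented for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical error fragments that suggest a flaky environment, not a product bug.
TRANSIENT_MARKERS = ("connection reset", "timed out", "temporary failure in name resolution")


@dataclass
class StepResult:
    ok: bool
    output: str


def classify(result: StepResult) -> str:
    """Crude stand-in for agent judgment: 'ok', 'transient', or 'real'."""
    if result.ok:
        return "ok"
    text = result.output.lower()
    return "transient" if any(marker in text for marker in TRANSIENT_MARKERS) else "real"


def run_with_judgment(step: Callable[[], StepResult], max_attempts: int = 3,
                      base_delay: float = 0.5) -> tuple[str, list[StepResult]]:
    """Retry transient failures with exponential backoff; stop at once on real ones."""
    history: list[StepResult] = []
    for attempt in range(max_attempts):
        result = step()
        history.append(result)
        verdict = classify(result)
        if verdict != "transient":
            return verdict, history  # "ok", or "real" (investigate, don't retry)
        time.sleep(base_delay * 2 ** attempt)
    return "escalate", history  # still flaky after every attempt


# Usage: a build step that hits two network blips before succeeding.
responses = iter([
    StepResult(False, "curl: Connection reset by peer"),
    StepResult(False, "Operation timed out"),
    StepResult(True, "build ok"),
])
verdict, history = run_with_judgment(lambda: next(responses), base_delay=0.0)
print(verdict, len(history))  # ok 3
```

A brittle script corresponds to the degenerate case that treats every failure as final. In the Fleet, the equivalent decision presumably comes from the model reading the skill's guidance, not from a keyword list. The sketch shows only the shape of the control flow.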
4. Local First, CI Second: A Game-Changing Workflow
One of the Fleet's core principles is "local first, CI second." Every skill starts its life on a developer's laptop before it ever touches a GitHub workflow. The team didn't write the /cli-tester skill (the exploratory tester) by drafting a YAML file. Instead, they ran it locally and watched it build binaries, exercise CLI commands, and report issues. They tweaked the skill file until it behaved as intended in their terminal, and only then wired it into CI. This approach turns a 10-minute commit-push-wait cycle into a 2-second local iteration: you can watch the agent reason, spot confusion the moment it appears, and fix the skill file on the fly.

5. The /cli-tester Role in Action
The /cli-tester agent is the Fleet's exploratory testing specialist. Its job is to exercise the sbx CLI across multiple platforms (macOS, Linux, Windows) and catch regressions that automated unit tests might miss. It runs nightly in CI, but the key is that it's the same exact skill used locally. When the agent encounters an unexpected behavior—say a sandbox fails to start on Windows—it doesn't just report a failure; it captures logs, diagnoses potential causes, and files a detailed issue with reproduction steps. This turns what would be a manual bug-hunting session into a fully automated process.
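A "detailed issue with reproduction steps" is easier to picture as a data structure. The following is a minimal sketch under stated assumptions: the `IssueReport` fields, the Markdown layout, and the example Windows failure (steps, log lines, suspected cause) are all hypothetical and are not the Fleet's actual issue template or real sbx output. The only design choice it illustrates is keeping the report actionable by including ordered reproduction steps and trimming captured logs to their tail.

```python
from dataclasses import dataclass, field


@dataclass
class IssueReport:
    """Hypothetical shape of an exploratory-testing bug report."""
    title: str
    platform: str  # "macos", "linux", or "windows"
    cli_version: str
    repro_steps: list[str]
    expected: str
    actual: str
    log_lines: list[str] = field(default_factory=list)
    suspected_causes: list[str] = field(default_factory=list)

    def to_markdown(self, max_log_lines: int = 20) -> str:
        """Render the issue body, keeping only the tail of the captured logs."""
        steps = "\n".join(f"{i}. {step}" for i, step in enumerate(self.repro_steps, 1))
        causes = "\n".join(f"- {c}" for c in self.suspected_causes) or "- none identified"
        tail = self.log_lines[-max_log_lines:] if max_log_lines > 0 else []
        # Indent log lines by four spaces so Markdown renders them as a code block.
        logs = "\n".join("    " + line for line in tail)
        return (
            f"- Platform: {self.platform}\n- Version: {self.cli_version}\n\n"
            f"### Steps to reproduce\n{steps}\n\n"
            f"### Expected\n{self.expected}\n\n"
            f"### Actual\n{self.actual}\n\n"
            f"### Suspected causes\n{causes}\n\n"
            f"### Log excerpt\n{logs}\n"
        )


# Usage with an illustrative (not real) failure.
report = IssueReport(
    title="Sandbox fails to start on Windows",
    platform="windows",
    cli_version="dev build from main",
    repro_steps=["Build the CLI from main", "Create a new sandbox with default settings",
                 "Start the sandbox"],
    expected="Sandbox starts and reports ready",
    actual="Start command exits with an error after the VM boots",
    log_lines=["[boot] starting VM", "[boot] VM started", "[error] daemon did not become ready"],
    suspected_causes=["Daemon readiness check times out on slower hosts"],
)
print(report.title)
print(report.to_markdown(max_log_lines=2))
```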
6. CI as Another Runtime
A radical idea behind the Fleet is that CI is just another runtime for the same skill. There is no separate "CI version" of an agent, no translation layer, and no custom workflow logic that duplicates effort. The /cli-tester that runs nightly on three operating systems is the exact same Markdown skill file that a developer runs in their terminal. The CI workflow simply sets up the environment, checks out the code, and calls the skill. Because of this consistency, a fix verified locally behaves the same way in CI, which eliminates the most frustrating push-and-wait debugging cycles of continuous integration.
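One way to picture "CI as just another runtime" is with a minimal sketch in which a single function builds the agent invocation, and the only thing that differs between a laptop and a CI runner is the setup around it. This assumes the Claude Code CLI's non-interactive print mode (`claude -p`) with `--output-format json`. The prompt wording, the setup steps, and the `skill_invocation`/`plan_run` helpers are hypothetical and are not Docker's actual workflow. The code only builds and compares commands; it does not run an agent.

```python
import shlex


def skill_invocation(skill: str) -> list[str]:
    """The single command used everywhere: same skill, same prompt, same flags."""
    # Assumed prompt wording; skills are selected by the model from their description.
    prompt = f"Use the {skill} skill to test the current checkout and report findings."
    return ["claude", "-p", prompt, "--output-format", "json"]


def plan_run(skill: str, runtime: str, os_name: str) -> dict:
    """Describe a run; runtimes differ only in setup steps, never in the agent call."""
    setup = {
        "local": [],  # the developer's checkout and credentials already exist
        "ci": ["check out repository", "install toolchain", "load API key from CI secrets"],
    }[runtime]
    return {"os": os_name, "setup": setup, "command": skill_invocation(skill)}


local = plan_run("cli-tester", "local", "macos")
nightly = [plan_run("cli-tester", "ci", os_name) for os_name in ("macos", "linux", "windows")]

# The agent invocation is identical across the local run and all three nightly runs.
same = all(run["command"] == local["command"] for run in nightly)
print(same)  # True
print(shlex.join(local["command"]))
```

Because every difference lives in `setup`, a change to the skill file is exercised identically on a laptop and in the nightly matrix. That is the property this section describes.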
7. The Real-World Impact of the Fleet
Since deploying the Fleet, the Docker team has seen dramatic improvements in shipping velocity and product stability. Agents now handle repetitive tasks such as release note generation, issue triage, and regression testing, freeing human engineers to focus on high-value features. The autonomous bug-fixing capability means that minor issues are often patched before a developer even wakes up. This isn't just about automation; it's about a system that keeps improving as the team refines each skill based on what the agents' runs reveal. The team also plans to expand the Fleet to cover documentation, security audits, and architectural reviews.
The Fleet demonstrates that the next frontier of AI in software development isn't just about writing code—it's about building autonomous teams that collaborate with humans to deliver better software, faster. By combining sandbox isolation with role-based agents that run locally first and then scale to CI, Docker has created a blueprint that any team can adapt. Whether you're a solo developer or part of a large organization, the lessons from the Fleet offer a powerful way to rethink what's possible with AI-assisted development.