The industry is currently obsessed with Copilots. And credit where it is due: Microsoft normalized the idea of embedding AI into our workflows simply by coining the term - a critical step in crossing the uncanny valley.
The premise is good: keep the human in charge, but make them productive without taking away their agency. And it works in specific contexts — the first draft of an email, the first version of a pitch deck, the contour of a strategy doc. Better than a blank page.
But a Copilot is not a system solution. Strip away the branding and it is personal productivity bolted onto the enterprise: you could feed the same raw inputs to your personal GPT and get the same result. For Google and Microsoft, these are excellent features to upsell without disrupting their legacy suites. And at enterprise scale, a Copilot does not reduce work. It moves the work somewhere more expensive.
Beyond the personal hacks, Copilots don't eliminate work, nor do they fundamentally change how it gets done (refer to Man's Search for Information, where we discussed bolting 'Push' tech onto 'Pull' workflows). In complex enterprise workflows, they often do the opposite: the work simply moves from Execution to Expensive Supervision.
The Supervision Burden
Human productivity relies on being in a state of flow - the ability to act intuitively, continuously, and without interruption. Copilots break this rhythm.
Look at the website bots, CX agents, or Voice AI deployments we see today. On the surface, they deflect 60-70% of inquiries, and Leadership marks this as a win for cost savings.
However, the operational reality tells a different story.
You have stopped front-ending the customers, but now you are back-ending the AI.
- You have to grind through evaluations to check quality.
- You have to provide constant feedback to AI providers.
- Your teams spend hours reviewing transcripts to ensure the AI didn't hallucinate a policy or insult a user.
In the best-case scenario, you solve the resource augmentation problem. But often, you end up with a Checklist AI - it exists to say you have it, but the experience is underwhelming compared to the intelligence users now expect from their personal AI tools.
This is the Supervision Burden.
You are constantly supervising a probabilistic machine inside a system designed for deterministic compliance. And unlike a junior employee, who learns and requires less supervision over time, a Copilot requires constant supervision for every single interaction. On top of still owning all the KPIs, you now have a new micromanagement task: making sure it doesn't blow up.
The Trap of Local Optimization
Copilots fail at scale because they optimize for Local Cognition (responding to a query) rather than System Outcomes (resolving the user's problem).
Copilots assume that:
- Partial automation is always helpful.
- Humans can continuously supervise AI without cost.
- Productivity is additive at the task level.
In reality:
- Supervision fragments attention.
- Cognitive switching compounds fatigue.
- Local efficiency does not translate to system-level outcomes.
This is manageable at a small scale. At enterprise scale, however, it becomes an invisible tax - a friction leadership often misses because they are watching Deflection Rates rather than Resolution Effort. By the time it shows up in subpar CSAT scores, the damage is already done.
The Agentic Pivot
The industry is waking up to this - hence the recent buzz about Agents. The promise is to move from helping to doing.
But there is a condition that is not talked about enough: Agents only work when you have a verifiable handoff protocol.
Look at software engineering. It is the only sector where AI Agents (like Cursor or Claude Code) are truly delivering on the promise. Why?
Because developers have a pre-existing architecture to build trust: The Diff.
- The Agent writes code across multiple files.
- The Human does not watch it type.
- The Human reviews the Diff (the exact change) in a Pull Request - comparing what it produced vs. the prior version.
- The test suite runs automatically to verify logic.
The Unit of Work is bounded, the Context Graph is available to the devs, and the verification is deterministic. The feedback loop is structured, allowing specific instructions to ground the AI and course-correct deviations.
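As a sketch of why this loop works, here is a minimal, hypothetical version of the handoff in Python. The function names and the acceptance rule are illustrative assumptions, not any real tool's API; the point is the shape of the protocol: a bounded diff, a deterministic gate, then a human review of the change rather than the typing.

```python
import difflib

def build_diff(before: str, after: str, path: str) -> str:
    """Produce a unified diff: the bounded, reviewable Unit of Work."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))

def run_tests(candidate: str) -> bool:
    """Deterministic verification stand-in.

    A real pipeline runs the project's test suite; this hypothetical
    rule just rejects unfinished work.
    """
    return "TODO" not in candidate

def review(before: str, after: str, path: str) -> str:
    """The handoff: machine-verify first, then hand the diff to a human."""
    if not run_tests(after):
        return "REJECT"  # verification failed; no human attention spent
    # The human reviews the diff, not the typing process.
    print(build_diff(before, after, path))
    return "APPROVE"
```

Note the ordering: the deterministic gate runs before any human looks at the change, so supervision is spent only on work that already passed verification.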
In the enterprise, none of this holds. There is no test suite for judgment calls, so supervision is the only option.
The Enterprise problem is that we don't have Diffs for business. We have not instrumented our workflows to create these feedback loops: there is no structured interface for reviewing a negotiation strategy or a complex CX decision without reading the whole transcript. The workflows were designed for humans doing the work, not for humans auditing the work.
The Architectural Fix
Until we build new, AI-native Human-in-the-Loop workflows, Enterprise Agents will just be black boxes that require constant supervision.
The solution requires carving out complete, atomic units of work and building a verification layer around them. And at its core, the thing being verified is Reasoning.
We need to build Glass Box interfaces where the AI exposes its Reasoning Trace (Explainable AI) and proposes an action.
- Don't: Build a Copilot that helps an agent type a response (Supervision Burden).
- Do: Build an Agent that reads the ticket, validates the policy, drafts the refund, and queues it for a binary "Approve/Reject" decision (Outcome Ownership).
The human role shifts from Editor to Approver.
They verify the context and the reasoning. If the reasoning is sound, they approve. This creates a feedback loop that actually trains the system, allowing you to move from Unknown Unknowns (I don't know what this AI can do) to Known Knowns (I trust this AI for this specific task).
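One way to make this Editor-to-Approver loop concrete is a proposal object paired with a trust ledger. This is a hypothetical sketch under assumed names, fields, and thresholds - not a real product's schema - but it shows how binary Approve/Reject decisions accumulate into per-task trust:

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    """Glass Box handoff: the agent exposes context, reasoning, and one action."""
    context: str                # e.g. a ticket summary (illustrative)
    reasoning_trace: list[str]  # step-by-step policy checks the human can audit
    action: str                 # the single proposed action, e.g. "refund $42.00"

@dataclass
class TrustLedger:
    """Feedback loop: Approve/Reject decisions per task type.

    Accumulated approvals move a task from Unknown Unknowns
    to Known Knowns; thresholds here are arbitrary assumptions.
    """
    approvals: dict = field(default_factory=dict)

    def record(self, task_type: str, approved: bool) -> None:
        wins, total = self.approvals.get(task_type, (0, 0))
        self.approvals[task_type] = (wins + int(approved), total + 1)

    def trusted(self, task_type: str,
                threshold: float = 0.95, min_n: int = 50) -> bool:
        wins, total = self.approvals.get(task_type, (0, 0))
        return total >= min_n and wins / total >= threshold
```

The design choice worth noting: trust is scoped per task type, not per agent. An agent can be a Known Known for refunds while remaining an Unknown Unknown for negotiations.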
And this difference is architectural.
- Copilots live inside the task. They force the human to supervise the process.
- Agents own the task. They allow the human to approve the outcome.
We must move from Assistance to Handoffs.
The real revolution of Generative AI is its reasoning capacity, not the synthesized output. But that revolution only pays off if you build the Glass Box first: the system shows its reasoning, the human corrects it, and the Trust Budget is earned rather than assumed.
Published on January 28, 2026