Harness Engineering
The engineering of the wrapper around the model — prompt structure, tool design, hooks, MCP, context windows, eval loops, and the feedback systems that make agents reliable.
Reading
- Building Claude Code with Boris Cherny
- Harness engineering for coding agent users
- Skill Issue: Harness Engineering for Coding Agents
- Harness Engineering for AI Coding Agents: Constraints That Ship Reliable Code
- Mitchell Hashimoto’s new way of writing code
- Harness engineering: leveraging Codex in an agent-first world
- My AI Adoption Journey
- Introducing Agent Skills
- Software 3.0: Software in the Age of AI
- Don't Build Multi-Agents
- Building effective agents
- Introducing the Model Context Protocol
- Your AI Product Needs Evals
- LLM Powered Autonomous Agents
- Toolformer: Language Models Can Teach Themselves to Use Tools
- ReAct: Synergizing Reasoning and Acting in Language Models
Output
-
Hardening AI Agents Against the 'Lethal Trifecta'
Personal AI assistants like Openclaw are fantastically powerful, and quite dangerous. Here's how to harden a personal assistant without making it useless.
Synthesis
A working notebook on the discipline of harness engineering — the wrapper around the model rather than the model itself. The argument I’m tracking: that the harness defines the productivity ceiling more than the underlying weights. Mitchell Hashimoto crystallised the term in February 2026; within a week OpenAI and Anthropic had published their own treatments, and within two months Martin Fowler’s site had a full-length canonical article on it. The pattern matters: a vocabulary moved from one practitioner’s habit to industry consensus in eight weeks.
Threads to follow:
- Agent = Model + Harness. The simplest formulation, from Hashimoto. Most discussion of “AI productivity” is really discussion of harness quality.
- Context as substrate. The shift from prompt engineering (one-shot wording) to context engineering (the whole information environment the agent operates inside).
- MCP as the neutral protocol. Tools were the bottleneck; MCP made them composable.
- Single agent vs multi-agent. Cognition’s “Don’t Build Multi-Agents” paired with Anthropic’s research-system writeup is the cleanest disagreement in the field — same data, opposite conclusions.
- Evals as steering. Hamel Husain’s argument that without evals you cannot drive the system, only watch it move.