OpenAI's Harness Engineering Experiment: Zero Manually-Written Code
Based on OpenAI's 'Harness engineering: leveraging Codex in an agent-first world' and Rohit's harness framework
Originally published on Medium.
OpenAI harness engineering: AI agents generating pull requests with zero human-written code, progressive disclosure architecture
What if you could ship over a million lines of code without writing a single line yourself? Discover how OpenAI's groundbreaking "zero manual code" experiment transformed engineers into harness architects, revolutionizing the approach to software development! #HarnessEngineering #AIEngineering
OpenAI harness engineering: Evolution of an Engineer
OpenAI harness engineering: Engineering roles
- Before (Engineers): Engineers define the core system shape: major components, data flows, reliability goals, and how the system meets product requirements.
  After (Engineers): Still engineer-led, but with extra emphasis on designing for agent execution: clear module boundaries, explicit interfaces, and "agent-friendly" specs that can be mechanically verified.
- Before (Engineers): Engineers decide patterns and constraints: service boundaries, dependency direction, persistence choices, performance budgets, and security posture.
  After (Engineers): Still engineer-led, but more decisions are encoded as enforceable rules: repo documentation, lint rules, contract tests, and scaffolding that prevents agents from drifting.
- Before (Engineers): Engineers implement features and fixes directly: creating files, editing functions, wiring dependencies, and making incremental changes.
  After (Agents): Agents generate implementation code from specs and repo context. Engineers focus on prompts, task decomposition, harness updates, and ensuring the repo contains the "source of truth" the agents need.
- Before (Engineers): Engineers review logic, style, correctness, and maintainability, and then request changes.
  After (Engineers): Engineers still review, but review shifts toward verifying intent, spotting systemic failure modes, and improving the harness. Many "style and convention" concerns are pushed into automated checks and remediation instructions.
- Before (Engineers): Engineers reproduce issues, inspect logs and traces, isolate root causes, and patch code.
  After (Engineers + Agents): Engineers and agents collaborate. Agents run tests, read stack traces, and iterate on fixes. Engineers handle ambiguous failures, design-level issues, and harness gaps that cause repeated agent mistakes.
- Before (Engineers): Engineers author unit, integration, and end-to-end tests, decide what to test, and maintain suites over time.
  After (Agents): Agents generate and update tests, especially for well-specified behavior and regression coverage. Engineers set the testing strategy, coverage thresholds, and contract boundaries that tests must enforce.
- Before (Engineers): Engineers restructure code for readability, performance, and maintainability, including migrations and large-scale changes.
  After (Agents): Agents perform much of the mechanical refactor work under constraints. Engineers define the target architecture, write migration plans, ensure safety rails exist (tests, linters, CI), and review for correctness and risk.
- Agents handled genuine complexity, not toy tasks
- The code passed human review (because humans still reviewed PRs)
- The system actually ran and shipped, not just compiled
- Quality was maintained at a level acceptable for production use
OpenAI harness engineering: Production Outputs

- Foundational (monthly): Architecture decisions, module boundaries, coding standards, naming conventions
- Operational (weekly): API contracts, data schemas, deployment configs, environment setup
- Tactical (daily): Task-specific context, current sprint goals, known issues, workarounds
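The three cadences above can be treated as a freshness budget for each context document. The following is a minimal sketch under stated assumptions: the day counts (30, 7, 1) are a rough reading of "monthly", "weekly", and "daily", and the file names and the `ContextDoc` type are hypothetical, not something the OpenAI write-up prescribes.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed day counts for the three tiers (monthly ~ 30, weekly = 7, daily = 1).
REFRESH_DAYS = {"foundational": 30, "operational": 7, "tactical": 1}


@dataclass
class ContextDoc:
    path: str            # hypothetical file path inside the repo
    tier: str            # one of the keys in REFRESH_DAYS
    last_reviewed: date  # when a human or agent last confirmed the content


def stale_docs(docs, today):
    """Return docs whose last review is older than their tier's cadence."""
    return [
        d for d in docs
        if today - d.last_reviewed > timedelta(days=REFRESH_DAYS[d.tier])
    ]


# Illustrative data only.
docs = [
    ContextDoc("ARCHITECTURE.md", "foundational", date(2026, 3, 20)),
    ContextDoc("modules/billing/README.md", "operational", date(2026, 3, 20)),
    ContextDoc("SPRINT.md", "tactical", date(2026, 4, 3)),
]
for doc in stale_docs(docs, today=date(2026, 4, 4)):
    print(f"stale: {doc.path} ({doc.tier})")
```

A check like this could run in CI so that stale context surfaces as a task for an agent or an engineer instead of silently misleading the next run.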
OpenAI harness engineering: The harness
repo/
├── BRIEF.md                  # Map file: what this repo does, where things live
├── ARCHITECTURE.md           # High-level system design
├── modules/
│   ├── auth/
│   │   ├── README.md         # Module-level context
│   │   ├── CONVENTIONS.md    # Module-specific patterns
│   │   ├── src/
│   │   └── tests/
│   ├── billing/
│   │   ├── README.md
│   │   ├── CONVENTIONS.md
│   │   ├── src/
│   │   └── tests/
│   └── notifications/
│       ├── README.md
│       ├── CONVENTIONS.md
│       ├── src/
│       └── tests/
└── .linters/
    ├── architecture_rules.py
    └── naming_conventions.py
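One way to read this layout is as a lookup path: an agent starts from the repo-wide map files and only then opens the docs of the module it is about to touch. The sketch below illustrates that idea. The function name `context_files` and the broadest-first reading order are assumptions made for illustration, not a documented Codex behavior.

```python
from pathlib import PurePosixPath


def context_files(target: str) -> list[str]:
    """List context docs to read, broadest first, for a file in the repo."""
    files = ["BRIEF.md", "ARCHITECTURE.md"]
    parts = PurePosixPath(target).parts
    # Only files under modules/<name>/ get module-level docs added.
    if len(parts) >= 2 and parts[0] == "modules":
        module_dir = f"modules/{parts[1]}"
        files += [f"{module_dir}/README.md", f"{module_dir}/CONVENTIONS.md"]
    return files


print(context_files("modules/billing/src/invoice_handler.py"))
```

The point of the design is that a billing task never pays the context cost of the auth or notifications docs unless it actually crosses into those modules.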
OpenAI harness engineering: Repository as truth
# Project Brief
This is a billing platform with three core modules.
## Modules
- `modules/auth/` - Authentication and authorization. OAuth2 + API keys.
- `modules/billing/` - Subscription management, invoicing, payment processing.
- `modules/notifications/` - Email, SMS, and webhook notifications.
## Key Conventions
- All API endpoints follow REST conventions in ARCHITECTURE.md
- Each module has its own CONVENTIONS.md for module-specific patterns
- Tests live alongside source code in each module's tests/ directory
## Current State
- Auth module: stable, v2.1
- Billing module: active development, migrating from Stripe v1 to v2
- Notifications module: stable, v1.4
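If the brief is the map, it helps to verify that the map matches the territory. Here is a minimal sketch, assuming modules are listed in BRIEF.md as backticked `modules/<name>/` paths and that each one must carry a README.md and a CONVENTIONS.md. The helper name `missing_module_docs` and the throwaway demo repo are hypothetical.

```python
import re
import tempfile
from pathlib import Path

REQUIRED_DOCS = ("README.md", "CONVENTIONS.md")


def missing_module_docs(repo: Path) -> list[str]:
    """Return required doc paths that BRIEF.md implies but the repo lacks."""
    brief = (repo / "BRIEF.md").read_text(encoding="utf-8")
    # Assumption: module entries appear as backticked `modules/<name>/` paths.
    modules = re.findall(r"`(modules/[\w-]+)/`", brief)
    missing = []
    for module in modules:
        for doc in REQUIRED_DOCS:
            if not (repo / module / doc).is_file():
                missing.append(f"{module}/{doc}")
    return missing


# Demo on a throwaway repo: auth is complete, billing lacks CONVENTIONS.md.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "BRIEF.md").write_text(
        "## Modules\n- `modules/auth/`\n- `modules/billing/`\n", encoding="utf-8"
    )
    for rel in (
        "modules/auth/README.md",
        "modules/auth/CONVENTIONS.md",
        "modules/billing/README.md",
    ):
        target = repo / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder\n", encoding="utf-8")
    missing = missing_module_docs(repo)

print(missing)
```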
OpenAI harness engineering - Progressive Disclosure Architecture (PDA)
OpenAI harness engineering - Application Legibility
OpenAI harness engineering: Mechanical Enforcement
ERROR: Naming convention violation
File: modules/billing/src/invoice_handler.py
Line: 47
Found: `processInvoice`
Expected pattern: snake_case function names
Fix: Rename to `process_invoice`
Reference: See modules/billing/CONVENTIONS.md#naming
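A linter that emits messages in this shape can be quite small. The sketch below uses Python's standard `ast` module to flag function names that are not snake_case and to print the fix alongside the violation. It is an illustration of the pattern, not the contents of the real `.linters/naming_conventions.py`; the function names and the regex are assumptions.

```python
import ast
import re

# Assumed rule: lowercase letters, digits, underscores, optional leading underscores.
SNAKE_CASE = re.compile(r"^_{0,2}[a-z][a-z0-9_]*$")


def to_snake_case(name: str) -> str:
    """Convert camelCase or PascalCase to snake_case."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def check_function_names(source: str, path: str, reference: str) -> list[str]:
    """Return one remediation message per function that breaks the rule."""
    messages = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                messages.append(
                    "ERROR: Naming convention violation\n"
                    f"File: {path}\n"
                    f"Line: {node.lineno}\n"
                    f"Found: `{node.name}`\n"
                    "Expected pattern: snake_case function names\n"
                    f"Fix: Rename to `{to_snake_case(node.name)}`\n"
                    f"Reference: See {reference}"
                )
    return messages


sample = "def ok():\n    pass\n\ndef processInvoice(x):\n    return x\n"
messages = check_function_names(
    sample,
    "modules/billing/src/invoice_handler.py",
    "modules/billing/CONVENTIONS.md#naming",
)
print(messages[0])
```

The key design choice is that the message carries the fix and a pointer to the convention, so the agent can act on it without a human translating the error.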
- What it checks: Function, class, variable names match module patterns
  Agent response: Auto-rename
- What it checks: No cross-module imports that violate dependency rules
  Agent response: Restructure imports
- What it checks: Endpoints match OpenAPI spec, request/response schemas validated
  Agent response: Fix schemas
- What it checks: New code must include tests, minimum coverage per module
  Agent response: Generate missing tests
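The cross-module import check in the list above can be sketched the same way as a naming linter. In this example the `ALLOWED_IMPORTS` table is entirely hypothetical (the article does not state the actual dependency rules), as is the assumption that modules are imported as `modules.<name>...`.

```python
import ast

# Hypothetical dependency rules: which modules each module may import from.
ALLOWED_IMPORTS = {
    "auth": set(),
    "billing": {"auth"},
    "notifications": {"auth", "billing"},
}


def imported_modules(source: str) -> set[str]:
    """Collect module names imported as `modules.<name>...` in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if len(parts) >= 2 and parts[0] == "modules":
                found.add(parts[1])
    return found


def dependency_violations(owner: str, source: str) -> list[str]:
    """Return one remediation message per disallowed cross-module import."""
    allowed = ALLOWED_IMPORTS[owner] | {owner}
    hint = ", ".join(sorted(ALLOWED_IMPORTS[owner])) or "none"
    return [
        "ERROR: Dependency rule violation\n"
        f"Found: `{owner}` imports `{dep}`\n"
        f"Fix: Remove the import or depend only on allowed modules ({hint})"
        for dep in sorted(imported_modules(source) - allowed)
    ]


auth_source = "from modules.billing.src import invoices\nimport modules.auth.src.tokens\n"
for message in dependency_violations("auth", auth_source):
    print(message)
```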
OpenAI harness engineering: Automated Enforcement
OpenAI harness engineering: Dependency Stack requires constraints
OpenAI harness engineering: Throughput Merge Philosophy
OpenAI harness engineering: Harness Engineer's Clock
OpenAI harness engineering: Activities
- Context Engineering: Repository-as-truth + brief map files
- Architectural Constraints: Custom linters with remediation instructions
- Garbage Collection: Post-merge monitoring + agent auto-fix
- Feedback Loops: Linter iteration cycles + observability stack
- Observability: Full traces, structured logs, metrics dashboards
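The "linter iteration cycles" item in the list above boils down to a bounded loop: lint, hand the violations to the agent, lint again. The sketch below shows that loop with stand-in callables. `run_linters` and `agent_fix` are hypothetical placeholders for a real linter run and a real agent call, and the round limit of 3 is an arbitrary assumption.

```python
from typing import Callable


def iterate_until_clean(
    code: str,
    run_linters: Callable[[str], list[str]],
    agent_fix: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Alternate lint and fix until no violations remain or rounds run out."""
    violations = run_linters(code)
    for _ in range(max_rounds):
        if not violations:
            break
        code = agent_fix(code, violations)
        violations = run_linters(code)
    return code, violations


# Stand-ins for a real linter and a real agent call.
def demo_linter(code: str) -> list[str]:
    if "processInvoice" in code:
        return ["Rename `processInvoice` to `process_invoice`"]
    return []


def demo_agent(code: str, violations: list[str]) -> str:
    return code.replace("processInvoice", "process_invoice")


fixed, remaining = iterate_until_clean(
    "def processInvoice(): ...", demo_linter, demo_agent
)
print(fixed, remaining)
```

Anything still in `remaining` after the last round is a signal for an engineer: either the task is ambiguous or the harness is missing a rule or a remediation hint.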
OpenAI harness engineering: How does this map to Rohit's Harness Framework
- If your team adopted zero-manual-code tomorrow, which of the five harness engineering patterns would be the hardest to implement, and why?
- What knowledge currently lives outside your repository that would break agents if they had to rely only on your codebase?
- Does moving quality enforcement from human gates to mechanical linters raise the floor, lower the ceiling, or both, for code quality in your organization?
OpenAI harness engineering: Moving from coding to engineering factories
- 2026-04-04 - OpenAI's Harness Engineering Experiment: Zero Manually-Written Code
- 2026-03-29 - Anthropic's Harness Engineering: Two Agents, One Feature List, Zero Context Overflow
- 2026-03-27 - The $9 Disaster: What Anthropic's Harness Design Paper Teaches Us About Building Autonomous AI Applications
- 2026-03-20 - LangChain's Harness Engineering: From Top 30 to Top 5 on Terminal Bench 2.0
- 2026-03-19 - Beyond the AI Coding Hangover: How Harness Engineering Prevents the Next Outage
- 2026-03-18 - LangChain Deep Agents: Harness and Context Engineering: Memory, Skills, and Security
- 2026-03-17 - Harness Engineering vs Context Engineering: The Model is the CPU, the Harness is the OS