OpenAI's Harness Engineering Experiment: Zero Manually-Written Code

Based on OpenAI's 'Harness engineering: leveraging Codex in an agent-first world' and Rohit's harness framework

Rick Hightower

Originally published on Medium.

OpenAI harness engineering: AI agents generating pull requests with zero human-written code, progressive disclosure architecture

What if you could ship over a million lines of code without writing a single line yourself? See how OpenAI’s “zero manual code” experiment turned engineers into harness architects and changed how software gets built. 🚀 #HarnessEngineering #AIEngineering

OpenAI harness engineering

OpenAI harness engineering: Evolution of an Engineer

OpenAI harness engineering: Engineering roles

  • Before (Engineers): Engineers define the core system shape: major components, data flows, reliability goals, and how the system meets product requirements.
  • After (Engineers): Still engineer-led, but with extra emphasis on designing for agent execution: clear module boundaries, explicit interfaces, and “agent-friendly” specs that can be mechanically verified.

  • Before (Engineers): Engineers decide patterns and constraints: service boundaries, dependency direction, persistence choices, performance budgets, and security posture.
  • After (Engineers): Still engineer-led, but more decisions are encoded as enforceable rules: repo documentation, lint rules, contract tests, and scaffolding that prevents agents from drifting.

  • Before (Engineers): Engineers implement features and fixes directly: creating files, editing functions, wiring dependencies, and making incremental changes.
  • After (Agents): Agents generate implementation code from specs and repo context. Engineers focus on prompts, task decomposition, harness updates, and ensuring the repo contains the “source of truth” the agents need.

  • Before (Engineers): Engineers review logic, style, correctness, and maintainability, and then request changes.
  • After (Engineers): Engineers still review, but review shifts toward verifying intent, spotting systemic failure modes, and improving the harness. Many “style and convention” concerns are pushed into automated checks and remediation instructions.

  • Before (Engineers): Engineers reproduce issues, inspect logs and traces, isolate root causes, and patch code.
  • After (Engineers + Agents): Engineers and agents collaborate. Agents run tests, read stack traces, and iterate on fixes. Engineers handle ambiguous failures, design-level issues, and harness gaps that cause repeated agent mistakes.

  • Before (Engineers): Engineers author unit, integration, and end-to-end tests, decide what to test, and maintain suites over time.
  • After (Agents): Agents generate and update tests, especially for well-specified behavior and regression coverage. Engineers set the testing strategy, coverage thresholds, and contract boundaries that tests must enforce.

  • Before (Engineers): Engineers restructure code for readability, performance, and maintainability, including migrations and large-scale changes.
  • After (Agents): Agents perform much of the mechanical refactor work under constraints. Engineers define the target architecture, write migration plans, ensure safety rails exist (tests, linters, CI), and review for correctness and risk.

  • Agents handled genuine complexity, not toy tasks

  • The code passed human review (because humans still reviewed PRs)

  • The system actually ran and shipped, not just compiled

  • Quality was maintained at a level acceptable for production use

OpenAI harness engineering: Production Outputs

  • Foundational (monthly): Architecture decisions, module boundaries, coding standards, naming conventions
  • Operational (weekly): API contracts, data schemas, deployment configs, environment setup
  • Tactical (daily): Task-specific context, current sprint goals, known issues, workarounds
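The three cadences above can be made checkable. The following is a minimal sketch, not something the source describes: the `ContextTier` type, the `TIERS` table, and the day counts (30, 7, 1) standing in for "monthly", "weekly", and "daily" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ContextTier:
    """One layer of repository context and how often it should be refreshed."""
    name: str
    max_age: timedelta          # assumed refresh budget for this tier
    examples: tuple[str, ...]   # kinds of content that belong in the tier


# Hypothetical mapping of the three cadences to concrete age limits.
TIERS = (
    ContextTier("foundational", timedelta(days=30),
                ("architecture decisions", "module boundaries")),
    ContextTier("operational", timedelta(days=7),
                ("API contracts", "data schemas")),
    ContextTier("tactical", timedelta(days=1),
                ("current sprint goals", "known issues")),
)


def is_stale(tier: ContextTier, last_updated: date, today: date) -> bool:
    """Return True when a document in this tier is older than its refresh budget."""
    return today - last_updated > tier.max_age
```

A check like this could run in CI against file modification dates, so that stale tactical context is flagged before an agent relies on it.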

OpenAI harness engineering: The harness

repo/
├── BRIEF.md                    # Map file: what this repo does, where things live
├── ARCHITECTURE.md             # High-level system design
├── modules/
│   ├── auth/
│   │   ├── README.md           # Module-level context
│   │   ├── CONVENTIONS.md      # Module-specific patterns
│   │   ├── src/
│   │   └── tests/
│   ├── billing/
│   │   ├── README.md
│   │   ├── CONVENTIONS.md
│   │   ├── src/
│   │   └── tests/
│   └── notifications/
│       ├── README.md
│       ├── CONVENTIONS.md
│       ├── src/
│       └── tests/
└── .linters/
    ├── architecture_rules.py
    └── naming_conventions.py
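One way to read this layout is as a lookup order: an agent starts from the repo-wide map and only then opens the documents of the module it is editing. The sketch below illustrates that idea under stated assumptions. The function name `context_files` is hypothetical, and the source does not say how agents actually select files.

```python
from pathlib import Path


def context_files(repo_root: Path, target: Path) -> list[Path]:
    """List context docs from the repo-wide map down to the module being edited.

    `target` must live under `repo_root`; otherwise `relative_to` raises ValueError.
    """
    # Repo-level documents come first, for every task.
    candidates = [repo_root / "BRIEF.md", repo_root / "ARCHITECTURE.md"]

    # Module-level documents are added only when the target sits inside modules/<name>/.
    parts = target.relative_to(repo_root).parts
    if len(parts) >= 2 and parts[0] == "modules":
        module_dir = repo_root / "modules" / parts[1]
        candidates += [module_dir / "README.md", module_dir / "CONVENTIONS.md"]

    # Skip anything that does not exist, so a missing file is not an error here.
    return [path for path in candidates if path.is_file()]
```

For a target such as `modules/billing/src/invoice_handler.py`, this would yield `BRIEF.md`, `ARCHITECTURE.md`, and then the billing module's `README.md` and `CONVENTIONS.md`, in that order.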

OpenAI harness engineering: Repository as truth

# Project Brief
This is a billing platform with three core modules.
## Modules
- `modules/auth/` - Authentication and authorization. OAuth2 + API keys.
- `modules/billing/` - Subscription management, invoicing, payment processing.
- `modules/notifications/` - Email, SMS, and webhook notifications.
## Key Conventions
- All API endpoints follow REST conventions in ARCHITECTURE.md
- Each module has its own CONVENTIONS.md for module-specific patterns
- Tests live alongside source code in each module's tests/ directory
## Current State
- Auth module: stable, v2.1
- Billing module: active development, migrating from Stripe v1 to v2
- Notifications module: stable, v1.4
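A brief like this only works as a map if it stays in sync with the repository. As an illustration, and not something the source describes, the sketch below flags module directories that the brief does not mention. The helper name `unlisted_modules` and the backtick-quoted `modules/<name>/` pattern it searches for are assumptions based on the sample brief above.

```python
import re
from pathlib import Path

# Matches module references written the way the sample brief writes them: `modules/<name>/`
MODULE_REF = re.compile(r"`modules/([A-Za-z0-9_-]+)/`")


def unlisted_modules(repo_root: Path) -> set[str]:
    """Return module directory names that exist on disk but are missing from BRIEF.md."""
    brief_text = (repo_root / "BRIEF.md").read_text(encoding="utf-8")
    listed = set(MODULE_REF.findall(brief_text))
    on_disk = {p.name for p in (repo_root / "modules").iterdir() if p.is_dir()}
    return on_disk - listed
```

An empty result means every module under `modules/` appears in the brief. A non-empty result is a candidate for a CI failure with a message telling the agent to update `BRIEF.md`.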

OpenAI harness engineering - Progressive Disclosure Architecture (PDA)

OpenAI harness engineering - Application Legibility

OpenAI harness engineering: Mechanical Enforcement

ERROR: Naming convention violation
  File: modules/billing/src/invoice_handler.py
  Line: 47
  Found: `processInvoice`
  Expected pattern: snake_case function names
  Fix: Rename to `process_invoice`
  Reference: See modules/billing/CONVENTIONS.md#naming

  • What it checks: Function, class, variable names match module patterns
  • Agent response: Auto-rename

  • What it checks: No cross-module imports that violate dependency rules
  • Agent response: Restructure imports

  • What it checks: Endpoints match OpenAPI spec, request/response schemas validated
  • Agent response: Fix schemas

  • What it checks: New code must include tests, minimum coverage per module
  • Agent response: Generate missing tests
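The naming check and its remediation message can be sketched with Python's standard `ast` module. This is a minimal illustration, not the linter used in the experiment: the function names `check_function_names` and `to_snake_case` are hypothetical, and it covers only function names, whereas the check described above also covers classes and variables.

```python
import ast
import re

# Accepts snake_case, including leading underscores and dunder names such as __init__.
SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")


def to_snake_case(name: str) -> str:
    """Convert camelCase or PascalCase to snake_case (simple heuristic)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def check_function_names(source: str, path: str, conventions_ref: str) -> list[str]:
    """Return one remediation message per function whose name is not snake_case."""
    messages = []
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and not SNAKE_CASE.match(node.name):
            # The message states the fix, not just the failure, so an agent can act on it.
            messages.append(
                "ERROR: Naming convention violation\n"
                f"  File: {path}\n"
                f"  Line: {node.lineno}\n"
                f"  Found: `{node.name}`\n"
                "  Expected pattern: snake_case function names\n"
                f"  Fix: Rename to `{to_snake_case(node.name)}`\n"
                f"  Reference: See {conventions_ref}"
            )
    return messages
```

The design point is the last three lines of the message: the expected pattern, the concrete fix, and a pointer back into the repository give the agent everything it needs to correct the violation without human help.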

OpenAI harness engineering: Automated Enforcement

OpenAI harness engineering: Dependency Stack requires constraints
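A dependency-direction rule ("no cross-module imports that violate dependency rules") can be checked mechanically in a similar way. The sketch below is an assumption-heavy illustration: the `ALLOWED_IMPORTS` table, the `modules.<name>...` import style, and the wording of the message are all hypothetical and do not come from the source.

```python
import ast

# Hypothetical rule set: each module may import only from the modules listed for it.
ALLOWED_IMPORTS = {
    "auth": set(),
    "billing": {"auth"},
    "notifications": {"auth", "billing"},
}


def cross_module_violations(source: str, module: str) -> list[str]:
    """Return a remediation message for each import that breaks the dependency rules."""
    allowed = ALLOWED_IMPORTS.get(module, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # not an absolute import statement
        for name in names:
            parts = name.split(".")
            # Assumes imports look like `modules.<module_name>...`.
            if len(parts) < 2 or parts[0] != "modules":
                continue
            imported = parts[1]
            if imported != module and imported not in allowed:
                violations.append(
                    "ERROR: Dependency rule violation\n"
                    f"  Line: {node.lineno}\n"
                    f"  Found: `{module}` imports from `{imported}`\n"
                    f"  Allowed: {sorted(allowed) or 'no other modules'}\n"
                    "  Fix: Remove the import or move the shared code into an allowed module\n"
                    "  Reference: See ARCHITECTURE.md"
                )
    return violations
```

With these example rules, `auth` importing from `billing` is reported, while `billing` importing from `auth` passes.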

OpenAI harness engineering: Throughput Merge Philosophy

OpenAI harness engineering: Harness Engineer’s Clock

OpenAI harness engineering: Activities

  • Context Engineering: Repository-as-truth + brief map files
  • Architectural Constraints: Custom linters with remediation instructions
  • Garbage Collection: Post-merge monitoring + agent auto-fix
  • Feedback Loops: Linter iteration cycles + observability stack
  • Observability: Full traces, structured logs, metrics dashboards

OpenAI harness engineering: How does this map to Rohit’s Harness Framework

  • If your team adopted zero-manual-code tomorrow, which of the five harness engineering patterns would be the hardest to implement, and why?
  • What knowledge currently lives outside your repository that would break agents if they had to rely only on your codebase?
  • Does moving quality enforcement from human gates to mechanical linters raise the floor, lower the ceiling, or both, for code quality in your organization?

OpenAI harness engineering: Moving from coding to engineering factories

#OpenAI #harness engineering #Codex #agent-first development #zero manual code #AI code generation