Mastering Agent Development: The Architect Agent Workflow for Creating Robust AI Agent Skills

Rick Hightower

Architect figure at holographic command center orchestrating multiple AI coding agents across floating workspace screens with dramatic lighting

How frustration with production debugging led to a year-long evolution in multi-agent AI workflows

The Problem: Lost in the Fog of War

If you've ever actually used an AI coding assistant, you know the deal. One minute, they're a genius. The next, they're just dangerously unpredictable.

Sometimes it's just a feeling of powerlessness.

I remember debugging an upper-level environment GCP issue, watching an AI coding assistant modify scripts while I held my breath, hoping it didn't torch everything. You see it make a change you didn't ask for and then try to follow the logic. "Oh, it found something I missed" quickly becomes "What is it doing now?" Should I stop it or let it finish?

I struggled to ingest it all, to get the playback needed for the human-in-the-loop that often catches mistakes before they become disasters. Great, it fixed it. What did it fix? I now need to make the same fix across different environments and backfill it in a few IaC repos. It works, but how? And is the change it made appropriate for our security posture, and are the other IaC repos now out of date?

Ok, let me ask: what did it do? How did you fix it? Please tell me so I can capture it, fix it in the IaC repos, and make the same change in the other environments. But oh wait, it's in a new context now with no memory of those changes. I must start from scratch. These are multi-repo changes, and the AI has no awareness of how changes in one repo should propagate to others.

That's the fog of war when working with AI coding assistants.

Workflow overview diagram

What If We Could Wrap a System Around the AI?

So what's the solution here? How do we stop just crossing our fingers and hoping for the best?

This project poses a pretty simple question: What if we could wrap a whole system around the AI? Something that acts like a clear blueprint, a project manager, and a quality inspector, all rolled into one, keeping a human firmly in the driver's seat.

Meet the Architect Agent. It's a framework designed from the ground up to put a professional, human-centric structure on AI development. The goal is to turn that AI from a loose cannon into a real, dependable collaborator.

The Construction Site Analogy

The easiest way to wrap your head around this is to think about a construction site.

The Architect Agent is your building architect. They're the one with the detailed blueprints, defining all the materials and standards. They don't swing hammers or pour concrete. They plan, review, and ensure that everything complies with code. They work with the building inspectors and then with the crews to make sure everything is up to snuff.

The Code Agent is the construction crew on the ground. They don't design anything. They just follow those blueprints to the letter, executing the actual work with precision. They are smart, able to work around problems, and given latitude. But when they have to deviate from the plan, the architect with the broader view can come in, review their work against the overall plan, and ensure it all fits.

Two totally different jobs, but they're working together to deliver a single high-quality result. The architect never compromises on standards. The construction crew never freelances a design change. That's the relationship we're building between the Architect Agent and the Code Agent.

A Framework for Agent Skill Development: The Plan, Delegate, Grade Workflow

At its core, the Architect Agent follows a simple, powerful workflow that underpins effective agent development. The whole process boils down to one loop:

Plan > Delegate > Grade > Iterate > Learn

  1. Plan: Create detailed, structured instructions for code agents
  2. Delegate: Send instructions to code agents for implementation
  3. Grade: Evaluate completed work against objective rubrics (target: 95% or higher)
  4. Iterate: Guide improvements until quality threshold met
  5. Learn: Update code agent memory with successful patterns
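To make the loop concrete, here is a minimal Python sketch. It is not the Architect Agent's actual code: `run_cycle`, `Memory`, and the fake `delegate`/`grade` functions are hypothetical stand-ins for a real code agent and a real rubric-based review. It only shows how the five steps hand off to each other, with the 95% threshold gating the Learn step.

```python
# A minimal, hypothetical sketch of Plan > Delegate > Grade > Iterate > Learn.
# `delegate` and `grade` stand in for a real code agent and a rubric review.
from dataclasses import dataclass, field
from typing import Callable

PASSING_SCORE = 95  # target threshold from the workflow above


@dataclass
class Memory:
    """Stand-in for the code agent's memory of successful patterns."""
    patterns: list[str] = field(default_factory=list)


def run_cycle(
    plan: str,
    delegate: Callable[[str], str],
    grade: Callable[[str], tuple[int, str]],
    memory: Memory,
    max_iterations: int = 3,
) -> tuple[int, int]:
    """Run one task through the loop; return (final_score, iterations_used)."""
    instructions = plan                      # Plan: the architect's instructions
    score = 0
    for iteration in range(1, max_iterations + 1):
        work = delegate(instructions)        # Delegate: the code agent executes
        score, feedback = grade(work)        # Grade: score the result
        if score >= PASSING_SCORE:
            memory.patterns.append(f"{plan}: passed at {score}")  # Learn
            return score, iteration
        instructions = f"{plan}\nFix: {feedback}"  # Iterate: add grader feedback
    return score, max_iterations


# Usage with fake agents: the first attempt scores 80, the revision scores 97.
def fake_delegate(instructions: str) -> str:
    return instructions


def fake_grade(work: str) -> tuple[int, str]:
    return (97, "") if "Fix:" in work else (80, "add unit tests")


memory = Memory()
print(run_cycle("Add ingestion CLI", fake_delegate, fake_grade, memory))  # (97, 2)
```

The iteration cap is a design choice of this sketch: if the work never reaches the threshold, the loop stops and hands the last score back to the human instead of spinning forever.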

Architect Agent workflow diagram showing Plan, Delegate, Grade, Iterate, Learn cycle with Code Agent execution flow

Here's the kicker: this isn't just some theoretical framework I drew on a whiteboard. This is a battle-tested workflow that I've been using for well over a year. It's gone through many revisions, and it solves the very real problem of keeping the human in control while leveraging AI for what it's best at.

Building Human-in-the-Loop Agent Skills

Key Insight: The Architect Agent isn't about replacing human judgment. It's about amplifying it. You get the speed of AI with the oversight of a human.

I find it essential to put the human back in the loop, especially with production systems where you must be certain about what changes are being made, or when coordinating across multiple repositories. This extra level of due diligence is critical for successful agent development.

That was the spirit of the Architect Agent: preventing you from getting lost in the coding agent's output and providing a second set of eyes to both guide and grade the work.

The human is always in the loop. You get to see the plan before any code is written. You can catch mistakes super early. You're not waiting until the end to find out the AI went completely off the rails.

And really, what it all comes down to is this: You take that mysterious AI black box and turn it into a transparent partner. You get the control back.

Human-in-the-loop diagram

The Dual Instruction Philosophy for Agent Skills

Here's an innovation that emerged from practical agent development: every instruction has two versions.

When the architect agent creates instructions for the coding agent, it also creates instructions for humans. This means you can:

  • Follow along in real-time
  • Opt to do the task yourself if you prefer
  • Request a summary: "Give me a 25-point bullet list of what's actually happening right now"

You decide when and how to interject. Even if you're not going to execute manually, the instructions are laid out so you could.
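One way to picture "every instruction has two versions" is a single task object rendered twice. The Python sketch below is hypothetical: the `Task` class and its fields (`objective`, `steps`, `success_criteria`) are assumptions modeled on the instruction summaries described in this post, not the skill's real file format. The agent gets terse numbered steps; the human gets a checkbox list they could follow or execute by hand.

```python
# Hypothetical sketch of dual instructions: one task, two renderings.
# Field names are assumptions, not the Architect Agent's actual format.
from dataclasses import dataclass


@dataclass
class Task:
    objective: str
    steps: list[str]
    success_criteria: list[str]

    def for_code_agent(self) -> str:
        """Terse, imperative Markdown for the code agent to execute."""
        lines = [f"# Objective\n{self.objective}", "# Steps"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, 1)]
        lines.append("# Success criteria")
        lines += [f"- {c}" for c in self.success_criteria]
        return "\n".join(lines)

    def for_human(self) -> str:
        """Checklist a person can follow along with, or execute manually."""
        lines = [f"## {self.objective}"]
        lines += [f"- [ ] {step}" for step in self.steps]
        lines += [f"- [ ] Verify: {c}" for c in self.success_criteria]
        return "\n".join(lines)


task = Task(
    "Skip private repos",
    ["Filter repos on isPrivate", "Add a unit test"],
    ["Private repos never reach the database"],
)
print(task.for_human())
```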

Here's what the human instructions look like in Notion, structured so a human can execute them manually when needed (I use a Notion agent skill I wrote to upload Markdown to Notion):

Notion workspace showing human instructions organized with architecture documents, specifications, and manual execution guides

The Guardrails: Non-Negotiable Professional Standards

How does this system actually enforce such a high standard? It's not just about that loop. The whole thing is built on a set of guardrails. These are non-negotiable, professional rules baked right into the process.

Mandatory Git Workflow

For all the developers reading this, you're gonna love this. The AI is forced to use a proper professional Git workflow:

  • No committing directly to the main branch. Ever.
  • Every single change needs its own branch, its own issue, and a formal pull request
  • This isn't a suggestion. It's mandatory.
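As a sketch of how the "no commits to main" rule can be enforced mechanically on a workstation, here is a hypothetical Git pre-commit hook in Python. This is an illustration, not how the Architect Agent itself enforces the rule; the protected branch names are assumptions. Git aborts a commit when the pre-commit hook exits non-zero.

```python
# Hypothetical local guard for the "no commits to main" rule; a sketch, not the
# skill's actual mechanism. Protected branch names are assumptions.
import subprocess
import sys

PROTECTED_BRANCHES = {"main", "master"}


def current_branch() -> str:
    """Return the checked-out branch name ("HEAD" when detached)."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_commit_allowed(branch: str) -> bool:
    """Commits are allowed only on feature branches, never protected ones."""
    return branch not in PROTECTED_BRANCHES


def main() -> int:
    """Exit code for a pre-commit hook: non-zero blocks the commit."""
    branch = current_branch()
    if not is_commit_allowed(branch):
        print(
            f"Refusing to commit directly to '{branch}'. "
            "Create a feature branch and open a pull request instead.",
            file=sys.stderr,
        )
        return 1
    return 0
```

To use it, save the file as `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` shebang, end it with `sys.exit(main())`, and make it executable. A local hook can be skipped with `git commit --no-verify`, so server-side branch protection remains the real backstop; the hook just catches the mistake early.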

It guarantees you have a full audit trail and a human reviewer for every single change. The days of mysterious commits appearing in production with no explanation are over.

Does the code agent forget? Yes, it does happen. This is where the grading feedback loop comes in.

The 100-Point Rubric

Remember that grade report concept? It's not made up on the fly. It's all based on a comprehensive 100-point rubric that covers everything you'd expect in a professional environment:

The 100-point rubric dimensions

The 100-point rubric evaluates code agent work across six critical dimensions:

  • Completeness (25 points): Measures whether all requirements and success criteria were fully met. This is the largest category, reflecting that delivering what was asked for is paramount.
  • Code Quality (20 points): Evaluates correctness, maintainability, clarity, and adherence to best practices. High-quality code should be easy to understand, modify, and extend.
  • Testing & Verification (20 points): Assesses whether automated tests run and pass, coverage meets or exceeds 60%, and all actions are properly verified. This ensures reliability and catches regressions early.
  • Documentation (15 points): Examines the quality of logs, change documentation, README updates, and inline comments. Good documentation accelerates future development.
  • Git Workflow (10 points): Verifies adherence to proper branching, PR creation, meaningful commits, and clean history. Professional Git practices ensure traceability.
  • Architecture (10 points): Evaluates code organization, modularity, separation of concerns, and alignment with system architecture. Good architecture supports long-term maintainability.
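The rubric is easy to express as data. This Python sketch uses the six category weights listed above; the `rubric_score` function and the idea of passing a 0.0 to 1.0 "fraction earned" per category are hypothetical, standing in for whatever judgment the grader actually makes.

```python
# Hypothetical sketch: the 100-point rubric as weights plus a simple scorer.
# Weights come from the rubric; per-category fractions are reviewer judgments.
RUBRIC_WEIGHTS = {
    "completeness": 25,
    "code_quality": 20,
    "testing_verification": 20,
    "documentation": 15,
    "git_workflow": 10,
    "architecture": 10,
}
assert sum(RUBRIC_WEIGHTS.values()) == 100  # six dimensions total 100 points


def rubric_score(fractions: dict[str, float]) -> float:
    """Weight each category's 0.0-1.0 fraction and sum to a 0-100 score."""
    missing = RUBRIC_WEIGHTS.keys() - fractions.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    total = 0.0
    for category, weight in RUBRIC_WEIGHTS.items():
        fraction = fractions[category]
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{category} fraction out of range: {fraction}")
        total += weight * fraction
    return round(total, 1)


# Full marks except weaker documentation and Git hygiene.
score = rubric_score({
    "completeness": 1.0, "code_quality": 1.0, "testing_verification": 1.0,
    "documentation": 0.6, "git_workflow": 0.8, "architecture": 1.0,
})
print(score)  # 92.0
```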

Automatic Grade Caps: Forcing Quality from the Start

This is a brilliant feature to hammer home best practices. The system has automatic grade caps:

Automatic grade caps table

  • Unit tests not run: Maximum grade capped at D (65%)
  • Unit tests fail: Maximum grade capped at F (50%) -- UNACCEPTABLE
  • Test coverage below 60%: Maximum grade capped at C- (70%)
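The caps sit on top of the rubric score: compute the score, then clamp it. Here is a minimal Python sketch using the cap values listed above. The function name and parameters are hypothetical, and applying the most restrictive matching cap is my assumption about how overlapping caps combine.

```python
# Hypothetical sketch of the automatic grade caps; cap values from the list above.
# Assumption: when several caps apply, the most restrictive one wins.
def apply_grade_caps(score: float, tests_run: bool, tests_pass: bool,
                     coverage_pct: float) -> float:
    cap = 100.0
    if not tests_run:
        cap = min(cap, 65.0)   # unit tests not run: capped at D (65%)
    elif not tests_pass:
        cap = min(cap, 50.0)   # unit tests fail: capped at F (50%)
    if coverage_pct < 60.0:
        cap = min(cap, 70.0)   # coverage below 60%: capped at C- (70%)
    return min(score, cap)


# Brilliant code with no tests run still tops out at a D.
print(apply_grade_caps(98.0, tests_run=False, tests_pass=False, coverage_pct=0.0))
```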

Get this: If the code agent turns in code with zero unit tests, it literally doesn't matter how amazing the rest of it is. The absolute maximum score it can get is 65%. A D. That's a fail.

It basically forces the AI to build quality in from the start.

Here's what a real grade summary looks like from a recent project:

Grade Summary showing Phase 1 Ingestion CLI with A- (92/100) score, detailed category breakdown including File Structure, CLI Commands, and Pydantic Models

Our Evolution in Agent Development: From Prompts to Reusable Agent Skills

The Architect Agent has evolved significantly over time, mirroring the broader evolution in agent development practices.

It started as a project in Claude Desktop, where I copied Markdown files into Claude Desktop Projects (we had a tool to make this easier). These became instruction files: Markdown documents I wrote in Notion, exported, and used as initial prompts to build out the Architect Agent. Then we added a file system MCP server so Claude Desktop could access the code agent repos directly.

Eventually, the Architect Agent became an agentic skill. I got tired of copying prompts around every time I needed a new instance. This is a common pattern in agent skill development: start manual, then automate.

Evolution from prompts to reusable skills

Passive Logging: Essential for Agent Skill Development (And Your Wallet)

One of the biggest improvements in my agent development journey came from rethinking how logging and auditing works.

Before, logging was active: the LLM generated logs manually. This meant the coding agent had to:

  • Take a few steps
  • Stop and write about it
  • Take more steps
  • Stop and write about it

Manually logging every single thing the AI does would burn through a ton of expensive AI tokens. So instead, the system now uses automated hooks that log whenever a tool is used. Auditing AI agents with hooks is more deterministic, and it saves tokens.

With hooks and plugins, logging became passive. No more slowdowns. No more constant reminders.

The result? A 60 to 70% reduction in token costs just for logging. That's real money and time saved.
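Here is roughly what a passive logging hook can look like, as a hypothetical Python sketch rather than the Architect Agent's actual hook. It assumes the agent runtime pipes a JSON payload with `session_id`, `tool_name`, and `tool_input` keys to the hook's stdin after each tool call, which is how Claude Code's `PostToolUse` command hooks behave; the log path and record fields are illustrative choices. The point is that a plain script writes the audit trail, so the model spends no tokens on it.

```python
# Hypothetical passive-logging hook: one JSON line per tool call, zero LLM tokens.
# Assumes the runtime pipes a JSON payload with `session_id`, `tool_name`, and
# `tool_input` keys to stdin after each tool use. Log path and fields are
# illustrative choices, not the Architect Agent's actual log format.
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def build_record(payload: dict) -> dict:
    """Keep only the fields worth auditing from the hook payload."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session": payload.get("session_id"),
        "tool": payload.get("tool_name"),
        "input": payload.get("tool_input"),
    }


def append_log(record: dict, log_path: Path) -> None:
    """Append the record as a single line of JSON (JSONL)."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def main(log_path: Path = Path("logs/tool-calls.jsonl")) -> int:
    """Hook entry point: read the payload from stdin and log it."""
    append_log(build_record(json.load(sys.stdin)), log_path)
    return 0  # a zero exit code signals the hook succeeded
```

Registered as the hook command, the script would end with `sys.exit(main())`. Appending JSONL keeps each write cheap and lets the architect agent read the log back line by line when grading.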

Passive logging vs active logging comparison

Notion dashboard for monitoring agents

Detailed logging comparison

Case Study: A Multi-Phase Agent Skill Development Project in Action

Let me walk you through a complete cycle using actual screenshots from a recent project. This shows the full Plan > Delegate > Grade > Iterate flow in practice.

How to Develop Custom Agent Skills: A Step-by-Step Breakdown

Step 1: Instructions Created and Sent

The architect agent creates detailed technical instructions and sends them to the code agent. Notice the 15-point summary: clear, actionable, measurable.

Architect agent sending instructions to skill-scanner code agent with Phase 2 summary including objectives, new submissions, verification steps, and success criteria. It even offers to send me a copy, and I usually have a dashboard in Notion where I monitor one to two architect agents managing four to five coding agents.

Step 2: Human Instructions Published

Simultaneously, human instructions are created and uploaded to Notion. Every action tracked with a status checkbox.

Human instructions uploaded to Notion showing completed actions: Phase 2 instruction file created, human summary created, instructions sent, uploaded to Notion

Step 3: Code Agent Executes (with Monitoring)

The code agent implements the work while logging progress. At any point, I can check in, and sometimes issues are caught mid-execution.

Code agent detecting two issues during execution: private repos being scanned when they shouldn't be, and repos without skills being added to database

Here the code agent detected two issues during execution: private repos being scanned when they shouldn't be, and repos without skills being added to the database. In reality, I was the one who spotted and complained about the private repo being scanned, but since my complaint ended up in the code agent's logs, the architect agent attributed it to the code agent. I am often checking and redirecting coding agents, and this setup lets me redirect at the plan/spec level too.

Step 4: Bug Detection and Fix Instructions

When issues are found, the architect agent creates targeted fix instructions with specific code changes.

Detailed bug fix instructions showing exact code for private repo filtering using isPrivate flag and empty repo check before database insertion
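The fix instructions boil down to two guards before anything touches the database. This Python sketch is a hypothetical reconstruction, not the code from the screenshot: the repo dicts mimic the shape of `gh repo list --json name,isPrivate` output, while the `skill_files` key and the `repos_to_insert` function are made up for illustration.

```python
# Hypothetical sketch of the two fixes: skip private repos, and skip repos with
# no skills, before database insertion. `skill_files` is an assumed key.
from typing import Iterable


def repos_to_insert(repos: Iterable[dict]) -> list[dict]:
    """Return only public repos that actually contain skills."""
    selected = []
    for repo in repos:
        # Fix 1: never scan or store private repos. A missing flag is treated
        # as private so the filter fails closed.
        if repo.get("isPrivate", True):
            continue
        # Fix 2: repos without any skills are not added to the database.
        if not repo.get("skill_files"):
            continue
        selected.append(repo)
    return selected


repos = [
    {"name": "private-tool", "isPrivate": True, "skill_files": ["SKILL.md"]},
    {"name": "public-empty", "isPrivate": False, "skill_files": []},
    {"name": "public-with-skills", "isPrivate": False, "skill_files": ["SKILL.md"]},
]
print([r["name"] for r in repos_to_insert(repos)])  # ['public-with-skills']
```

Failing closed on a missing `isPrivate` flag is a deliberate choice in this sketch: wrongly skipping a public repo costs a rescan, while wrongly storing a private one is a security problem.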

Step 5: Implementation and Iteration

The code agent implements the improvements. Notice the structured response with numbered improvements.

Code agent implementing force flag logic, enhanced session logging following Hybrid Logging v2.0 protocol, and refined database integration

Step 6: PR Created, Success Criteria Met

Finally, the work is complete with a PR created and all success criteria verified.

Phase 4 implementation complete showing two-tier analysis model operational with success criteria verification table and PR link

My go-to setup for code agents is OpenCode running Grok 4 Code Fast; if the work is visual (screenshots), I use OpenCode with Gemini 3 Flash. I can use Codex, Gemini, or Claude Code for code agents or architect agents, but Claude Code and OpenCode work best because they support lifecycle logging via hooks and plugins.

Multi-Agent Systems: Coordinating Agent Skills Across Repositories

This is where the "quarterback" metaphor becomes concrete.

When you have multiple repositories, services, and microservices requiring coordination, a change in one often necessitates changes in others. The architect agent maintains the big picture. This is essential for managing complex multi-agent systems.

Multi-agent coordination diagram showing Human Operator at top providing oversight to Architect Agent quarterback, who coordinates three Code Agents working on Backend, Frontend, and Infrastructure
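Part of keeping the big picture is knowing which repos a change ripples into and in what order to delegate the follow-up work. Here is a hypothetical Python sketch using the standard library's `graphlib.TopologicalSorter`; the repo names and the `DEPENDS_ON` map are illustrative only, not how the Architect Agent tracks dependencies.

```python
# Hypothetical sketch: find the repos affected by a change and order the
# delegated work so each repo follows the repos it depends on.
from graphlib import TopologicalSorter

# Each repo maps to the repos whose changes it depends on (illustrative).
DEPENDS_ON = {
    "infrastructure": set(),
    "backend": {"infrastructure"},
    "frontend": {"backend"},
}


def delegation_order(changed: str, depends_on: dict[str, set[str]]) -> list[str]:
    """Return the changed repo plus every downstream repo, dependencies first."""
    # Walk the reverse edges to collect every repo downstream of the change.
    affected = {changed}
    frontier = [changed]
    while frontier:
        repo = frontier.pop()
        for other, deps in depends_on.items():
            if repo in deps and other not in affected:
                affected.add(other)
                frontier.append(other)
    # Keep only edges inside the affected set, then sort topologically.
    subgraph = {r: depends_on.get(r, set()) & affected for r in affected}
    return list(TopologicalSorter(subgraph).static_order())


print(delegation_order("backend", DEPENDS_ON))  # ['backend', 'frontend']
```

Each name in the result would become a separate set of instructions for the code agent working in that repo, which is exactly the propagation the single-context assistant in the opening story could not track.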

The Big Payoff: From Black Box to Transparent Partner

So we've got this really robust system, right? Planning, grading, quality control. But what's the big payoff here? Why does all this extra structure matter so much?

Here's why this is a game changer:

  1. Predictable Quality: You're swapping unpredictable "hope for the best" outcomes for a predictable, quality-first process
  2. Early Mistake Detection: The human is always in the loop. You see the plan before any code is written, catching mistakes super early
  3. Full Audit Trail: Every change has a branch, an issue, a PR, and a grade report. Nothing happens in the shadows. No more having this conversation with the coding agent "How did you fix that?" followed by "Fix what?" "The thing with the IAM policies." "I don't know what you're talking about."
  4. Cost Efficiency: Passive logging cuts logging-related token costs by 60-70%
  5. Cross-Platform Flexibility: Works with Claude Code, OpenCode, Gemini, Codex, and 10+ other agents (but mostly Claude Code and OpenCode)

What it all comes down to is this: You take the AI from a black box to a transparent, governed partner.

Best Practices for Agent Development: Speed vs Control

Let's be honest about the trade-offs in agent skill development.

There are different levels of control, and some depend on how much time I have. Sometimes, when coding with coding agents, I also apply agent skills from spec-driven development. The more spec and planning I put in, the slower it moves forward, but the more precise the results afterward.

The paradox: The more controlled something is, the less human interaction it needs. I'm a limited resource, so that's a good thing. But the more controls you implement, the slower the process.

Sometimes you have to go slow to go fast. If that makes sense, you know what I'm talking about. -- Rick Hightower

Installation: Your First Step in Agent Skill Development

Ready to try it? Here's how to get started with your own agent development workflow.

Using skilz CLI (Recommended)

The easiest way to install agent skills across any AI coding assistant is the skilz agent skill installer:

pip install skilz

# Claude Code (user-level, available in all projects)
skilz install https://github.com/SpillwaveSolutions/architect-agent

# Claude Code (project-level)
skilz install https://github.com/SpillwaveSolutions/architect-agent --project

# OpenCode
skilz install -g https://github.com/SpillwaveSolutions/architect-agent --agent opencode

# Gemini, Codex, and 14+ other agents supported
skilz install https://github.com/SpillwaveSolutions/architect-agent --agent gemini

Quick Start Commands for Agent Development

Once installed, trigger the skill with natural language:

Quick start commands reference

  • "write instructions for code agent": Create detailed technical instructions
  • "initialize architect workspace": Set up workspace structure
  • "grade the code agent's work": Evaluate completed work against rubric
  • "send instructions to code agent": Copy instructions for execution

Automated Workspace Setup

For the fastest start, use the templates:

cd ~/.claude/skills/architect-agent/templates/

# Create code agent workspace
./setup-workspace.sh code-agent ~/projects/my-code-agent

# Create architect workspace
./setup-workspace.sh architect ~/projects/my-architect \
    --code-agent-path ~/projects/my-code-agent

Complete setup takes less than 5 minutes. You can also just ask the skill to set up the code agent or architect agent folder, and it will: it has all of the scripts to set up code agent folders and architect agent folders, and it even installs the right hook or plugin for logging and observability.

The Future of Agent Skill Development: From Skills to Plugins

The next big thing, and I don't know when this will happen because I have a day job and a bunch of other projects going on, would be to make this into an Agent plugin.

Most of this is background activity, not my main focus. I have a set of commands, hooks, and OpenCode plugins. I'd like to take this skill and turn it into a full-blown Claude Code plugin.

But once I do that, it won't work as well with OpenCode, which it does today. That's the catch with evolving a tool when you use multiple coding assistants: I always use Codex and Gemini too. It would be nice to support them all, and right now the best way to do that is via agent skills, since, unlike plugins, agent skills are a standard.

A Question for the Future

Maybe the future of AI and development isn't just about making the AI smarter and letting it run wild. Maybe it's about us getting smarter about how we manage it. Building these kinds of sophisticated frameworks around it.

Is this how we build a future? By making AI a true partner, governed by our standards, instead of just a powerful but ultimately unreliable tool?

Key Takeaways for Agent Development

  1. Born from frustration: Real problems drive real solutions. The fog of war with production debugging demanded better oversight.
  2. Human-in-the-loop is essential: Especially for production systems. You need certainty about what changes are being made.
  3. Objective grading enables iteration: Not just "done," but measurably good. Target 95%+ quality.
  4. Guardrails matter: Mandatory Git workflow, automatic grade caps, and the 100-point rubric force professional standards.
  5. Passive logging beats manual logging: Hooks and plugins capture everything without slowing you down, cutting logging token costs by 60-70%.
  6. Cross-platform agent skills future-proof your workflows: Works with Claude Code, OpenCode, Gemini, Codex, and 10+ other agents.
  7. Go slow to go fast: The more spec and planning, the slower the start but the more precise the results.

Resources for Agent Skill Development

The coding and architecture agents evolved alongside the underlying systems as I found time to improve them. I do other things, so this is only as evolved as I need it to be for daily use. It's a combination I use a lot because it allows me to get involved.

About the Author

Rick Hightower is a technology executive and data engineer with extensive experience at a Fortune 100 financial services organization, where he led the development of advanced Machine Learning and AI solutions to optimize customer experience metrics. His expertise spans both theoretical AI frameworks and practical enterprise implementation.

Rick wrote the skilz universal agent skill installer, which works with Gemini, Claude Code, Codex, OpenCode, GitHub Copilot CLI, Cursor, Aider, Qwen Code, Kimi Code, and about 14 other coding agents. He is also the co-founder of the world's largest agentic skill marketplace.

Connect with Rick Hightower on LinkedIn or Medium for insights on enterprise AI implementation and strategy.

Community Extensions & Resources

The Claude Code community has developed powerful extensions that enhance its capabilities. Here are some valuable resources from Spillwave Solutions (Spillwave Solutions Home Page):

Integration Skills

  • Notion Uploader/Downloader Agent Skill: Seamlessly upload and download Markdown content and images to Notion for documentation workflows
  • Confluence Agent Skill: Upload and download Markdown content and images to Confluence for enterprise documentation
  • JIRA Integration Agent Skill: Create and read JIRA tickets, including handling special required fields

Recently, I wrote a desktop app called Agent Skill Viewer to evaluate agent skills for safety, usefulness, links, and PDA.

Agent Skill Viewer screenshot 1

Agent Skill Viewer screenshot 2

Agent Skill Viewer screenshot 3

Advanced Development Agents

  • Architect Agent Skill: Puts Claude Code into Architect Mode to manage multiple projects and delegate to other Claude Code instances running as specialized code agents
  • Project Memory Agent Skill: Store key decisions, recurring bugs, tickets, and critical facts to maintain vital context throughout software development

Visualization & Design Tools

  • Design Doc Mermaid Agent Skill: Specialized skill for creating professional Mermaid diagrams for architecture documentation
  • PlantUML Agent Skill: Generate PlantUML diagrams from source code, extract diagrams from Markdown, and create image-linked documentation
  • Image Generation Agent Skill: Uses Gemini Banana to generate images for documentation and design work
  • SDD Agent Skill: A comprehensive Claude Code skill for guiding users through GitHub's Spec-Kit and the Spec-Driven Development methodology.
  • PR Reviewer Agent Skill: Comprehensive GitHub PR code review skill for Claude Code. Automates data collection via gh CLI, analyzes against industry-standard criteria (security, testing, maintainability), generates structured review files, and posts feedback with approval workflow. Includes inline comments, ticket tracking, and professional review templates.

AI Model Integration

  • Gemini Agent Skill: Delegate specific tasks to Google's Gemini AI for multi-model collaboration
  • Image_gen Agent Skill: Image generation skill that uses Gemini Banana to generate images.

Explore more at Spillwave Solutions -- specialists in bespoke software development and AI-powered automation

#AgentSkills #AgentDevelopment #ClaudeCode #AIAssistant #MultiAgentWorkflows #SoftwareArchitecture #DeveloperProductivity #HumanInTheLoop