Harness Engineering, Part 2: Three-Dimensional Coupling, the Refactoring Pipeline and the Blueprint of the AI-Native Dev

Part 1 of this playbook established the paradigm: Agent = Model + Harness, the 8 pillars of a production Harness, the R.P.I. method, the 3-layer context architecture and the L0 to L4 maturity scale. Concept covered. But concept without a tactical tool becomes a keynote slide. This Part 2 delivers what was left out: the analytical and operational tools that let you apply the paradigm without improvising.

Five pieces separate a team that understands the talk from a team that executes: (1) the AI-guided refactoring pipeline in 4 stages; (2) Khononov's three-dimensional coupling analysis (Strength, Distance, Volatility), the mathematical framework for deciding modular boundaries; (3) the Velora case as a counterpoint to PayPal, an AI-First workflow applied to lean teams instead of enterprise operation; (4) the Blueprint of the AI-Native Developer as a four-layer vertical stack; and (5) the stitching with the 7-phase Corporate Adoption Framework that we already documented in a dedicated post. Everything that was missing to close the loop.

AI-Guided Refactoring Pipeline: the 4 stages

Refactoring a legacy codebase with AI is the case where Vibe Coding fails most spectacularly. You ask "refactor this module", the agent rewrites half the system with an invented architecture, deletes tests, breaks invisible dependencies and hands you a PR that no one will be able to review. The AI-Native playbook treats refactoring as a pipeline of four strictly sequential stages, each one with an auditable output.

Stage 1, Anomaly Detection

Mapping code smells and structural deviations via AI. Before touching a single line, you use the agent to diagnose: where the real duplications are (not just the ones the linter shows), which classes accumulate too much responsibility, which modules have cyclic coupling, where the hierarchy strays from the convention of the rest of the repository. Output: a Markdown report listing anomalies with a reference to file and line. That report is the input for the next stage, not a direct input for execution.
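A minimal sketch of what the Stage 1 output could look like, assuming the agent returns its findings as structured records. The Anomaly fields, the kind labels and the file paths are hypothetical and only illustrate the "file and line" requirement:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    """One Stage 1 finding. All field names are hypothetical."""
    kind: str   # e.g. "duplication", "god-class", "cyclic-coupling"
    path: str   # file where the anomaly was found
    line: int   # 1-based line number
    note: str   # one-sentence description


def render_report(anomalies: list[Anomaly]) -> str:
    """Render findings as a Markdown report, grouped by kind."""
    lines = ["# Anomaly report", ""]
    for kind in sorted({a.kind for a in anomalies}):
        lines.append(f"## {kind}")
        group = sorted(
            (a for a in anomalies if a.kind == kind),
            key=lambda a: (a.path, a.line),
        )
        for a in group:
            lines.append(f"- `{a.path}:{a.line}` {a.note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical findings, for illustration only.
findings = [
    Anomaly("duplication", "billing/invoice.py", 42, "Tax rounding duplicated from orders/totals.py"),
    Anomaly("cyclic-coupling", "orders/service.py", 7, "Imports billing, which imports orders"),
]
report = render_report(findings)
print(report)
```

The report is plain Markdown on purpose: a human can review it before anything is handed to Stage 2.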

Stage 2, Standardization

Resolving duplications with the dual goal of improving the code and reducing future token consumption. Standardized code is cheaper for the agent to process. That is not just aesthetics; it is context-window savings on every subsequent task. You apply the fixes in order of impact: the three or four duplications that appear in the most places go first, because each of them generates compound savings in the agent's next runs. Output: a small, surgical PR per standardization, with an associated Test Gate.
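The "order of impact" rule can be sketched as a simple ranking by number of occurrences. The snippet names below are hypothetical, and occurrence count is only one possible proxy for impact:

```python
from collections import Counter

# Hypothetical Stage 1 output: one entry per place where a duplicated snippet occurs.
duplicate_occurrences = [
    "tax-rounding", "date-parsing", "tax-rounding", "retry-loop",
    "tax-rounding", "date-parsing", "retry-loop", "retry-loop", "retry-loop",
]


def standardization_order(occurrences: list[str], top_n: int = 4) -> list[tuple[str, int]]:
    """Return the most widespread duplications first (one PR per entry)."""
    return Counter(occurrences).most_common(top_n)


queue = standardization_order(duplicate_occurrences)
for name, count in queue:
    print(f"{name}: {count} occurrences")
```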

Stage 3, Consolidation (DDD)

Flattening the hierarchy and isolating logic into pure Domain Services. Here you move out of the "cleanup" level and into the "reshaping of boundaries" level. It is the riskiest stage and the one that benefits most from Spec-Driven Development: you write the Design Doc for the new boundary, generate a migration Task List, and only then does the agent execute task by task. Never consolidate without an approved Spec: this is the stage where "Scope Destruction" happens most often.

Stage 4, Dynamic Update

Manual and surgical recording of the new architectural patterns in Agents.md. This is what closes the loop: the learning from the refactoring becomes a global rule of the repository. The next time someone asks for a similar feature, the agent already starts from the correct architectural boundary. Without Stage 4, you refactored the code once, but the knowledge stayed in the head of the person who requested the refactoring, and the next agent will fall into the same hole.

Critical safety warning: never ask the AI to generate your Agents.md file from scratch based on the refactored code. It will include irrelevant information that will consume your context window forever. The rule is non-negotiable: manual curation in Stage 4. The agent suggests, you decide what goes in.

Three-Dimensional Coupling Analysis: the Khononov framework

This is the mathematical framework missing from most conversations about a "well-designed module". Vlad Khononov proposes that coupling between two modules is not a single dimension but a vector of three: Strength, Distance and Volatility. Each one answers a different question, and the interaction between the three defines whether the coupling is structurally safe or a deadly bottleneck waiting for the next deploy.

Strength, what is shared?

Sharing a domain entity generates high risk. If Module A and Module B share the same Pedido class with business logic inside, any change in Pedido ricochets through both. Sharing an API contract or a queue is structurally safe: the boundary is explicit, the changes are versioned, the contracts are auditable. High strength means the internal semantics are coupled; low strength means only the interface is coupled.

Distance, where do they live?

If the coupling is strong and the modules live in distant repositories (separate microservices, different teams, independent release cycles), the system will suffer constant breakages. The cost of keeping two repositories in sync with strongly coupled logic compounds with every change. High distance requires low strength; high strength requires low distance. Violating that rule is the classic source of "every time I deploy service A, service B goes down".

Volatility, how often does it change?

Core business components that are highly coupled AND change frequently create deadly bottlenecks. It is the combination that destroys speed: the team becomes hostage to coordinating changes between modules that evolve together but live apart, and every release becomes a negotiation. If the module changes every week, the coupling to it needs to be weak; if the coupling is strong, it needs to change little.

Safe vs. dangerous topology

Applying the three dimensions to the classic DDD decomposition (Core Domain, Supporting, Generic):

Core Domain ↔ Generic: Safe API Boundaries. Communication via a well-defined contract, low strength, high distance tolerated.
Supporting → Generic: Safe API Boundaries. Same logic.
Core Domain ↔ Supporting: ⚠ Dangerous High Coupling. This is where the system breaks. High strength (they share entities), variable distance, typical volatility (the business changes). Result: a permanent bottleneck.

Tactical action: use Modular Decomposition Agent Skills to calculate mathematically safe boundaries based on the three dimensions. The agent can analyze the codebase, classify each module along the three dimensions and propose boundaries, as long as you provide the criterion as a structured Skill, not as an open prompt.
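To make the criterion concrete, here is a minimal sketch that encodes only the two rules stated in this section: strong coupling must not span a large distance, and strong coupling must not target a volatile module. It assumes a binary low/high reading per dimension, which is a simplification. The names Level, Coupling and violations are hypothetical; this is not Khononov's own formulation and not the interface of any real Skill.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Coupling:
    """Coupling between two modules, reduced to low/high per dimension."""
    pair: str
    strength: Level    # interface only (LOW) or shared internal semantics (HIGH)
    distance: Level    # same module or repo (LOW) or separate services/teams (HIGH)
    volatility: Level  # how often the shared part changes


def violations(c: Coupling) -> list[str]:
    """Return the rules from the text that this pair breaks (empty list = safe)."""
    found = []
    if c.strength is Level.HIGH and c.distance is Level.HIGH:
        found.append("strong coupling across a large distance")
    if c.strength is Level.HIGH and c.volatility is Level.HIGH:
        found.append("strong coupling to a frequently changing module")
    return found


# Hypothetical classification of two module pairs.
pairs = [
    Coupling("core<->generic", Level.LOW, Level.HIGH, Level.LOW),
    Coupling("core<->supporting", Level.HIGH, Level.HIGH, Level.HIGH),
]
for c in pairs:
    print(c.pair, "->", violations(c) or "safe")
```

A pair that is high on all three dimensions breaks both rules at once, which is the worst case this section describes.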

For the complete treatment of this framework with examples and analysis tools, the dedicated Steply post is worth reading: Khononov's Three-Dimensional Coupling: Strength, Distance, Volatility. This summary serves as a bridge: Khononov is the mathematical criterion that supports the "Modular Monolith vs. Microservices" decision we defended in Part 1.

Velora case: an AI-First workflow in lean teams

In Part 1 we used PayPal as the enterprise case: 24,000 employees, trillions of transactions, $2k/month in tokens per power user, dual-review on 90% of the code. That is the scenario for those operating at extreme scale. But most teams that read this playbook do not operate like that. They operate like Velora: teams of 1 to 2 devs covering the full project cycle, with an ambition for continuous delivery but a real constraint on people. The Velora case proves that the AI-Native paradigm scales down as well as it scales up.

The agile bottleneck the AI-Native workflow dissolves

The traditional dashboard: Backlog → Refinement → Sprint → QA → Review → Staging → Prod. Seven columns, each with a queue, each queue with latency. Time accumulates in the transitions between stages, not in the work itself. A story spends three days in Refinement waiting, two days in QA waiting for review, one day in Staging waiting for approval. The sum of the waiting beats the sum of the execution in almost every small team.
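A quick arithmetic check of that claim, using the waiting times from the example. The execution time is an assumption made up for illustration; plug in your own numbers:

```python
# Waiting times from the example above, in days.
waiting_days = {"Refinement": 3, "QA": 2, "Staging": 1}

# Assumption for illustration only: the story needs 2 days of hands-on work.
execution_days = 2

total_waiting = sum(waiting_days.values())
lead_time = total_waiting + execution_days
flow_efficiency = execution_days / lead_time  # share of lead time spent working

print(f"waiting: {total_waiting} days, lead time: {lead_time} days")
print(f"flow efficiency: {flow_efficiency:.0%}")
```

Under that assumption the story waits three times longer than it is worked on.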

Velora's AI-Native flow

Four stations, no queues:

Prototype (Cursor). The dev validates the hypothesis quickly in the editor with AI. The rule is no longer "write the spec before validating technically": you validate technical viability in hours, and only then write the spec of what goes to production.
Generated PRD. The AI turns the prototype + product context into a structured PRD. Documentation is not overhead; it is a by-product.
Linear Slices. Tickets become Slices, robust packages optimized for consumption by agents, with enough context for autonomous execution. It is not a "user story" for a human; it is a deterministic specification for a machine.
Production. Internal bots automatically merge low-risk PRs if the Sensors (linter, tests, Test Gates) pass. Human approval is reserved for high-impact changes.
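One way to picture a Slice is as a structured record that can be checked for completeness before an agent picks it up. This is a sketch under assumptions: the field names and the readiness rule are hypothetical, not Velora's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    """A ticket written for an agent. Field names are hypothetical."""
    title: str
    goal: str
    files_in_scope: list[str]
    acceptance_checks: list[str]  # tests or commands that must pass
    out_of_scope: list[str] = field(default_factory=list)

    def missing_context(self) -> list[str]:
        """List the required fields that are still empty."""
        required = {
            "goal": self.goal,
            "files_in_scope": self.files_in_scope,
            "acceptance_checks": self.acceptance_checks,
        }
        return [name for name, value in required.items() if not value]

    def ready_for_agent(self) -> bool:
        """A Slice is ready only when no required field is empty."""
        return not self.missing_context()


# Hypothetical ticket that still lacks acceptance checks.
ticket = Slice(
    title="Add CSV export to invoices",
    goal="Users can download the invoice list as CSV",
    files_in_scope=["billing/export.py"],
    acceptance_checks=[],
)
print(ticket.ready_for_agent())
print(ticket.missing_context())
```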

Three operational consequences:

Lean teams. Just 1 to 2 devs handling the full project cycle. The team does not grow to absorb more demand; the Harness absorbs it.
The end of Stories. Tickets become robust "Slices" optimized for consumption by agents. Writing the ticket becomes part of engineering, not of management.
Autonomous approval. Internal bots automatically merge low-risk PRs if the Sensors pass. The human dev is reserved for what really requires judgment.
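The autonomous approval rule can be sketched as a small decision function. How a PR gets its risk label is left out, and the sensor names are hypothetical; the only logic taken from the text is "low risk plus all Sensors green means auto-merge, high impact goes to a human".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    number: int
    risk: str                 # "low" or "high"; how risk is assigned is out of scope here
    sensors: dict[str, bool]  # sensor name -> passed?


def merge_decision(pr: PullRequest) -> str:
    """Return 'blocked', 'auto-merge' or 'human-review' for a pull request."""
    # No sensor results means no evidence, so treat it like a failure.
    if not pr.sensors or not all(pr.sensors.values()):
        return "blocked"
    if pr.risk == "low":
        return "auto-merge"
    return "human-review"


green = {"linter": True, "tests": True, "test_gate": True}
print(merge_decision(PullRequest(101, "low", green)))
print(merge_decision(PullRequest(102, "high", green)))
print(merge_decision(PullRequest(103, "low", {**green, "tests": False})))
```

Treating missing sensor results as a failure is a design choice of this sketch: the bot should only merge on positive evidence.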

The difference between PayPal and Velora is not the paradigm; it is the scale of the governance around it. The 8 pillars of the Harness hold for both. What changes is the cost of implementation, and it scales favorably for small teams: fewer people to align, fewer policies to version, less compliance to audit. A lean team with a disciplined Harness beats a large team with institutionalized Vibe Coding.

The Blueprint of the AI-Native Developer: a four-layer stack

If you were to design the vertical stack of the AI-Native dev, from the most conceptual to the most operational, it would have exactly four layers. Each one only works if the one below is stabilized. Skipping a layer is the most common way for adoption to fail silently.

┌─────────────────────────────────────────────────────────────┐
│ 4. Governance: Code Reviews + Audited Marketplaces          │
├─────────────────────────────────────────────────────────────┤
│ 3. Parallel Flow: Spec-Driven Development + Git Worktrees   │
├─────────────────────────────────────────────────────────────┤
│ 2. Strong Convention: direct native patterns (flat)         │
├─────────────────────────────────────────────────────────────┤
│ 1. Modular Monoliths: unified context + high visibility     │
└─────────────────────────────────────────────────────────────┘
                               ↓
                    The Operational Harness
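The bottom-up rule ("each layer only works if the one below is stabilized") can be sketched as a check that returns the lowest layer still needing work. The stable/unstable flags are a hypothetical self-assessment, not a measured metric:

```python
from typing import Optional

# Bottom-up order of the four Blueprint layers.
LAYERS = ["Modular Monoliths", "Strong Convention", "Parallel Flow", "Governance"]


def next_layer_to_stabilize(stable: dict[str, bool]) -> Optional[str]:
    """Return the lowest layer that is not yet stable, or None if all four are."""
    for layer in LAYERS:
        if not stable.get(layer, False):
            return layer
    return None


# Hypothetical self-assessment of a team.
team = {
    "Modular Monoliths": True,
    "Strong Convention": False,
    "Parallel Flow": True,   # worktrees already in use, but convention is loose
    "Governance": False,
}
print(next_layer_to_stabilize(team))
```

In this example the answer is Strong Convention, even though Parallel Flow is marked stable: work on a higher layer does not count until the layers below it hold.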

Layer 1, Modular Monoliths

Unified context and high visibility. Without it, the agent is blind to the problem domain. No matter how sophisticated the layers above are, if Layer 1 is fragmented into microservices simply because they are fashionable, the agent does not see enough to make good decisions.

Layer 2, Strong Convention

Direct native patterns, flat structures wherever possible. If your stack is Rails or Go, you inherit convention from the community. If it is JS or Python, you need to manufacture convention inside the repository, and that convention needs to be rigid, documented and audited via linter, not via goodwill.

Layer 3, Parallel Flow

Spec-Driven Development + Git Worktrees. This is where Part 1 of this playbook lives: the R.P.I. method, the Spec creation cycle, parallelism via worktrees. Layer 3 is where the daily work happens. Without the two layers below it, it does not scale: you end up redoing the same architectural decision on every Spec because the convention is loose.

Layer 4, Governance

Automated Code Reviews via sub-agents, internally audited marketplaces, curated Skills, verified MCPs. Layer 4 only makes sense in mature organizations; it is where "AI-Native" becomes "AI-Operations". Small teams can go three years with just the first three layers and be perfectly fine.

The essential reading: less human abstraction generates more context for the machine. The complexity does not disappear; it moves from the architecture of the code to the tactical curation of the Harness. Building a Harness is architecting twice: once the system, once the environment that builds the system.

Stitching with the 7-phase Adoption Framework

Everything above (Harness, R.P.I., 8 pillars, refactoring pipeline, Khononov, Blueprint) is technology. Technology without a corporate adoption method amplifies chaos instead of generating productivity. Steply has already documented this adoption method in a dedicated post: the 7-Phase Framework in 3 Movements (Foundation: Diagnosis, AI Enablers, Pilot; Structure: Bottlenecks, Progressive Adoption; Sustainment: Governance, Scale).

The bridge is direct. If the 7-phase post answers "how to adopt AI in the company without becoming amplified chaos", this two-part playbook answers "what exactly the team needs to build and technically master in each phase". The two readings complement each other:

Foundation (Phases 1-3) requires Diagnosis + AI Enablers + Pilot. The technical playbook that supports this phase: the 4 mistakes of Vibe Coding (for maturity diagnosis), the 3 context layers (for AI Enablers), the R.P.I. method (for the pilot).
Structure (Phases 4-5) requires Bottleneck Mapping + Progressive Adoption. The technical playbook: the 8 pillars of the Harness (the structure of what to adopt), Git Worktrees (adoption parallelism), agentic Code Review (quality during adoption).
Sustainment (Phases 6-7) requires Governance + Scale. The technical playbook: a Marketplace of audited Skills, MCPs as a gateway, Convention Drift monitoring, the 4-layer Blueprint operationalized.

Without the 7-phase Framework, this technical playbook is a weapon without a manual. Without this technical playbook, the 7-phase Framework is a process without muscle. The two together are the complete AI-Native adoption offering that Steply consolidates.

Final reframe: three checks before closing the loop

Question 1: Does your repository pass the Three-Dimensional Coupling test? Take the three pairs of modules that generate the most bugs in production. Classify each pair along the Strength, Distance, Volatility axes. If any pair is "High Strength + High Distance + High Volatility", you do not have a productivity problem, you have a topology problem. No Harness makes up for that.

Question 2: Do you refactor following the 4-stage pipeline, or do you still throw "refactor this module" at the agent and pray? If the answer is the second, you are at Maturity Level 1 regardless of whatever else you implemented. A disciplined pipeline is what separates assisted refactoring from assisted demolition.

Question 3: Are your four Blueprint layers stabilized in order? Layer 4 (Governance) without Layer 1 (Modular Monolith) is process theater. Layer 3 (Parallel Flow) without Layer 2 (Strong Convention) is parallelized chaos. Stabilize from the bottom up.

If the three answers are honest, you will have a concrete map of what is missing. It is not "one more model". It is not "a better prompt". It is infrastructure. You no longer write code; you generate code. And what generates code is the Harness, not the LLM.