From Writing Code to Validating It: The Engineering Manager's New Job

Engineering · Article · Apr 13, 2026

Wilco van Duinkerken, CTO

7 minutes

Engineering management used to be about helping people write good code. With AI in the loop, the work has shifted. Code is no longer the bottleneck. Validating that the code actually does what it should, safely and durably, is.

Rigor Migrates Upstream

Quality control hasn't disappeared. It has moved. The careful thinking that engineers used to do while typing now needs to happen before the prompt and after the output. Specifications, constraints, test definitions, risk framing. The work is the same in volume. It just lives in different parts of the cycle.

Teams that don't notice the migration end up with engineers who type prompts as carelessly as they used to type code. The AI obliges. The result looks like productivity but functions like a debt pile.

Code Proven to Work

Simon Willison put it bluntly: your job is to deliver code you have proven to work. An untested AI-generated pull request is not a contribution. It's negligence with a green checkmark.

The role of an engineer in an AI-assisted team is shifting from writer to verifier. The hard skill is not generating the next 200 lines. It's having the discipline to ask: have I shown this works under the conditions it will actually run in?

Testing Becomes a Survival Mechanism

When change rates accelerate, untested systems collapse faster. Comprehensive test coverage stops being a nice-to-have signal of professionalism and becomes the only thing standing between a fast team and a fast meltdown.

Practical implication: invest in test infrastructure before you invest in more AI seats. Unit, integration, smoke, and end-to-end coverage. Local fast feedback for engineers. CI gates that actually fail builds. Test data strategies that don't require manual setup. The teams that scale AI well are the teams that already had this in place, or the teams that build it before opening the floodgates.
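One way to make "CI gates that actually fail builds" concrete is to run test layers in order of cost and stop at the first failure. A minimal sketch, assuming pytest with `unit`, `integration`, and `smoke` markers (substitute your own runner invocations):

```python
"""Minimal CI gate sketch: run test layers fastest-first, fail fast.

The commands and marker names (unit/integration/smoke) are assumptions;
replace them with your project's actual test-runner invocations.
"""
import subprocess

# Ordered from cheapest to most expensive so cheap failures stop the build early.
GATES = [
    ("unit", ["pytest", "-m", "unit", "-q"]),
    ("integration", ["pytest", "-m", "integration", "-q"]),
    ("smoke", ["pytest", "-m", "smoke", "-q"]),
]

def run_gates(gates=GATES, runner=subprocess.run):
    """Return the name of the first failing gate, or None if all pass."""
    for name, cmd in gates:
        result = runner(cmd)
        if result.returncode != 0:
            return name  # this gate failed; skip the more expensive ones
    return None
```

In CI, exit non-zero when `run_gates()` returns a gate name, so the merge is blocked rather than warned about. Injecting `runner` keeps the ordering logic testable without spawning real test processes.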

Security: From Optional to Existential

AI accelerates the rate at which security flaws appear in your codebase. It also accelerates the rate at which they're discovered, often by the wrong people. Both pressures point the same direction. Security work that used to live downstream now belongs in the same loop as feature development.

Concretely: dependency scanning, secret detection, and basic SAST in CI on every PR. Authentication and authorization decisions reviewed before code is generated, not after. A clear policy on what AI agents are allowed to read, write, and call. None of this is new. What's new is that you can't push it to next quarter without paying for the delay in incidents.
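Secret detection in particular is cheap to bolt onto every PR. A toy sketch of the idea, scanning only added diff lines; the patterns are illustrative and deliberately incomplete, and real projects should use a dedicated tool such as gitleaks or trufflehog:

```python
"""Toy secret scanner for PR diffs. A sketch of the technique, not a
replacement for dedicated tools; the patterns below are illustrative."""
import re

# Each pattern targets one common credential shape.
SECRET_PATTERNS = [
    ("aws_access_key", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("private_key", re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----")),
    ("generic_token", re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]")),
]

def scan_diff(diff_text):
    """Return (line_number, pattern_name) pairs for added lines that match."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        if not line.startswith("+"):  # only inspect additions, not context or removals
            continue
        for name, pattern in SECRET_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Wire this (or its grown-up equivalent) into the same CI gate that runs tests: a finding fails the build, same as a failing test.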

Use LLMs to Evaluate LLMs

If models can write code, they can also assess it. Deploy stronger models as evaluators of weaker ones. Run them over generated PRs to flag obvious quality issues: missing tests, hallucinated imports, undocumented behavior.

An existing test suite is already a primitive form of LLM judge. Expand it deliberately. Code review checklists, security policies, and documentation standards can be encoded as model prompts and run automatically. Humans then spend their attention on the things models can't catch.
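Encoding a review checklist as a judge prompt can look like the sketch below. The model call itself is deliberately left out (`call_model` would wrap whatever API you use); the checklist items and the `VERDICT:` convention are choices you define yourself, not a standard:

```python
"""Sketch of an LLM-as-judge gate for generated PRs. The model client is
assumed, not shown; only prompt construction and verdict parsing appear here."""

CHECKLIST = [
    "Every new public function has at least one test.",
    "All imports resolve to dependencies declared in the project.",
    "Error paths raise or log; no silently swallowed exceptions.",
    "No behavior is introduced that the PR description does not mention.",
]

def build_judge_prompt(diff, checklist=CHECKLIST):
    """Turn a review checklist plus a diff into a judge prompt."""
    items = "\n".join(f"{i}. {item}" for i, item in enumerate(checklist, 1))
    return (
        "You are reviewing a pull request. For each checklist item, answer "
        "PASS or FAIL with a one-line reason. End with 'VERDICT: PASS' only "
        f"if every item passes.\n\nChecklist:\n{items}\n\nDiff:\n{diff}"
    )

def parse_verdict(model_output):
    """A PR is acceptable only if the final verdict line says PASS."""
    for line in reversed(model_output.strip().splitlines()):
        if line.startswith("VERDICT:"):
            return line.split(":", 1)[1].strip() == "PASS"
    return False  # malformed output is treated as failure, never as approval
```

The fail-closed default in `parse_verdict` matters: a judge that approves whatever it can't parse is worse than no judge.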

Watch Out for AI Slop

AI slop is the term for low-quality, mass-produced code that looks polished on the surface and falls apart underneath. Bloated functions. Silent error handling. Hallucinated imports. Missing tests. Redundant patterns repeated five times because the model didn't notice it had already solved the problem.

Slop is dangerous because it passes review when reviewers are tired, the diff is large, or the surface looks plausible. It accumulates faster than humans can clean it up. After a few months, the codebase carries a layer of slop that's expensive to remove and even more expensive to debug.
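Some of these patterns are mechanically detectable. A sketch of two of them (silent error handling and bloated functions) as an AST lint; the line-count threshold is an arbitrary starting point, not a researched number:

```python
"""Two slop heuristics sketched as an AST lint: silently swallowed
exceptions and oversized functions. Tune the threshold to your codebase."""
import ast

MAX_FUNCTION_LINES = 60  # assumption: beyond this, flag for human review

def find_slop(source):
    """Return human-readable warnings for common slop patterns."""
    warnings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Silent error handling: an except whose entire body is `pass`.
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                warnings.append(f"line {node.lineno}: exception silently swallowed")
        # Bloated functions: the definition spans more lines than the threshold.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = (node.end_lineno or node.lineno) - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                warnings.append(
                    f"line {node.lineno}: function '{node.name}' is {length} lines")
    return warnings
```

Heuristics like these won't catch subtle slop, but they cheaply filter the obvious cases so human reviewers spend attention where it's actually needed.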

The Slop Jar Rule

One simple cultural intervention: every confirmed instance of AI slop merged into main earns the responsible engineer a contribution to the slop jar. Three contributions and you buy lunch for the team. The point isn't the money. It's making slop visible and socially costly. Teams that adopt the rule report cleaner PRs within weeks.

Process Implications

  • Compress sprint cadence. Two-week sprints assume change takes weeks. With AI in the loop, weekly is closer to reality.
  • Bring back pairing and ensemble work where it died. Real-time review compensates for the loss of natural handoffs.
  • Make merge protections non-negotiable. The main branch should never accept code that hasn't passed your full validation gauntlet, no matter how urgent the feature.
  • Separate experimental repositories from production ones. Slop is fine in a sandbox. It is not fine in the system that bills your customers.
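Merge protections are also auditable. A sketch of checking a branch-protection payload against a minimum policy; the dict shape follows GitHub's branch protection API response, but verify the field names against the current API docs before relying on this:

```python
"""Policy check for a branch-protection payload. The expected dict shape is
an assumption based on GitHub's branch protection API; confirm field names
against the API docs for your platform before using."""

def protection_ok(protection, min_reviews=1, required_checks=("ci",)):
    """True only if the branch requires reviews, requires status checks,
    and applies the rules to admins as well."""
    reviews = protection.get("required_pull_request_reviews") or {}
    if reviews.get("required_approving_review_count", 0) < min_reviews:
        return False
    checks = (protection.get("required_status_checks") or {}).get("contexts", [])
    if not all(c in checks for c in required_checks):
        return False
    # "No matter how urgent the feature" means admins can't bypass either.
    return bool((protection.get("enforce_admins") or {}).get("enabled"))
```

Running a check like this nightly across all production repositories turns "non-negotiable" from a stated value into an enforced one.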

The New Manager Job

The role hasn't gotten easier. It's gotten different. Less time editing code, more time editing the conditions under which code gets accepted. Less time mentoring on syntax, more time mentoring on judgment. Less time worrying about velocity, more time worrying about whether the velocity is producing anything you'd be willing to defend at three in the morning.

Engineers who learn to validate become more valuable, not less. Managers who help them learn it become indispensable. The teams that ignore the shift will look productive for a few quarters and brittle thereafter. The ones who embrace it will spend less time firefighting and more time shipping things that actually work.
