Closing the Loop: Google Just Validated Deterministic Code in the Loop
Google’s AI and Infrastructure team published a blog post today describing how they achieved a 6x speedup migrating production machine learning models from TensorFlow to JAX. Sundar Pichai highlighted the result in the Google Cloud Next keynote. The post is worth reading for the engineering alone, but the architecture behind it is what matters to this audience.
They didn’t point a single AI agent at a codebase and say “migrate this.” They tried that. It failed. Single-agent setups couldn’t balance structural rules with execution details at Google’s scale — thousands of lines of code, hundreds of layers, deep dependencies across multiple files. The agents lost context, hallucinated APIs, and produced code that wouldn’t build.
What worked was a multi-agent architecture with three distinct roles:
A Planner agent that uses deterministic, compiler-based static analysis to map the entire dependency tree and sequence the migration from leaf nodes upward. No AI judgment in the sequencing. The dependency graph is the dependency graph.
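That sequencing can be sketched as a topological sort over the dependency graph. The module names below are hypothetical stand-ins; Google’s Planner builds the real graph with compiler-based static analysis, but the leaf-first ordering is plain deterministic code, no model involved:

```python
from graphlib import TopologicalSorter

# Hypothetical module dependency map: each key depends on the listed modules.
deps = {
    "ranking_model": ["attention_layer", "embedding_layer"],
    "attention_layer": ["math_utils"],
    "embedding_layer": ["math_utils"],
    "math_utils": [],
}

# static_order() emits dependencies before dependents: leaves migrate first,
# so every module is converted only after everything it imports already is.
migration_order = list(TopologicalSorter(deps).static_order())
print(migration_order)  # math_utils first, ranking_model last
```

Run the same graph through this twice and you get the same plan twice. That repeatability is the point.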
An Orchestrator agent that chunks work to fit context windows, injects domain-specific Playbooks, and handles failure recovery. The Playbooks are hierarchical — from general repository instructions down to client-specific “golden examples” distilled from successful manual migrations. YouTube’s ranking model infrastructure gets a different Playbook than a general-purpose model.
A Coder agent that reads files, writes code, runs builds, and executes tests in a constrained test-and-fix loop. It keeps working until it produces compilable, verifiable code. It doesn’t decide when it’s done based on its own assessment. Done is defined externally — by build success and mathematical equivalence tests.
And then the validation layer. They verify correctness using algorithmic gradient ascent to find the maximum error between the original TF layer and the new JAX layer. Mathematical verification of functional equivalence. They also run a separate LLM Judge that scores migrated code against an architectural checklist. Two layers of validation, one deterministic, one AI-assisted — and the deterministic one is the gate.
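A toy version of that gradient-ascent verification, assuming scalar layers and a hypothetical 0.1% scaling drift introduced by migration (the real system operates over full layers, but the gating logic is the same):

```python
import math

# Toy stand-ins for the original TF layer and the migrated JAX layer.
def original_layer(x: float) -> float:
    return math.tanh(1.0 * x)

def migrated_layer(x: float) -> float:
    return math.tanh(1.001 * x)  # hypothetical drift introduced by migration

def error(x: float) -> float:
    return abs(original_layer(x) - migrated_layer(x))

# Gradient ascent on the error surface: climb toward the input where the
# two implementations disagree most, then gate on that worst case.
x, lr, eps = 0.1, 100.0, 1e-5  # large step size: this error surface is tiny
for _ in range(200):
    grad = (error(x + eps) - error(x - eps)) / (2 * eps)  # numerical gradient
    x += lr * grad

TOLERANCE = 1e-3
print(f"worst-case input ~ {x:.3f}, max error ~ {error(x):.2e}")
assert error(x) < TOLERANCE, "migrated layer is not equivalent"
```

The final assert is the gate: either the worst-case disagreement is inside tolerance, or the migration does not move forward. No opinion, no score out of ten.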
I Published This Pattern Twelve Days Ago
On April 24th, I published “Deterministic Code in the Loop” on The CTO Advisor Substack. The core argument:
The AI reasons. The code decides.
Let the LLM do what it’s good at — reasoning over unstructured data, pattern recognition, extraction. But when it comes to the governance decision — does this output meet the criteria to move forward? — that decision gets made by explicit, version-controlled, inspectable code. Not a model. Not a prompt. Code.
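What that gate looks like in practice is unglamorous, which is the point. A minimal sketch, with criteria names and thresholds invented for illustration: every criterion is explicit code, so the pass/fail decision is inspectable, version-controlled, and identical on every run:

```python
# Each criterion is a named, explicit predicate over the artifact's metrics.
GATE_CRITERIA = [
    ("builds", lambda a: a["build_ok"]),
    ("tests pass", lambda a: a["tests_passed"] == a["tests_total"]),
    ("max numeric error in bounds", lambda a: a["max_error"] < 1e-3),
]

def gate(artifact: dict) -> tuple[bool, list[str]]:
    """Returns (approved, failures). The model never votes; the code decides."""
    failures = [name for name, check in GATE_CRITERIA if not check(artifact)]
    return (not failures, failures)

approved, failures = gate({"build_ok": True, "tests_passed": 41,
                           "tests_total": 42, "max_error": 2.1e-4})
print(approved, failures)  # one failing criterion blocks promotion
```

A diff to `GATE_CRITERIA` goes through code review like any other change. That is what auditable governance means.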
Google’s TF-to-JAX architecture is a textbook implementation of this pattern.
The Planner is deterministic code governing sequencing. It doesn’t ask an LLM to figure out the migration order. It uses compiler-based static analysis — the same kind of deterministic tooling enterprises have relied on for decades — to build the dependency tree and define the execution plan.
The Playbooks are deterministic policy constraining agent behavior. They prevent the agents from hallucinating APIs, deviating from coding standards, or applying generic patterns where domain-specific patterns are required. The Playbooks are the governance layer. They are inspectable, version-controllable, and auditable. They do the same thing every time.
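The hierarchical resolution can be sketched as layered configuration, where more specific layers deterministically override general ones. The Playbook keys and values below are hypothetical; Google’s post doesn’t publish the format:

```python
# Hypothetical Playbook layers, general to specific.
REPO_PLAYBOOK = {"style": "google", "allow_new_apis": False, "golden_examples": []}
RANKING_PLAYBOOK = {"golden_examples": ["ranking_attention_v2"]}  # invented name

def resolve_playbook(*layers: dict) -> dict:
    """Later (more specific) layers win. Same inputs, same Playbook, every time."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

playbook = resolve_playbook(REPO_PLAYBOOK, RANKING_PLAYBOOK)
print(playbook)
```

YouTube’s ranking infrastructure gets its layer; a general-purpose model doesn’t. The resolution order is policy, not model judgment.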
The mathematical verification is deterministic code deciding “done is done.” This is the exact gap I identified in the Substack piece — the industry gives you two bad options for assessing completion: human review that doesn’t scale, or AI self-evaluation that can’t be audited. Google’s answer is the third option I described: deterministic code that verifies the output meets criteria before it moves downstream. Algorithmic gradient ascent finding maximum error between source and target isn’t a probabilistic assessment. It’s math.
This isn’t a coincidence. It’s convergent architecture. The pattern is correct because the problem demands it. You cannot run AI-assisted migration at scale on probabilistic governance. You need deterministic code in the loop.
Hiding in Plain Sight
The TF-to-JAX post validates the governance pattern. But it also reveals something the industry hasn’t named yet.
The Orchestrator agent — the one making decisions about how to chunk work, which Playbook to inject, how to handle failures, when to retry — is itself driven by a model. That model is making orchestration decisions that shape the behavior of every other agent in the system. It’s deciding what context the Coder sees. It’s deciding which golden examples get injected. It’s deciding how to recover when a build fails.
What model is that? What are its optimization objectives? Who trains it? What happens when its orchestration judgment diverges from the Playbook constraints? What happens when it makes a chunking decision that strips critical context from the Coder’s window?
This is the control plane model problem. And it’s a DAPM problem. Google made an explicit decision-authority placement call when they designed this architecture. They retained authority over sequencing by keeping the Planner deterministic. They retained authority over agent behavior by encoding governance in Playbooks. They retained authority over completion by using mathematical verification. But they delegated orchestration judgment — the highest-leverage decisions in the entire system — to a model. That’s a governance choice. The Orchestrator decides what context every other agent sees, which Playbook gets applied, and how failures get handled. Whoever governs the Orchestrator governs the migration. And right now, that governance is implicit.
This applies everywhere the Reasoning Plane shows up.
In “I Just Wanted Endpoints,” I described Layer 2C as the orchestration intelligence between hardware and application — the layer that decides what model runs where, how memory gets allocated, and how serving runtimes coexist. Google’s Inference Gateway and Dynamic Workload Scheduler are building this layer for infrastructure. But the intelligence driving those placement and routing decisions is also a model — or at minimum, a set of heuristics trained on assumptions about workload characteristics, cost optimization, and latency tradeoffs.
The same pattern appears in the TF-to-JAX migration architecture. The Planner is deterministic. The Playbooks are deterministic. The mathematical verification is deterministic. But the Orchestrator — the component that ties everything together and makes the real-time judgment calls — is a model. And its governance is opaque.
Every system that implements a Reasoning Plane has a control plane model at its center. The model that governs the orchestration. The model that decides what the other models see, do, and produce. And right now, nobody is talking about how to govern that model.
The Pattern Is the Same at Every Scale
On my DGX Spark, I’m the control plane model. I decide which vLLM container gets loaded, which model Ollama serves, how GPU memory gets allocated. My judgment is the orchestration intelligence. I can’t inspect my own decision-making. I can’t version-control it. I can’t hand it to someone else and guarantee they’d make the same calls.
At Google’s scale, the Inference Gateway is the control plane model for infrastructure. The Orchestrator agent is the control plane model for code migration. Both are making decisions that shape every downstream outcome. Both are governed by constraints — Playbooks, routing policies, scheduling rules — but the judgment that applies those constraints in context is itself unconstrained.
The enterprises that figure out how to make this governance explicit — how to define, inspect, and audit the control plane model driving their Reasoning Plane — are the ones that will get AI past pilot at scale. The unlock isn’t making AI more autonomous. It’s making the governance of AI orchestration deterministic, inspectable, and auditable.
Deterministic code in the loop isn’t just a pattern for governing individual AI tasks. It’s a pattern for governing the Reasoning Plane itself. The code decides what the orchestration model is allowed to do, just as it decides what the worker model is allowed to produce.
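Concretely, that means the same gate pattern applied one level up. A hedged sketch, with action names and budgets invented for illustration: the orchestration model proposes actions, and deterministic policy decides which are allowed:

```python
# Hypothetical allow-list and budget for orchestration-level actions.
ALLOWED_ACTIONS = {"chunk", "inject_playbook", "retry_build", "escalate"}
MAX_RETRIES = 3

def authorize(action: str, state: dict) -> bool:
    """The orchestration model proposes; this policy decides."""
    if action not in ALLOWED_ACTIONS:
        return False  # hallucinated action names are rejected outright
    if action == "retry_build" and state["retries"] >= MAX_RETRIES:
        return False  # the retry budget is policy, not model judgment
    return True

print(authorize("retry_build", {"retries": 1}))      # within budget
print(authorize("retry_build", {"retries": 3}))      # budget exhausted
print(authorize("rewrite_playbook", {"retries": 0})) # not an allowed action
```

Whatever the orchestration model wants to do, it does through this interface, and this interface is code you can read, diff, and audit.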
The AI reasons. The code decides. All the way up.

Keith Townsend is a seasoned technology leader and Founder of The Advisor Bench, specializing in IT infrastructure, cloud technologies, and AI. With expertise spanning cloud, virtualization, networking, and storage, Keith has been a trusted partner in transforming IT operations across industries, including pharmaceuticals, manufacturing, government, software, and financial services.
Keith’s career highlights include leading global initiatives to consolidate multiple data centers, unify disparate IT operations, and modernize mission-critical platforms for “three-letter” federal agencies. His ability to align complex technology solutions with business objectives has made him a sought-after advisor for organizations navigating digital transformation.
A recognized voice in the industry, Keith combines his deep infrastructure knowledge with AI expertise to help enterprises integrate machine learning and AI-driven solutions into their IT strategies. His leadership has extended to designing scalable architectures that support advanced analytics and automation, empowering businesses to unlock new efficiencies and capabilities.
Whether guiding data center modernization, deploying AI solutions, or advising on cloud strategies, Keith brings a unique blend of technical depth and strategic insight to every project.