Why the CNCF’s New Executive Director Is Obsessed With Inference


“I’m obsessed with inference,” Jonathan Bryce, who took over as the executive director of the Cloud Native Computing Foundation (CNCF) this summer, said during a panel hosted by The New Stack at KubeCon North America 2025 in Atlanta. “A lot of people have been really into LLMs [large language models] and training,” Bryce told me later that day. “Where I think we’re kind of missing the real key part of the story is around inference.”

In this episode of The New Stack Makers, I sat down with Bryce to discuss why he believes inference will dominate the next decade of computing, what the CNCF’s new Kubernetes AI conformance program means for enterprises and how projects across the CNCF’s portfolio of more than 130 open source projects are being reshaped by AI workloads.

Why Inference Is the Real Opportunity for the CNCF

Bryce’s obsession with inference isn’t a bad one to have. While the industry has long been focused on training massive LLMs, he sees inference — that is, serving those models — as the workload that will define the next era of computing. And it’s also where the CNCF, with its wide portfolio of infrastructure projects, now arguably more important than ever, can play a foundational role.

“Inference specifically fits so well with the technologies that we have in the cloud native community,” he explained. “It’s all about deploying, securing, scaling, observing and doing it in a way where it’s much more of an online, real-time type of application versus batch-like training.”

GPUs are expensive, scarce and power-hungry, and will remain so for the foreseeable future. Bryce believes cloud native tooling can deliver not just incremental improvements, but “orders of magnitude of efficiency for these inference stacks.” Kubernetes, the CNCF’s flagship project, is often at the core of this.

“I think the kind of common journey that people have been on is they will take some stack, it might be Ray on Kubernetes or KServe, which just graduated to become a CNCF incubating project this week. KServe is an inference serving engine. They’ll take these kinds of things and they’ll deploy them on top of Kubernetes, and that will get them to the first phase of being able to load up a model and start to answer queries and do the basic level of inference,” Bryce explained.

The Kubernetes AI Conformance Program

The CNCF launched a Kubernetes AI conformance program at KubeCon, giving enterprises a baseline for running AI workloads. The v1 specification focuses on GPU support and Dynamic Resource Allocation (DRA), ensuring that conformant Kubernetes environments have the primitives needed for running AI inference (a sketch of those primitives follows at the end of this section).

“If you have an AI workload, you’re going to know that there are certain components available, like DRA and some other pieces within a Kubernetes environment,” Bryce said when I asked him about this new program. “You can have a conformant Kubernetes environment that’s just kind of plain vanilla Kubernetes and it doesn’t necessarily have all of those elements that you would want if you’re trying to run an AI workload. And I would say, the simplest way to think about this is it’s really targeting accelerated workloads.”

Bryce sees the conformance program as one leg of a three-part foundation the community needs: a target to aim for, conformant implementations and reference architectures based on the community’s experience with real-world deployments. “Right now, I think where we are is pretty far back, where everybody is kind of figuring it out on their own,” he said.
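To make the DRA piece concrete, here is a minimal sketch of the two primitives involved: a ResourceClaimTemplate that requests one device from a device class, and a pod that consumes that claim instead of a classic counted GPU limit. It is illustrative only; the gpu.example.com device class, all names and the container image are hypothetical, and the resource.k8s.io API group has moved through alpha and beta versions across Kubernetes releases, so check what your cluster serves before applying anything like this.

```python
# A hedged sketch of the DRA primitives the v1 AI conformance spec leans on.
# Requires pyyaml (pip install pyyaml); emits manifests for `kubectl apply -f -`.
import yaml

# A ResourceClaimTemplate: "give each pod one device from this class."
claim_template = {
    "apiVersion": "resource.k8s.io/v1beta1",  # version varies by Kubernetes release
    "kind": "ResourceClaimTemplate",
    "metadata": {"name": "single-gpu"},
    "spec": {
        "spec": {
            "devices": {
                "requests": [
                    # The device class name is hypothetical; it is defined by
                    # whatever DRA driver your cluster runs.
                    {"name": "gpu", "deviceClassName": "gpu.example.com"}
                ]
            }
        }
    },
}

# A pod that references the claim rather than requesting a counted
# resource like nvidia.com/gpu.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-server"},
    "spec": {
        "resourceClaims": [
            {"name": "gpu", "resourceClaimTemplateName": "single-gpu"}
        ],
        "containers": [
            {
                "name": "server",
                "image": "ghcr.io/example/llm-server:latest",  # placeholder image
                "resources": {"claims": [{"name": "gpu"}]},
            }
        ],
    },
}

print(yaml.dump_all([claim_template, pod], sort_keys=False))
```

The point of the claim model is that the scheduler and a device driver negotiate a concrete device for the pod, rather than the pod grabbing an opaque counted resource, which is what makes fractional, shared and multi-device GPU setups tractable.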
The Agent Inference Explosion Is Coming

The current hype around AI agents only increases the need for these solutions, Bryce argues. After all, agents that work on complex, multistep tasks in parallel will dramatically increase the load on inference systems.

“An interaction that we have with an LLM is actually quite slow and low volume,” Bryce noted. “When you go out and you give an agent a complex task with multiple steps, it’s going to try to do that in parallel, or as fast as it can. That’s going to be something that is increasing the load dramatically. Anything that you can do to make those requests happen more efficiently — smaller models, better inference, whatever it is — that’s going to make those agents more efficient, more cost-effective, and also provide better-quality results.”

This is where the cloud native community’s expertise becomes critical. As Bryce noted, the networking and routing primitives already built into Kubernetes can be extended with inference-aware plugins that route requests to specific GPUs or prefilled caches, delivering significant performance gains without changing Kubernetes’ core architecture (a minimal sketch of that idea closes this article).

Going Beyond the ChatGPT Moment

Three years after ChatGPT launched, Bryce believes enterprises are ready to move past the “ChatGPT moment” and find the right models for the right use cases. That means smaller, specialized models trained on purpose-built datasets — not just massive LLMs searching through “the history of every Nobel Prize winner and the campaigns of Genghis Khan” to answer a simple question about Atlanta traffic.

“We have to move beyond the ChatGPT moment and LLMs in our thought process around what is AI and how are we going to get the most out of it,” he said. This, he argues, will put the community on track to provide the infrastructure software for “the biggest workload mankind will ever have.”
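As a closing illustration of the inference-aware routing Bryce described, here is a deliberately tiny Python sketch of prefix affinity: hash the leading window of each prompt so that an agent’s parallel subtasks, which share a system prompt, all land on the same backend and can reuse its prefill (KV) cache. The backend names are hypothetical, and real routers (the Gateway API Inference Extension work in the Kubernetes community, for example) weigh many more signals, such as queue depth, adapter placement and cache state reported by the model servers.

```python
# Illustrative prefix-affinity routing; not any particular project's algorithm.
import hashlib
from collections import defaultdict

BACKENDS = ["gpu-pod-a", "gpu-pod-b", "gpu-pod-c"]  # hypothetical model-server pods

def pick_backend(prompt: str, window: int = 64) -> str:
    """Route on a hash of the prompt's first `window` characters, so requests
    sharing a system prompt hit a backend that already has it cached."""
    digest = hashlib.sha256(prompt[:window].encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

# An agent fanning out ten parallel subtasks under one system prompt:
# every request routes to the same pod (one warm cache, not three cold ones).
SYSTEM = "You are a traffic-planning agent for the city of Atlanta. "
hits = defaultdict(int)
for step in range(10):
    hits[pick_backend(SYSTEM + f"Subtask {step}: check one route segment.")] += 1
print(dict(hits))  # e.g. {'gpu-pod-c': 10}
```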

Source: This article was originally published on The New Stack
