Frontend developers need to make a paradigm shift about how they build applications using AI models, according to Angular consultant and full-stack developer Sonu Kapoor.
So far, AI has amounted to making API calls to a black-box model running in the cloud, Kapoor told The New Stack. But it doesn’t have to be that way, he added.
Kapoor predicted that the next evolution for AI and machine learning (ML) models will be to leverage models as a local asset in the frontend build pipeline. One way to do this is to download the models and let them run on the device.
“Most of my experience is around making those models usable in real products, and then connecting them, especially to the frontend and dashboards, which is where I specialize, so that they actually add value to users, rather than just staying in notebooks,” Kapoor told The New Stack.
Moving machine learning into the frontend will create major benefits in terms of performance and data privacy for users, he added.
Right now, machine learning feels exotic because it’s a black box, he said, but shifting it to run in the browser changes that.
“To me, it feels a lot more like engineering,” he said. “It fits way more with what I’m doing and the way I’m approaching things, [rather] than having a black box running somewhere in the cloud that does all of these things by itself.”
Benefits of Running Locally
One reason why developers might want to run a model locally is privacy. Cloud-based models require sending sensitive data over the wire, which is especially problematic in FinTech or healthcare, said Kapoor, who previously worked as a senior Angular consultant at the financial company Citigroup, where he architected an Electron application that processed millions of financial trade records within seconds.
When a model is downloaded and run locally via ONNX, sensitive data never needs to leave the device, he pointed out.
“A huge issue with models is privacy, because you’re sending data over the wire to the backend to have the model do something for you,” Kapoor said. “Sometimes you may not want to do that because of privacy concerns, you have sensitive data and so on. If you have the model downloaded locally, right, then you wouldn’t have to do that.”
ONNX allows developers to ship the application with a locally downloaded model, which can improve privacy.
Running models locally also enables better offline UX and instant feedback loops, he said. If an API call fails, the app can still provide a heuristic answer or cached result instead of just failing. Even partial results can make the user experience feel smarter and more interactive, he said.
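That graceful-degradation pattern can be sketched in a few lines. The endpoint, cache and heuristic below are hypothetical placeholders, not from Kapoor’s projects:

```typescript
// Hypothetical sketch: try the cloud model first, but fall back to a
// cached or heuristic answer instead of failing outright when offline.
const resultCache = new Map<string, string>();

async function classify(text: string): Promise<string> {
  try {
    // Assumed cloud inference endpoint; swap in a real API.
    const res = await fetch('/api/classify', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const { label } = await res.json();
    resultCache.set(text, label);
    return label;
  } catch {
    // Offline or the call failed: serve the last cached answer,
    // or a simple heuristic so the UI still feels responsive.
    return resultCache.get(text) ?? (text.trim().endsWith('?') ? 'question' : 'statement');
  }
}
```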
Local models allow developers to build apps that can inject reasoning and automation — not just predictions — directly into user workflows.
– Sonu Kapoor, Angular consultant
Local models allow developers to build apps that can inject reasoning and automation — not just predictions — directly into user workflows, he said. This opens up the possibility for apps to be more adaptive and context-aware.
But it’s also not an either/or choice. Developers can mix and balance a local strategy with a cloud-based approach.
For example, developers could run a smaller model locally to handle low-latency tasks such as auto-complete or intent detection, and only call the cloud for tasks that require heavy reasoning, he said.
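A rough illustration of that split, with made-up task names and endpoints, and a small local model assumed to be already loaded (for example via ONNX Runtime Web):

```typescript
// Illustrative routing: low-latency tasks stay on-device; heavy
// reasoning goes to the cloud.
type Task = 'autocomplete' | 'intent' | 'summarize';

// Placeholder for a small model running in the browser.
async function runLocalModel(task: Task, input: string): Promise<string> {
  return `${task}: local result for "${input}"`;
}

async function infer(task: Task, input: string): Promise<string> {
  if (task === 'autocomplete' || task === 'intent') {
    return runLocalModel(task, input); // never leaves the device
  }
  // Heavy reasoning: send to a (hypothetical) cloud endpoint.
  const res = await fetch('/api/llm', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ task, input }),
  });
  return (await res.json()).output;
}
```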
It’s also important that developers are transparent with users about what runs locally versus what goes to the cloud, he said. That means an app should clearly show when inference happens locally versus remotely, to give users visibility into any data that leaves their devices. Even a small UX cue, such as a tooltip or privacy label, can help build trust when mixing local and cloud models, Kapoor said.
Bringing Models to the Browser
One way to bring models to the browser is via the Open Neural Network Exchange (ONNX) Runtime Web, a version of the ONNX Runtime that’s designed specifically to run machine learning models in the browser using JavaScript, Kapoor said.
“You can train a model in one tool, say PyTorch or TensorFlow even, and then save it as an ONNX format; and that allows you to run it everywhere,” Kapoor said. “It makes it really great for deployment, especially when you want to run a model outside of Python, because a lot of those models require that you know Python, or [that] you run them within Python.”
He likened ONNX to a .pdf for machine learning — a universal format that allows models trained in frameworks such as PyTorch or TensorFlow to run anywhere.
“Teams exploring TensorFlow.js or ONNX Runtime Web quickly discover that model load time and thread blocking behave like any other performance budget,” Kapoor said. “With ONNX, actually, you can download the model so you can convert it and you can run it anywhere you want, like with JavaScript or Node.js.”
ONNX is like a .pdf for machine learning — a universal format that allows models trained in frameworks such as PyTorch or TensorFlow to run anywhere.
– Kapoor
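In practice, running a converted model with ONNX Runtime Web looks roughly like the sketch below; the model path, input name and tensor shape are placeholders for whatever the exported model actually expects:

```typescript
import * as ort from 'onnxruntime-web';

// Load a model that was exported to ONNX from PyTorch or TensorFlow;
// the .onnx file is served like any other static asset.
const session = await ort.InferenceSession.create('/assets/model.onnx');

// Build an input tensor. The name 'input' and the [1, 128] shape are
// placeholders for the exported model's real signature.
const data = new Float32Array(128);
const feeds = { input: new ort.Tensor('float32', data, [1, 128]) };

// Inference runs entirely in the browser, so the data never leaves the device.
const results = await session.run(feeds);
console.log(results);
```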
In fact, he recently built an app for a NASA presentation that integrated a solar flare prediction model — specifically, NASA’s Aurora model — using ONNX to make it usable in a real-product dashboard on the frontend.
“I integrated that with NASA’s Aurora no-cost model, and then I’ve worked with TensorFlow and PyTorch models through the API and use something like BERT [an AI model that excels at text] or CLIP [a neural network that handles connecting text and images] to embed images and text,” he said. “Most of my experience is around making those models usable in real products, and then connecting them, especially to the frontend and dashboards, which is where I specialize, so that they actually add value to users rather than just staying in notebooks.”
He also pointed to Angular’s ability to handle heavy data effectively as key to the success of the $4 billion global trading platform he architected at Citigroup, which incorporated real-time data and advanced visualization.
“It costs them a lot of money if there’s a delay of a couple of seconds already,” he said. “That’s where Angular signals come in. When you receive heavy data like this, you need it to be really snappy and really performant.”
The application was deployed locally (pre-compiled and bundled) on each trader’s system, thus eliminating potential network delays.
Angular’s Advantage
The application’s success was a testament to Angular’s ability to handle heavy data efficiently, Kapoor added.
That wasn’t always possible with the framework. Angular previously relied on Zone.js, which required re-checking the entire DOM tree for even small data changes, he said. That was inefficient, especially with heavy computation.
But Angular’s adoption of Signals in version 16, released in May 2023, allowed developers to opt out of Zone.js. Being able to opt out provided better isolation and made it possible to run heavy inference or data preparation off the main thread, according to Kapoor.
“Signals gave us the right isolation model, you can say, and what happens is, with Signals, you can opt out of Zone.js, and now the entire DOM tree doesn’t have to be rechecked,” he said. “So you can run really heavy inference or data prep [off] the main thread and let the UI react only when the results are ready.”
Angular’s reactive change-detection and the Signals ecosystem provide a strong foundation for isolating compute-heavy operations from UI rendering, he added.
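A minimal sketch of that pattern, assuming the inference code lives in a separate Web Worker file; the worker name and message shape here are invented for illustration:

```typescript
import { Component, signal } from '@angular/core';

@Component({
  selector: 'app-prediction',
  standalone: true,
  template: `<p>{{ prediction() ?? 'Running model…' }}</p>`,
})
export class PredictionComponent {
  // The template reads this signal; the view updates only when it is set.
  readonly prediction = signal<string | null>(null);

  // Hypothetical worker that loads the model and runs inference off the
  // main thread, posting the result back when it is ready.
  private readonly worker = new Worker(new URL('./inference.worker', import.meta.url));

  constructor() {
    this.worker.onmessage = ({ data }) => this.prediction.set(data.label);
    this.worker.postMessage({ text: 'classify me' });
  }
}
```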
The Next Evolution of Frameworks
Kapoor said the next evolution of frameworks will revolve around AI, specifically building pipelines where AI models are treated like images and fonts. They will be bundled and lazy-loaded alongside other code assets with predictable performance costs, he said.
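One hedged sketch of what that could look like: the model file is fetched lazily and cached, like any other asset, the first time a feature needs it (the path below is made up):

```typescript
import * as ort from 'onnxruntime-web';

// Lazy-load the model the first time it is needed, the way a route or
// image would be, and reuse the session afterward.
let sessionPromise: Promise<ort.InferenceSession> | undefined;

export function getIntentModel(): Promise<ort.InferenceSession> {
  sessionPromise ??= ort.InferenceSession.create('/assets/intent.onnx');
  return sessionPromise;
}
```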
He already sees signs of this in Angular’s MCP server, which runs locally, offers project context and helps developers build components that follow best practices.
Angular also recently open sourced Web Codegen Scorer, which allows framework creators to set up environments to ensure models are following best practices for a particular framework. Angular and Solid are already supported by the tool.
“As [AI] models shift to the frontend, developers need clear boundaries to ensure privacy, performance, and responsible behavior.”
– Kapoor
But there will need to be guardrails in place as well, Kapoor said.
“As models shift to the frontend, developers need clear boundaries to ensure privacy, performance, and responsible behavior,” he said, recommending that:
Data from local inference should never persist or leak through storage or logs;
Models should be version-controlled and checksum-verified;
Compute intensity should be capped to protect UX and battery life; and
The app should clearly communicate when a model is making autonomous decisions versus offering suggestions.
In enterprise builds, these checks can even be baked into the CI/CD pipeline. So, for example, apps should be linting model metadata or validating inference outputs before surfacing them to the UI, he said.
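For the checksum guardrail above, a hedged sketch of what verification might look like before a model is handed to the runtime; the expected digest would be pinned by the build pipeline (the constant here is a placeholder):

```typescript
// Placeholder: in practice this hash would be generated and pinned in CI.
const EXPECTED_SHA256 = '<digest pinned at build time>';

// Fetch the model and refuse to load it if its SHA-256 digest does not
// match the value recorded in the build pipeline.
async function fetchVerifiedModel(url: string): Promise<ArrayBuffer> {
  const buffer = await (await fetch(url)).arrayBuffer();
  const digest = await crypto.subtle.digest('SHA-256', buffer);
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
  if (hex !== EXPECTED_SHA256) {
    throw new Error('Model checksum mismatch; refusing to load.');
  }
  return buffer;
}
```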
The real shift isn’t just running models locally, according to Kapoor. It’s treating them as first-class citizens of the frontend: versioned, tested, observable and bound by the same guardrails as production code.