
Why the shift toward on-device AI puts Ottawa’s $925M data centre bet in question
In my last post, I looked at the case for Canada’s AI data centre build-out. The main argument is that we need to build these centres to remain competitive in an economy transforming around AI and to keep our data sovereign. If the government doesn’t intervene to subsidize them, Canadians will lack “access to the compute they need,” and the centres would be built and controlled by US companies.
In this post, I want to set aside the data sovereignty issue and outline an argument that isn’t receiving much airplay at the moment in debates about the need for new data centres. It goes to the heart of whether we need to build new centres to enjoy the full benefits of AI.
The counter-argument is that access to the compute we need to enjoy many if not most of the benefits of AI is rapidly becoming much less expensive and will soon be accessible at a far lower cost economically and environmentally.
The future of AI, the argument goes, is not one in which the models we interface with most often are situated in the cloud and reliant upon large, energy-hungry data centres. Instead, it’s one in which most of the processing is done on board our personal devices, like our phones or laptops, on a company’s local server, or a rack of servers at a university or an internet service provider.
And this isn’t a matter of pure speculation. The evidence pointing in this direction is mounting. Though, to be clear, we will still need larger models and data centres for some things. The question is how many.
The devices most of us own are capable of handling locally much of the AI processing we might want to do. Gartner forecast last year that AI-capable PCs, or computers that can run language models on device rather than in the cloud, would represent more than half of all PC shipments by the end of 2026. More recent figures are hard to come by, but most of the phones and laptops now available from Apple and Microsoft, along with other brands, tout their capacity for on-board AI processing.
A further line of evidence comes from the rapid improvement in what are called small language models, or SLMs. These are AI models compact enough to run on a laptop or a local server without sending anything to the cloud. For years, the working assumption in AI was that bigger models meant better results; the more computing power behind a model, the more capable it would be. But that assumption has been breaking down. Microsoft’s Phi-4, Google’s Gemma 3, and Meta’s Llama 3.3 (all open-source models that can run locally on modest hardware) now match or approach the performance of frontier cloud models like OpenAI’s GPT-4o on the tasks that make up most of what businesses do with AI, such as analyzing, editing, or generating text. The gap that once justified routing everything through a large cloud data centre has narrowed significantly on most everyday tasks, and it continues to narrow.
Companies have good reasons to make the switch to SLMs. Without having to make a round trip to a remote server, local models respond faster. They keep data on the premises. They become much cheaper per task the more processing you do. And they can be adapted to a firm’s special use cases. Tuning a small model on a few hundred domain-specific examples can lead it to outperform large frontier models on certain tasks.
Deloitte’s 2026 Tech Trends report forecasts that over 40 per cent of enterprise AI workloads will migrate to these smaller, locally run models by 2027, on the basis that 80 per cent of what businesses actually do with AI (sorting, summarizing, or extracting data) doesn’t require the largest models at all. That’s probably too optimistic, but it points in a direction that things could go for many users in the near future. And if so, it calls into question the government’s assumption that Canadians will need greater access to large centralized cloud infrastructure to remain competitive in the AI economy.
One company we may be hearing more about here is Apple. The story most of us have heard over the last year or so is that Apple missed the boat on AI. It hasn’t developed its own frontier model. Its rollout of Apple Intelligence was a flop, and it has now been sidelined in the race to develop AI.
But Apple has been active on another front that may make it a central player in the near future of AI.
For a number of years, it has been moving in a different direction from most of the chip industry by designing chips that integrate the central processor, memory, and graphics processor on a single component, rather than relying on separate parts that introduce latency and constrain performance. In recent years, culminating in the current M5 (and soon M6) chips, Apple silicon has become fast enough that a Mac Mini or MacBook Pro equipped with these chips and a generous amount of RAM can run a sizable open-source language model, one capable of doing many of the things most people want to do with AI, at speeds that are increasing rapidly year over year.
At its Worldwide Developers Conference last summer, Apple went further by releasing a Foundation Models framework that gives third-party developers access to Apple’s own generative AI models running directly on the device, with no cloud dependency for most tasks. As one commentator puts it, Apple “isn’t just delivering a model”; it is “delivering an entire AI operating layer baked into iOS,” with data never leaving the device.
Earl Brandt, an Australian engineer, has described his experiments with Apple’s most capable chips in ways that drive home what this might mean. Using a maxed-out M4 Ultra Mac Studio desktop computer (about as big as a clock radio) that costs roughly $10,000 US, he was able to run a version of Google’s Gemma model locally. With the purchase price amortized over three years, he gained access for about $280 a month to a model that would have cost $1,500 a month if hosted in the cloud, with, as he notes, no API costs, rate limits, or data leaving his device. In three years, the M8 chip in a MacBook Air that costs $1,000 will do this even more cheaply.
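For readers who want to check the arithmetic behind that comparison, here is a minimal sketch in Python. It uses only the figures quoted above (a purchase price of about $10,000 US, a three-year amortization period, and a cloud cost of $1,500 a month). The function and variable names are my own, and the calculation assumes the simplest possible case: it ignores electricity, maintenance, financing, and any resale value, none of which are given in Brandt's account.

```python
def amortized_monthly_cost(purchase_price: float, months: int) -> float:
    """Spread a one-time hardware purchase evenly over a number of months."""
    if months <= 0:
        raise ValueError("months must be positive")
    return purchase_price / months


def break_even_months(purchase_price: float, cloud_monthly: float) -> float:
    """Months of cloud fees that would add up to the hardware purchase price."""
    if cloud_monthly <= 0:
        raise ValueError("cloud_monthly must be positive")
    return purchase_price / cloud_monthly


# Figures quoted in the post (US dollars). Everything else is derived.
HARDWARE_PRICE_USD = 10_000
AMORTIZATION_MONTHS = 36  # three years
CLOUD_MONTHLY_USD = 1_500

local_monthly = amortized_monthly_cost(HARDWARE_PRICE_USD, AMORTIZATION_MONTHS)
break_even = break_even_months(HARDWARE_PRICE_USD, CLOUD_MONTHLY_USD)
three_year_difference = CLOUD_MONTHLY_USD * AMORTIZATION_MONTHS - HARDWARE_PRICE_USD

print(f"Local, amortized: ${local_monthly:,.2f} per month")
print(f"Cloud, as quoted: ${CLOUD_MONTHLY_USD:,.2f} per month")
print(f"Hardware pays for itself after about {break_even:.1f} months")
print(f"Difference over three years: ${three_year_difference:,.0f}")
```

On these assumptions the local machine works out to roughly $278 a month, which rounds to the $280 cited, and the hardware would cover its own cost in a little under seven months of avoided cloud fees. Adding running costs such as electricity would raise the local figure somewhat.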
Once again, we will still need data centres and the cloud for the largest frontier models, and much of the AI we’ll use in the near future will involve those models. Notably, in June, when Apple rolls out its new iPhone operating system, it will incorporate Google’s Gemini model to power Siri, hosted on Apple’s remote server system, which the company calls Private Cloud Compute. Large models might also become more important to us if and when more of the AI we use takes the form of agentic AI — software that does things for us, like book a reservation, make a purchase, or deal with customers. But so far, agentic AI, like self-driving cars, seems stuck on the periphery.
An equally plausible near future — the next three to five years — is one in which more of us do much of our AI processing on device, running small models we host. If so, we will still need AI data centres, but how many is uncertain. And the government’s claim that remaining competitive in an AI economy depends on a costly build-out that taxpayers subsidize becomes less convincing. ■
