
What's the Future of AI Language Models as a Decentralized Technology?

Two fingers disconnecting a network cable from a computer
Many assume that LLMs will always be provided by big tech companies through the cloud, but is that true? (LLMs and power series part 3)
Jay Stanley,
Senior Policy Analyst,
ACLU Speech, Privacy, and Technology Project
August 25, 2025

When ChatGPT first arrived, it seemed to be an inherently centralized technology. But in the years since, there has been surprising progress in wringing more power out of ever-smaller Large Language Models (LLMs). That has opened the door to useful decentralized versions of the technology by letting people run these models on their own hardware – purely locally, that is, without needing an internet connection. In this age of cloud services, many people assume that LLMs will continue to be accessed largely through the giant server farms of big tech companies, but that’s not necessarily so.

Like the question of how accessible LLM base-model training will be, the question of how widespread decentralized local models will be is a key one in assessing the overarching subject of this series: whether LLMs will be an inherently centralized and perhaps authoritarian technology, or something far more democratic.

AI scientists have learned that much smaller models can be compressed from larger versions and still perform nearly as well. Late last year it became possible to run models as powerful as GPT-4 on consumer laptops, and other models have been made small enough to run on a smartphone. Today the AI code and data repository Hugging Face hosts over 1.9 million models that are free to download and run locally.
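
To make “run locally” concrete: with the widely used Hugging Face transformers library, a few lines of Python will download a small open model once and then generate text entirely on your own machine, no cloud service involved. (The particular model named below is just an illustrative choice.)

```python
# Minimal local text generation with the Hugging Face transformers library
# (pip install transformers torch). After the one-time model download,
# everything here runs offline on an ordinary CPU.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")  # a small, freely downloadable model
result = generator("The future of local AI is", max_new_tokens=25)
print(result[0]["generated_text"])
```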

The balance of power between centralized, cloud-based LLMs and local models will depend on the outcomes of continuing LLM research, and how models end up being used. It seems common-sensical that a massive amount of compute will always outperform anything that can be run locally. But there are at least four potential developments that could give impetus to local models.

1. The technology plateaus
If the improvements between successive generations of frontier models continue to diminish, or progress just stalls out entirely, the advantages of centralized models will wither away. Already the differences among many models are subtle at best, with different models shining in different areas. That is especially true on tasks where there are no objectively measurable performance metrics (where such metrics do exist, models can self-train against them, which is why progress there has been the most rapid). And some observers are already arguing that when it comes to extracting practical benefits from LLMs in real-world situations, progress has largely stalled, while many contend that the technology has already plateaued.

The performance of LLMs on non-objective tasks, like human intelligence, can be very hard to measure. One of the most popular benchmarks for LLMs is Chatbot Arena, in which site visitors are asked to enter a query and pick their preferred answer from two randomly chosen models, with those evaluations used to rank the models. But the evaluations are often simply a matter of taste. One obstacle to benchmarking is “Goodhart’s law,” which says that once a performance measure becomes an indicator of progress in a field, it ceases to be useful as an indicator because people start to game it – for example, by training their LLMs to do well on specific tests.
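
Head-to-head votes of this kind are typically aggregated into a leaderboard using an Elo-style rating system borrowed from chess. Here is a minimal sketch of a single rating update, with hypothetical ratings (real leaderboards use more elaborate statistical fits over many thousands of votes):

```python
# A toy Elo-style rating update: the arithmetic behind head-to-head
# LLM leaderboards (hypothetical ratings; illustrative only).
def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    # Probability the winner "should" have won, given the current ratings.
    expected = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
    delta = k * (1.0 - expected)  # an upset moves ratings more than an expected win
    return winner + delta, loser - delta

model_a, model_b = 1250.0, 1100.0                # current ratings of two models
model_b, model_a = elo_update(model_b, model_a)  # a voter prefers the underdog's answer
print(round(model_a), round(model_b))            # prints 1227 1123: the gap narrows
```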

Insofar as the differences between local models and cloud offerings are subtle rather than stark and obvious – if people can't really tell the difference – the advantages of the cloud will shrink.

2. Progress in model efficiency continues
Even if progress in expanding the abilities of frontier models levels off, efficiencies in getting more power out of smaller models may continue. It turns out that bigger is not always better. Elephants have brains three times as large as humans’, but, as Bill Bryson once observed, they’re not going to outwit you in contract negotiations. More and more intelligence is now being wrung out of smaller LLMs, just as the human brain somehow packs more intelligence into a much smaller space than an elephant’s.

Models can be made more efficient through various techniques; a toy code sketch of the first two appears after this list.

  • One is “knowledge distillation,” in which a larger “teacher” model is used to train a smaller “student” model. The teacher (base) model has to laboriously pick up the connections among words by scanning vast oceans of data – gradually learning, for example, that the phrase “That movie was really ____” might have a 30% chance of being followed by “amazing,” a 28% chance of being followed by “terrible,” a 3% chance of “disappointing,” and so forth. The teacher model can effectively pass those probabilities along to the student model, saving the student from having to do all the expensive computation involved in learning the strength of each association.
  • Another technique is “quantization,” in which a model is made lighter by reducing the precision of its parameters – for example, from 32 bits to 8 bits. Experience has shown that this often degrades performance surprisingly little.
  • A third technique is called “mixture of experts,” in which a model is subdivided into specialized parts so that only a portion of the parameters are activated during inference, dramatically improving efficiency. Advances on this front were one of DeepSeek’s major innovations; its V3 model has 671 billion parameters, but during inference it activates only around 37 billion of them – a dramatically higher “sparsity factor” than prior models, and one that saves huge amounts of compute.
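
To make the first two of these techniques concrete, here is a minimal sketch in Python, assuming nothing more than NumPy. It illustrates the core arithmetic only: the word probabilities echo the movie example above, the weights are randomly generated stand-ins, and none of this is any production system’s actual code.

```python
# A toy sketch of knowledge distillation and quantization.
# Illustrative numbers only; real systems are far more elaborate.
import numpy as np

# --- Knowledge distillation: the student inherits the teacher's probabilities ---
# Teacher's next-word probabilities for "That movie was really ____"
# (the last entry absorbs the remaining probability mass).
teacher_probs = np.array([0.30, 0.28, 0.03, 0.39])  # amazing, terrible, disappointing, other

# An untrained student's raw scores (logits) for the same four choices.
student_logits = np.array([0.1, 0.2, 0.0, -0.1])
student_probs = np.exp(student_logits) / np.exp(student_logits).sum()  # softmax

# Training minimizes the cross-entropy between the teacher's "soft targets"
# and the student's predictions, nudging the student toward the teacher.
distillation_loss = -np.sum(teacher_probs * np.log(student_probs))
print(f"distillation loss: {distillation_loss:.3f}")

# --- Quantization: store each weight in 8 bits instead of 32 ---
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=10_000).astype(np.float32)  # stand-in weights

scale = np.abs(weights).max() / 127.0          # one scale maps floats onto the int8 range
q = np.round(weights / scale).astype(np.int8)  # stored form: 1 byte per weight
restored = q.astype(np.float32) * scale        # recovered at inference time

print(f"memory: {weights.nbytes:,} -> {q.nbytes:,} bytes")              # 4x smaller
print(f"mean absolute error: {np.abs(weights - restored).mean():.6f}")  # tiny
```

Mixture-of-experts routing follows the same spirit (a small gating network decides which expert sub-networks to run for each token), but a faithful sketch would run considerably longer.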

An illustration of the effectiveness of these techniques is the model TinyBERT, which is 7.5 times smaller and 9.4 times faster than its parent model, BERT-Base, yet performs more than 96% as well. Researchers have also distilled the Llama model into a student model six times smaller that performed just as well as its parent.

The more that useful locally run LLMs approach the capabilities of those that run in the cloud, the more they become like solar cells rather than nuclear reactors. On the other hand, insofar as it takes data center-scale compute to run the models people want to use, that will be a power-centralizing state of affairs.

3. LLMs prove to be most useful and reliable in smaller, more specialized applications rather than as giant generalists
It’s possible that specialized models will prove more valuable and cost-effective for the kind of practical, real-world use cases that people and companies are willing to pay for, and that the market will shift toward such narrowly trained models. Those models may continue to derive from giant base models, but if progress slows in base-model performance even as post-training yields more benefits, then specialization may win out. This question is the subject of ongoing debate in the world of AI.

4. The demand for privacy and data security drives localization
A vital advantage of local models is privacy: when you chat with a model running in the cloud, the provider can see your queries and any associated documents or other data that you upload. But you may want to ask questions of a chatbot without anybody having the ability to see what you’re interested in. You may want to feed personal emails, writings, photos, and other documents into an LLM so that you can then use the tool to analyze, search, and interrogate your own data – and you don’t want to upload all that information to OpenAI, Google, or DeepSeek’s servers, no matter what contractual promises they may make about how they’ll use or store that data.

Nor are the privacy threats limited to LLM providers; one security center recently recommended to the Trump Administration that “The U.S. government should partner with AI companies to share suspicious patterns of user behavior and other types of threat intelligence.”

Parallel concerns animate companies (concerns I’d call data security, since privacy is something humans but not corporations possess). There are many people within companies and other institutions who are eager to take advantage of the summarization, search, and analysis capabilities of LLMs, but who regard their data as too sensitive to upload to the servers of a big American or Chinese company. One executive went so far as to declare that “Large public models on their own have little to no value to offer private companies.” Already LLM providers are offering their products not only via the cloud but also as on-premises installations, and advice and services for enterprises wanting to build in-house LLM platforms can be found all over the internet.

Running your own server
These factors could drive local LLMs to the fore. Still, the models viewed as best today are those that run in the cloud, not those small enough to run on personal hardware. In addition, it has certainly been the case historically that many technologies that are structurally decentralized nevertheless become centralized because, as Signal founder Moxie Marlinspike famously put it, “people don’t want to run their own servers.” Email, for example, is a completely decentralized technology: anybody can use one of a vast number of providers, or even become their own provider by standing up their own email server. A few do, and the ability to do so is important as an escape valve. But the vast majority use Google’s Gmail and a couple of other centralized services run by big tech firms, because doing so requires no expertise and many people just find it easier. And when the quality differences among products are subtle, ambiguous, and subjective, irrational factors such as brand names, “vibes,” and the power of suggestion tend to come to the fore, which could also benefit the likes of Google, OpenAI, and Anthropic.

It also remains the case that the capability of smaller models depends on the size and power of the larger models from which they are derived. And the best of those larger models still require significant (though, as we have seen, apparently declining) resources that are not widely available.

Still, there are reasons to think that local LLMs may become an important, co-equal, or even dominant part of the landscape. In the next part of this series, I will look at a closely related factor in the character of LLMs: the vitality of open source models.
