The 2026 guide to national LLMs, the sovereignty gap, and what GPT-NL reveals about owning your own AI.
The Netherlands spent EUR 13.5 million to build its own ChatGPT, and the result lands somewhere around GPT-3.5, a model OpenAI shipped in late 2022. That single fact captures the entire paradox of sovereign AI. Dozens of governments are pouring public money into homegrown language models so they are not renting their national intelligence from three American labs and one Chinese one. Almost every one of those models comes out a generation or two behind the frontier, and most of them still run on American chips.
This is not a story about incompetence. The teams building these models are world class, and the engineering inside the Dutch GPT-NL project, Switzerland's Apertus, the UAE's Falcon, or India's Sarvam is genuinely impressive. It is a story about structure: about what happens when a country with a EUR 14 million grant tries to compete in a market where a single training run costs more than $100 million, where one private lab raises more than the entire annual research budget of a G7 government, and where the company selling the shovels (NVIDIA) now books over $30 billion a year in "sovereign AI" revenue.
The deeper question is not whether national models can match GPT-5.5. They cannot, and that is the wrong goal. The real question is what sovereignty actually buys a nation, which layers of the AI stack a country can realistically control, and whether "train your own frontier model from scratch" was ever the right definition of independence in the first place. This guide uses GPT-NL as the lens, because it is the most honest, most transparent, and most instructive failure-to-launch-as-a-ChatGPT-rival in the entire field, and because its designers never actually promised a ChatGPT rival.
This guide breaks down what sovereign AI means from first principles, profiles the national models of Europe, the Gulf, and Asia, explains the three structural gaps that cap their capability, examines the defense and government drivers that make sovereignty non-negotiable for some workloads, follows the money, and lays out the approaches that actually deliver independence in 2026. It assumes no technical background.
Contents
- What Sovereign AI Actually Means
- GPT-NL: The Dutch Bet, Up Close
- Why Sovereign Models Stall Near GPT-3.5
- Europe Beyond GPT-NL: Apertus, EuroLLM, and Mistral
- The Gulf: Buying Sovereignty by the Gigawatt
- Asia's Sovereign Wave: India, Japan, Korea, Singapore
- China and the Open-Weight Shortcut
- The Real Driver: Government, Defense, and the Kill Switch
- The Economics: NVIDIA's Windfall and the Funding Mirage
- How a Nation Actually Gets Sovereignty
- The Future Outlook: Agents, GPT-EU, and What Matters
- Conclusion: A Decision Framework
The National Model Scorecard
Before the detail, here is the whole field on one page. The table below scores thirteen national and sovereign model programs on five criteria that matter from first principles: how capable the model is, how much real sovereignty and control it delivers, how durable its funding is, how well it fits its national-language and public-sector mission, and whether it is actually deployed. Each criterion carries a weight, scores run 0 to 10, and the final column is the weighted average. The table is sorted by final score, highest first.
The scoring deliberately weights capability at 30%, which pushes the smallest, purest sovereignty projects toward the bottom. That is the central tension of this guide, not a verdict that those projects are mistakes. A model can score last here and still be the right call for its country, because the thing it sacrifices (frontier capability) may matter less than the thing it protects (lawful control over sensitive data).
| # | Program (Region) | What It Is | Capability (30%) | Sovereignty & Control (25%) | Funding & Durability (20%) | Mission Fit (15%) | Deployment (10%) | Final |
|---|---|---|---|---|---|---|---|---|
| 1 | DeepSeek + Qwen (China) | De-facto national open ecosystem | 9 - DeepSeek V4-Pro 90.1% MMLU, AA Index 52 | 9 - MIT/Apache open weights, domestic labs under export controls | 9 - Alibaba/DeepSeek scale, highly durable | 7 - Chinese + multilingual, needs fine-tune elsewhere | 9 - most-downloaded open family, 40%+ of new HF derivatives | 8.7 |
| 2 | Falcon / TII (UAE) | Open Arabic-first model family | 7 - Falcon-H1 34B 75.4% Arabic OALL, strong reasoning | 8 - permissive open weights, state lab, US chips | 9 - TII + G42 ($1.5B Microsoft), gigawatt compute | 8 - Arabic-first leader, 18 languages | 7 - widely downloaded, G42 ecosystem | 7.8 |
| 3 | EXAONE + HyperCLOVA X (South Korea) | Corporate-national Korean models | 8 - top 32B model on Artificial Analysis mid-2025 | 7 - open-ish weights, domestic labs, sovereign program | 8 - $381M program inside 10.1T won 2026 AI budget | 8 - Korean-first leader | 7 - Naver and LG products | 7.65 |
| 4 | Mistral (France) | Commercial European champion | 7 - Large 3 open MoE 675B/41B, credible frontier lab | 7 - Apache weights, French datacenters, US GB300 chips | 9 - EUR 11.7B valuation, EUR 4B infrastructure plan | 7 - multilingual EU plus French defense | 8 - French military, Airbus, enterprise | 7.5 |
| 5 | ALLaM / HUMAIN (Saudi Arabia) | State Arabic model on PIF compute | 6 - ALLaM 34B powers HUMAIN Chat, tops Arabic (self-reported) | 6 - SDAIA lab, 34B closed, US/AMD chips | 9 - HUMAIN PIF $77B infrastructure, $10B AMD JV | 8 - Arabic-first, 8PB corpus | 6 - HUMAIN Chat live since Aug 2025 | 6.9 |
| 6 | Apertus (Switzerland) | Fully open, fully transparent model | 5 - 70B open, "significantly behind frontier" | 10 - weights, data, and code open; opt-out data; Swiss compute | 6 - CHF 20M plus public supercomputer to 2028 | 8 - 1,000+ languages incl. Swiss German, Romansh | 4 - research-first, chatbot secondary | 6.8 |
| 7 | tsuzumi / Swallow / Fugaku-LLM (Japan) | Japanese-optimized domestic models | 5 - Japanese-tuned; Fugaku-LLM MT-Bench 5.5 | 7 - Fugaku supercomputer, domestic, mixed licenses | 8 - JPY 1T FY2026 scheme plus GENIAC | 8 - Japanese-first, single-GPU domestic option | 6 - NTT enterprise, Swallow open | 6.65 |
| 8 | Sarvam / IndiaAI (India) | Government-designated sovereign LLM | 5 - Sarvam-105B AA Index 18, from scratch | 8 - Apache weights, 4,096 H100s, govt-designated | 7 - $1.25B mission, slow disbursement | 8 - 22 official Indian languages | 5 - early, first national builder | 6.6 |
| 9 | EuroLLM / OpenEuroLLM (EU) | Pan-European open model | 5 - EuroLLM-9B 55.7% MMLU, best open EU on translation | 9 - fully open, EuroHPC compute, EU-funded | 6 - EuroHPC grants plus EUR 37.4M project | 8 - all 24 EU languages | 4 - research, wider release Q3 2026 | 6.55 |
| 10 | SEA-LION (Singapore) | Southeast Asian language model | 5 - built on Gemma/Qwen, strong SEA languages | 6 - open, but foreign base and chips | 6 - S$70M national project | 9 - 11+ SEA languages, regional leader | 6 - Sahabat-AI Indonesia deployment | 6.15 |
| 11 | ALIA / Salamandra (Spain) | Iberian-language open model | 3 - ALIA-40B 51.8% XNLI vs Llama 2's 66% | 9 - Apache open, BSC compute, 35 languages | 7 - EUR 240M+ program | 8 - Spanish plus co-official languages | 3 - limited deployment | 6.05 |
| 12 | Teuken / OpenGPT-X (Germany) | All-EU-language research model | 5 - Teuken-7B mid-50s%, HellaSwag 71.9% | 9 - fully open, from scratch, all 24 EU languages | 4 - EUR 14M, project ended, Aleph Alpha into Cohere | 7 - all EU languages | 3 - research artifact | 5.9 |
| 13 | GPT-NL (Netherlands) | Public-sector sovereign Dutch model | 3 - GPT-3.5-era target, beats GPT-3 on Dutch summaries | 9 - from scratch, licensed/opt-in data, GDPR-first, Dutch compute | 4 - EUR 13.5M, needs 10x for a successor | 9 - Dutch-first, public-sector and forensic | 5 - 5-10 government pilots, H2 2026 rollout | 5.8 |
The criteria break down as follows. Capability (30%) anchors on public benchmarks (MMLU, Artificial Analysis Intelligence Index, language-specific leaderboards) and proximity to the frontier. Sovereignty and Control (25%) rewards open weights, transparent and lawfully sourced data, and training or hosting on independent compute. Funding and Durability (20%) measures the money and whether the program will still exist in three years. Mission Fit (15%) asks whether the model serves its national language and public-sector purpose. Deployment (10%) asks whether anyone actually uses it in production. Notice that capability and sovereignty pull in opposite directions: the most capable entry (China's open ecosystem) is sovereign only for China, while the most sovereign entry (Apertus) sits well below the frontier.
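The final column can be reproduced from the weights and the per-criterion scores. The sketch below recomputes three rows of the scorecard. The dictionary keys and the `final_score` helper are illustrative names chosen here, not part of any published scoring tool.

```python
# Criterion weights from the scorecard header, in percent (they sum to 100).
WEIGHTS = {
    "capability": 30,
    "sovereignty": 25,
    "funding": 20,
    "mission_fit": 15,
    "deployment": 10,
}

# Scores (0 to 10) copied from three rows of the scorecard.
ROWS = {
    "DeepSeek + Qwen": {
        "capability": 9, "sovereignty": 9, "funding": 9,
        "mission_fit": 7, "deployment": 9,
    },
    "Apertus": {
        "capability": 5, "sovereignty": 10, "funding": 6,
        "mission_fit": 8, "deployment": 4,
    },
    "GPT-NL": {
        "capability": 3, "sovereignty": 9, "funding": 4,
        "mission_fit": 9, "deployment": 5,
    },
}


def final_score(scores: dict) -> float:
    """Weighted average of the five criterion scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


for program, scores in ROWS.items():
    print(f"{program}: {final_score(scores)}")
```

The arithmetic shows the tension directly: Apertus's perfect sovereignty score contributes 2.5 points to its total, while its capability score of 5 contributes only 1.5.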
1. What Sovereign AI Actually Means
Strip away the marketing and sovereign AI is a simple claim: a nation, or a bloc, should own and control the artificial intelligence that runs inside its borders, rather than rent it from a foreign company that answers to a foreign government. Ownership here is not one thing. It splits into where the data and compute physically sit (territorial sovereignty), who can switch the system off (operational sovereignty), who holds the intellectual property and model weights (technological sovereignty), and whose laws govern the whole arrangement (legal sovereignty), a framing laid out cleanly in industry explainers - Red Hat. A country can have some of these and lack others, which is why "sovereign" gets stretched to cover everything from a fully homegrown model to an American cloud with a European board.
The phrase was popularized by one person with one very large commercial interest. At the World Governments Summit in Dubai in February 2024, NVIDIA chief executive Jensen Huang told assembled heads of state that a country's data "codifies your culture, your society's intelligence, your common sense, your history," and that every nation must "own the production of their own intelligence" - NVIDIA. It was a genuinely resonant argument, and it was also a sales pitch, because the intelligence in question is produced on NVIDIA chips. Both things are true at once, and holding them together is the only way to think clearly about this market.
The case for sovereignty is strongest where the cost of dependence is highest. A government cannot put classified case files, tax records, or military planning into a system that a foreign state could subpoena or shut off. A culture cannot let its language and history be represented only through a model trained mostly on English internet text. And a regulator cannot enforce its own rules on a system it does not control. These are not hypothetical concerns in 2026, as the defense and "kill switch" cases in section 8 make concrete. The structural problem is that controlling the top of the stack (the application, the data, the fine-tuning) is achievable for almost any country, while controlling the bottom (the chips, the fabrication, the hyperscale compute) is achievable for almost none.
The scarcity is starker than most strategy documents admit. McKinsey estimates that only about 30 countries currently host in-country compute capable of supporting advanced AI workloads - McKinsey. For everyone else, "sovereign AI" begins with a dependency: you must buy or rent the compute from someone who has it, and the someone is almost always American. This is why the four dimensions of sovereignty rarely arrive together. A country can achieve legal and data sovereignty (its laws govern, its data stays home) while having no technological sovereignty (it owns no chips and trained no model), and the gap between those two is where most national programs actually live. GPT-NL has data and legal sovereignty in abundance and almost no hardware sovereignty. That combination is not a contradiction; it is the realistic shape of the thing.
The diagram makes the practical lesson visible. A nation that defines sovereignty as "we control everything from the sand up" will fail, because only the United States and China operate anything close to a full-stack ecosystem covering chips, fabrication, clouds, and frontier models - Council on Foreign Relations. A nation that defines sovereignty as "we control the model, the data, and the deployment, even if the chips are imported" can actually get there. This is why the sharpest critics call much of the sovereign-AI wave a relabeling exercise: "Sovereignty is the wrapper. Dependence is the contents" - ICTworks. That critique is too cynical to be the whole truth, but too accurate to ignore, and the rest of this guide is largely an attempt to find the line between the wrapper and the contents. Our broader treatment of the chip and cloud layers lives in the AI Sovereignty practical guide.
2. GPT-NL: The Dutch Bet, Up Close
GPT-NL is worth studying in detail because it is the most transparent national model program in the world, and because it refuses to pretend to be something it is not. It is built by a non-profit consortium of three Dutch public institutions: TNO, the national applied-research organization that leads development; SURF, the academic ICT cooperative that runs the supercomputer and will host the model for education and research; and the NFI, the Netherlands Forensic Institute, which brings the demanding use cases of criminal forensics - TNO. The whole thing was funded with exactly EUR 13.5 million, awarded in 2023 by the Netherlands Enterprise Agency on behalf of the Ministry of Economic Affairs and Climate Policy - Computable.
The defining choice of GPT-NL is its data. Where frontier labs scrape the open web and sort out the copyright lawsuits later, GPT-NL trains only on text it has the legal right to use, refusing scraped copyrighted material entirely. The team struck a first-of-its-kind collective licensing deal through the Dutch news-publishers association, bringing in the archives of more than thirty national and regional titles plus the ANP news agency, adding over 20 billion tokens of high-quality Dutch text and roughly doubling the project's premium training data, with publishers compensated through revenue sharing - Silicon Canals. The project leads claim it is the first language model worldwide to comply fully with the GDPR, and it won a Dutch Privacy Award for the effort - MT/Sprout.
The scale of that data discipline is documented in unusual detail. The project's public training corpus, described in an April 2026 research paper, totals roughly 523 billion tokens, of which only about 36 billion are unique Dutch tokens, the rest being curated English, code, and neighboring-language text, all redistributed under permissive licenses that explicitly exclude share-alike sources like Wikipedia and anything with unclear web-crawl rights - arXiv. That is a sliver of the trillions of tokens a frontier model ingests, and it is the data gap of section 3 made concrete. It also explains why Dutch hobbyists took a different path: community models like GEITje and Fietje simply continued pre-training an existing open model on Dutch text, reaching usable fluency at a tiny fraction of GPT-NL's cost, precisely because they did not insist on clean-room data provenance - GEITje.
That principled stance has a direct, measurable cost in capability, and the team is refreshingly candid about it. GPT-NL is a deliberately small model (independent analyses put it around 26 billion parameters, though the consortium has not published an official figure), designed to summarize, simplify, and answer questions over documents rather than generate images or code - Upstream. The original feasibility framing set the target at roughly GPT-3.5 level, and the project's own progress reporting in February 2026 stated that on Dutch summarization the model already "outperforms older models such as GPT-3" on benchmarks like EuroEval - GPT-NL. Read those two facts together and the picture is honest: GPT-NL is aiming at, and roughly reaching, the capability tier of a model from late 2022, not the 2026 frontier.
The compute story tells you why. GPT-NL was trained on 88 NVIDIA H100 GPUs, a dedicated 22-node allocation on SURF's national supercomputer Snellius in Amsterdam, and the project's own materials draw the contrast explicitly: Meta used on the order of 16,000 H100s to train its open Llama models - GPT-NL. That is a roughly 180-to-1 gap in raw silicon between a national program and a single open-weight release from one American company. Training did not even start on schedule. It began in the summer of 2025, about a year later than first planned, with pre-training completed and announced in February 2026 - MT/Sprout.
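The ratio is easy to check from the figures quoted above. This is a minimal sketch that compares chip counts only; it says nothing about how long each cluster ran or how efficiently it was used, and the Meta figure is an order-of-magnitude number rather than an exact count.

```python
# Figures quoted in the paragraph above.
GPT_NL_NODES = 22
GPT_NL_GPUS = 88
META_LLAMA_GPUS = 16_000  # "on the order of", so treat as approximate

# GPUs per node implied by the allocation.
gpus_per_node = GPT_NL_GPUS // GPT_NL_NODES

# Raw chip-count ratio between the two training setups.
silicon_ratio = META_LLAMA_GPUS / GPT_NL_GPUS

print(f"GPUs per node in the GPT-NL allocation: {gpus_per_node}")
print(f"Meta-to-GPT-NL GPU ratio: about {silicon_ratio:.0f} to 1")
```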
Where GPT-NL is genuinely succeeding is in the only arena that justified building it: lawful, sovereign, public-sector deployment. Since late February 2026, five organizations have been running on-premise feasibility pilots of the beta model, with the number targeted to reach ten by spring, and broader rollout planned for the second half of 2026 - Computer Weekly. The pilot users are exactly who you would expect: the Ministry of the Interior is financing three use cases, alongside TNO, the NFI, and parties from the security and financial sectors, each running a closed three-to-six-month study inside their own walls - Binnenlands Bestuur.
The concrete use cases reveal what a GPT-3.5-level sovereign model is actually for. GPT-NL is being trialed to draft plain-language government letters and to improve a municipal chatbot already used by around thirty municipalities that handled tens of thousands of citizen conversations in 2024 - NL Times. None of these tasks needs frontier reasoning; they need a reliable, lawful, Dutch-fluent model that runs inside a government boundary and never sends a citizen's data to a foreign cloud. The operating principle the project describes is "Copilot, unless": use the cheap, capable foreign assistant for ordinary work, and switch to the sovereign model only where the data is too sensitive to leave. That principle, not frontier parity, is the actual product, and it reframes the entire budget debate. The model's R&D lead is blunt about the limits: GPT-NL needs more explicit prompting than larger models, has weaker general knowledge outside its domain, and is "not a replacement for everything that already exists." To build a competitive successor, TNO says it would need at least ten times the budget, and is exploring whether the next step should be a multilingual European model rather than a Dutch one - MT/Sprout.
The honest verdict on GPT-NL is not "it failed." It is "it succeeded at a deliberately modest goal, and the gap between that goal and public expectations is the real story." A country that wanted a sovereign, GDPR-clean, Dutch-language assistant for government letters and forensic document work got exactly that. A public that heard "the Dutch ChatGPT" expected something it was never going to be. That gap, between what these projects can do and what the phrase "national AI model" implies, repeats in every country in this guide.
3. Why Sovereign Models Stall Near GPT-3.5
If one national model landed at GPT-3.5 level, you could call it a budget problem. When nearly all of them cluster in the same place, you are looking at structure. Sovereign models underperform for three compounding reasons, each an order-of-magnitude gap, and understanding them from first principles is the difference between a useful national strategy and an expensive vanity project. The three gaps are compute, data, and capital, and they multiply rather than add.
Start with compute, because it is the hardest wall. The amortized cost of a single frontier training run has been rising about 2.4 times per year and is on track to exceed $1 billion by 2027, with hardware alone accounting for roughly half to two-thirds of development cost - Epoch AI. Frontier clusters are now measured in hundreds of thousands of chips: xAI's Colossus passed 200,000 GPUs and is targeting a million - Data Center Dynamics. Against that, even Europe's flagship public machine, the exascale JUPITER supercomputer, has about 24,000 superchips and cost EUR 500 million, and it is shared across all of European science rather than dedicated to one model - EuroHPC.
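To get a feel for what 2.4 times per year means, the sketch below compounds that rate from an assumed starting point. The $100 million base is an assumption borrowed from the GPT-4-class figure cited later in this guide, not Epoch AI's own baseline or method, so the output is illustrative rather than a forecast.

```python
import math

GROWTH_PER_YEAR = 2.4        # growth rate cited above
ASSUMED_BASE_COST = 100e6    # assumption: a roughly $100M GPT-4-class run
TARGET_COST = 1e9            # the $1 billion threshold


def years_to_reach(base: float, target: float, growth: float) -> float:
    """Years of compound growth needed to go from base to target."""
    return math.log(target / base) / math.log(growth)


def projected_cost(base: float, growth: float, years: float) -> float:
    """Cost after compounding the growth rate for the given number of years."""
    return base * growth ** years


years = years_to_reach(ASSUMED_BASE_COST, TARGET_COST, GROWTH_PER_YEAR)
print(f"Years from $100M to $1B at 2.4x per year: {years:.1f}")

for n in range(4):
    cost = projected_cost(ASSUMED_BASE_COST, GROWTH_PER_YEAR, n)
    print(f"Year {n}: ${cost / 1e6:,.0f}M")
```

At that rate a tenfold cost increase takes well under three years, which is shorter than the funding cycle of most national programs described in this guide.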
The second gap is data, and it is subtler because it interacts with sovereignty directly. Compute-optimal training, the so-called Chinchilla rule, calls for roughly 20 tokens of text per model parameter, which means a strong model needs trillions of high-quality tokens - Hoffmann et al. High-quality text in any language other than English is comparatively scarce, and the moment a national project commits to using only licensed or opt-in data, as GPT-NL and Apertus both do, it shrinks its own dataset further. Sovereignty and capability are in direct tension here: the cleaner your data provenance, the smaller your corpus, and the smaller your corpus, the weaker your model on everything outside its core domain.
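The 20-tokens-per-parameter rule of thumb can be applied to the models discussed in this guide. The sketch below is a back-of-the-envelope calculation, not the full scaling-law fit from the Chinchilla paper. The 26-billion-parameter figure for GPT-NL is the unofficial estimate mentioned in section 2, and the helper name is illustrative.

```python
TOKENS_PER_PARAM = 20  # rule-of-thumb ratio cited above


def compute_optimal_tokens(params: int) -> int:
    """Training tokens suggested by the 20-tokens-per-parameter rule."""
    return TOKENS_PER_PARAM * params


# Parameter counts quoted in this guide. GPT-NL's is an unofficial estimate.
MODELS = {
    "GPT-NL (about 26B, unofficial)": 26_000_000_000,
    "Apertus 70B": 70_000_000_000,
}

for name, params in MODELS.items():
    tokens = compute_optimal_tokens(params)
    print(f"{name}: about {tokens / 1e9:,.0f}B tokens")

# GPT-NL's reported corpus size, for comparison with the rule.
GPT_NL_CORPUS_TOKENS = 523_000_000_000
gpt_nl_ratio = GPT_NL_CORPUS_TOKENS / 26_000_000_000
print(f"GPT-NL corpus tokens per parameter: {gpt_nl_ratio:.1f}")
```

If the parameter estimate is right, GPT-NL's 523-billion-token corpus sits almost exactly on the 20-to-1 line, which suggests the licensed corpus, not the training recipe, is what caps the model's size. Apertus, by contrast, reports 15 trillion training tokens, far more than the 1.4 trillion the rule gives for 70 billion parameters.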
The third gap is capital and talent, and it dwarfs the other two. A national LLM program is funded somewhere between Germany's EUR 14 million OpenGPT-X and Spain's EUR 240 million ALIA umbrella - Fraunhofer IAIS. In the same window, Anthropic raised roughly $65 billion at a near-trillion-dollar valuation, and OpenAI's valuation sat in the hundreds of billions - Axios. Set that single raise against the two program budgets and the gap runs from a few hundred-fold (ALIA) to several thousand-fold (OpenGPT-X), and the same money also out-bids every government for the few hundred researchers who can lead a frontier training run. A useful way to feel the scale: a single Anthropic round is around twenty times the entire annual non-defense federal AI research budget of the United States - Brookings.
There is a fourth gap that is political rather than technical: access to the chips at all. US export controls decide which nations can even buy frontier hardware, and the rules have whipsawed, from a tiered framework capping many countries' GPU imports to a 2025-2026 reversal allowing conditional high-end sales to China with a revenue fee - CFR. For a country outside the inner circle of US allies, the supply of accelerators is a policy variable set in Washington, which means even a nation willing to spend frontier money may not be permitted to buy frontier silicon. Compute sovereignty, in other words, is not only a question of budget; it is a question of permission, and permission can be revoked.
The result is visible in the benchmarks, and it is remarkably consistent. The clearest reference point is MMLU, a 57-subject knowledge test where human experts score about 90%. GPT-3.5, released on 30 November 2022, scored 70.0%, and GPT-4 a few months later scored 86.4% - GPT-4 Technical Report. Europe's strongest fully-open sovereign model, EuroLLM-9B, scores about 55.7% on the EU-language average, more than ten points below the comparable open reference model Gemma-2-9B - EuroLLM. In other words, the typical sovereign model sits below even GPT-3.5, while the 2026 frontier has saturated the test entirely.
The saturation itself is worth dwelling on, because it changes what "behind" means. By 2026, every frontier model exceeds roughly 88% on MMLU, and the benchmark is so saturated that labs now differentiate on harder tests like graduate-level reasoning and real software engineering instead - Kili Technology. A national model scoring in the mid-50s is therefore not "a little behind"; it is on the far side of a capability cliff the frontier crossed years ago, competing on a metric the leaders have abandoned. This is the uncomfortable arithmetic behind the phrase "roughly GPT-3.5 level": it does not mean "a 2026 model that is slightly weaker," it means "a model at the capability of late 2022, when ChatGPT first launched," which in a field moving this fast is an eternity. The honest framing is that sovereign-by-construction models trade three to four years of capability for control, and whether that trade is worth it depends entirely on the workload.
The cost side of the same picture explains why this is so hard to escape. The money a nation spends on the model itself is a rounding error next to a frontier run. GPT-NL and Germany's Teuken cost around $15 million each; even DeepSeek's headline final-run figure was about $5.6 million; a single GPT-4-class run is reportedly north of $100 million, and Google's Gemini Ultra was estimated at $191 million - Stanford AI Index summary. The numbers below put the gap in one frame. The lesson is not that nations are underfunding their models. It is that "fund the model" was never the expensive part, and treating frontier parity as the goal guarantees disappointment. Our explainer on how LLMs actually work walks through why scale, not cleverness, drives most of the capability difference.
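The figures quoted above can be lined up in a single comparison. The sketch below uses only the approximate dollar amounts from this section. The GPT-4-class entry is a reported lower bound, so its multiple is a floor, and the DeepSeek figure covers the headline final run only.

```python
# Approximate costs in millions of US dollars, as quoted above.
COSTS_MUSD = {
    "DeepSeek (headline final run)": 5.6,
    "GPT-NL": 15.0,
    "Teuken": 15.0,
    "GPT-4-class run (lower bound)": 100.0,
    "Gemini Ultra (estimate)": 191.0,
}


def multiple_of(name: str, baseline: str = "GPT-NL") -> float:
    """How many times the baseline's cost the named entry represents."""
    return COSTS_MUSD[name] / COSTS_MUSD[baseline]


# Print the entries from cheapest to most expensive.
for name, cost in sorted(COSTS_MUSD.items(), key=lambda item: item[1]):
    print(f"{name:<32} ${cost:>6.1f}M  {multiple_of(name):>5.1f}x GPT-NL")
```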
4. Europe Beyond GPT-NL: Apertus, EuroLLM, and Mistral
Europe is the most crowded sovereign-model arena, and it splits cleanly into three tiers that every other region echoes in miniature. The first tier is academic and state-funded, fully open, and trained on public supercomputers. The flagship here is Apertus, released on 2 September 2025 by EPFL, ETH Zurich, and the Swiss National Supercomputing Centre. It comes in 8 billion and 70 billion parameter sizes, was trained on 15 trillion tokens across more than a thousand languages with 40% non-English data, and is open in the fullest sense: weights, training data, and methods are all public - ETH Zurich. It is also, by its own framing, "significantly behind the capabilities of frontier models," which is exactly what the three-gap analysis predicts for a fully compliant, publicly-funded build.
The same tier includes the pan-European effort. EuroLLM-22B, released in December 2025, supports all 24 official EU languages and was trained from scratch on the MareNostrum 5 supercomputer using just 400 H100 GPUs under a EuroHPC grant; it is billed as the best fully-open European-made model and matches or beats larger open models on translation - EuroLLM team. Germany's Teuken-7B, the output of the EUR 14 million OpenGPT-X project, trained from scratch in all 24 EU languages and lands in the mid-50s percent capability tier - Teuken report. Spain's ALIA-40B is the cautionary tale of the group: despite a program funded at over EUR 240 million, independent testing found it scoring 51.8% on the XNLI cross-lingual inference benchmark against Llama 2's 66%, and worse than random guessing on some reasoning tasks - ScienceDirect. Money alone, it turns out, does not buy capability if the compute and data gaps remain.
The shared backbone under all of these is EuroHPC, the EU's joint supercomputing program, and its scale clarifies why European models stay small. The flagship JUPITER machine reached exascale in 2025 with about 24,000 superchips, joined by LUMI in Finland (over 10,000 AMD GPUs), Leonardo in Italy (around 14,000 NVIDIA A100s), and MareNostrum 5 in Spain - EuroHPC. These are genuinely powerful machines, but they are scientific facilities shared across climate modeling, genomics, and physics, not dedicated single-model training clusters, and EuroLLM-22B trained on just a 400-GPU slice of one of them. The structural point is blunt: a continent's flagship public compute is roughly a tenth the size of a single American lab's private cluster, which is exactly why the European answer is shifting from each nation building alone toward pooling resources across the bloc.
These models share a quiet strength that the benchmarks miss: they preserve languages and dialects the frontier labs have no commercial reason to serve well. Apertus handles Swiss German and Romansh, EuroLLM covers all 24 EU languages, Greece's Meltemi improves on its base model by nearly 15% on Greek tasks - ILSP, and Poland's Bielik tops the sub-20B Polish leaderboard - Bielik report. These are real public goods. The mistake is to score them as failed ChatGPT competitors rather than as successful linguistic and public-sector infrastructure. They are doing a different job, and on that job they are winning.
The second European tier is the commercial champion: Mistral, France's frontier-adjacent lab. It raised EUR 1.7 billion in September 2025 at a EUR 11.7 billion valuation, with the chip-equipment giant ASML taking an 11% stake, and it is building French datacenters as part of a EUR 4 billion infrastructure plan - CNBC. Mistral is the rare European model with genuine market traction, a French military framework agreement, and an Airbus partnership for sovereign aerospace AI. But its open-weight Mistral Large 3 still scores well below the best Chinese open models on composite benchmarks, and its datacenters run on NVIDIA chips, so it embodies the wrapper-versus-contents tension rather than resolving it. The third tier is consolidation: Germany's Aleph Alpha abandoned foundation models for enterprise software and is merging into Canada's Cohere, while Finland's Silo AI was absorbed by AMD - Sifted. For the full macro picture of European AI strategy and infrastructure, our Europe AI Awakening guide and the EU AI infrastructure analysis go far deeper than there is room for here.
5. The Gulf: Buying Sovereignty by the Gigawatt
The Gulf states approach sovereign AI with one advantage no European ministry has: sovereign wealth measured in the trillions, and a willingness to spend it on compute at a scale that embarrasses everyone else. The result is a distinctive model. Where Europe builds small models on shared public supercomputers, the Gulf builds gigawatt-scale clusters in partnership with American chipmakers and clouds, then trains capable Arabic-first models on top. The strategy treats compute as something you buy, not something you ration.
The UAE has the most mature program. The Technology Innovation Institute (TII) in Abu Dhabi has shipped Falcon models since the 180-billion-parameter Falcon-180B in 2023, and its current line is the hybrid Mamba-Transformer Falcon-H1 family, extended in January 2026 with Falcon-H1 Arabic models in 3B, 7B, and 34B sizes - TII. On the Open Arabic LLM Leaderboard the 34B model averages 75.36%, and TII's separate Falcon-H1R-7B reasoning model claims to out-reason systems up to seven times its size - HPCwire. Alongside TII, G42's Inception lab runs the Jais Arabic line, now at the 70-billion-parameter Jais 2 - MBZUAI.
The compute behind this is the real story. Stargate UAE, announced in May 2025, is a 1-gigawatt cluster in Abu Dhabi operated by OpenAI and Oracle inside a planned 5-gigawatt US-UAE campus, with G42 committing up to $20 billion and the first 200 megawatts due online in 2026 - G42. Saudi Arabia is moving on a similar axis through HUMAIN, a Public Investment Fund company chaired by the Crown Prince, which signed a partnership with NVIDIA for up to 500 megawatts powered by several hundred thousand GPUs, plus a separate $10 billion collaboration with AMD - NVIDIA. On the model side, Saudi Arabia's SDAIA built the ALLaM family, whose 34B variant powers the consumer HUMAIN Chat app launched in August 2025 - Middle East AI News. Qatar rounds out the region with Fanar, upgraded to a 27-billion-parameter multimodal model in December 2025 - Middle East AI News.
The geopolitics underneath these deals is as important as the silicon. Microsoft invested $1.5 billion in G42 in 2024, with a board seat and an intergovernmental assurance agreement between the UAE and US governments, a structure built to keep advanced chips flowing while satisfying American security concerns - Microsoft. The dependency runs in surprising directions: Elon Musk's xAI is reported to be an early customer of NVIDIA-backed Saudi data centers, meaning an American frontier lab will train on Gulf-financed, American-chip-powered, Saudi-hosted infrastructure - CNBC. This is sovereignty as a web of mutual dependence rather than independence, and it is deliberate: the Gulf is using its capital to make itself indispensable to the American AI supply chain, which is a different and arguably shrewder goal than building a self-sufficient national stack.
The Gulf model is the clearest test of whether money can buy sovereignty, and the answer is "partly." The clusters are real, the Arabic models are genuinely the best in their language, and the sovereign-wealth funding is durable in a way no grant program can match. But the chips are American, the clouds are operated by OpenAI and Oracle, and the entire arrangement rests on US export approvals that can change with an administration. The Gulf has bought operational scale and linguistic leadership; it has not bought independence from the American stack, and it has arguably deepened that dependence by building its national infrastructure directly on it.
6. Asia's Sovereign Wave: India, Japan, Korea, Singapore
Asia is running the largest and most varied sovereign-AI experiment on earth, and the striking pattern is that the biggest programs fund compute and grants rather than a single national model. The governments build the runway; private labs and research institutes build the planes. This is a structurally smarter division of labor than the European "one ministry, one model" approach, and it is producing a wider range of outcomes.
India's IndiaAI Mission is the template. Approved in March 2024 with an outlay of Rs 10,371.92 crore (about $1.25 billion) over five years, it pools tens of thousands of subsidized GPUs and offers them to startups, with the largest single allocation going to compute rather than models - PIB. Within that, the government designated Sarvam AI as the first builder of a sovereign foundation model, granting it access to around 4,096 H100 GPUs - Inc42. Sarvam delivered, releasing the open-weight Sarvam-30B and Sarvam-105B mixture-of-experts models in March 2026 - Sarvam. On the Artificial Analysis Intelligence Index those score 12 and 18 respectively, which is real progress from scratch but still a long way below the frontier, and a reminder that designation and compute do not instantly close the capability gap. India's disbursement has also lagged badly, with only a fraction of the five-year outlay released in the first two years - MediaNama.
India's program also shows how quickly sovereign-AI ambition collides with reality. The mission widened beyond Sarvam to designate additional builders, including BharatGen, a consortium of Indian institutes that secured the largest foundation-model allocation to work toward a trillion-parameter model - The Tribune. At the same time, the country's first GenAI unicorn, Krutrim, pivoted away from foundation models and chip design toward AI cloud services in 2026, conceding that competing at the model layer was harder than the funding implied - TechCrunch. The pattern of "designate a national champion, then watch it confront the three gaps" is one every country in this section is living through in real time.
Japan, Korea, and Singapore each run a different play. Japan approved its first National AI Basic Plan in December 2025, including a JPY 1 trillion (about $6.3 billion) five-year sovereign-AI scheme, layered on the GENIAC compute-subsidy program; its models, from Fugaku-LLM to NTT's single-GPU tsuzumi 2 to the Tokyo-Tech Swallow series, are Japanese-optimized rather than global-frontier - NTT. South Korea ran a "Sovereign AI Foundation Model" contest that narrowed five consortia toward two finalists, backed by roughly $381 million inside a record 2026 AI budget; LG's EXAONE and Naver's HyperCLOVA X are the standouts, with EXAONE briefly the highest-scoring 32B model on public leaderboards in mid-2025 - KED Global. Singapore's SEA-LION, from AI Singapore, takes the most pragmatic route of all, fine-tuning open Gemma and Qwen bases to cover 11-plus Southeast Asian languages - SEA-LION.
The corporate dimension is what gives Japan and Korea more room than most. Japan's push leans on deep private labs, from NTT's enterprise-focused tsuzumi to the well-funded Sakana AI, which reached a $2.65 billion valuation in late 2025 as the country's most valuable startup - TechCrunch. Korea's contest, meanwhile, is a brutal demonstration of how thin the field really is: of five consortia selected, two (including Naver Cloud) were eliminated in the first evaluation round in January 2026, concentrating funding on the survivors - Dataconomy. Even in countries with serious industrial AI capacity, the number of teams that can credibly train a competitive model is small enough to count on one hand, which is itself a form of the talent gap from section 3.
The Asian wave teaches the clearest lesson in this guide. The countries getting the most value are not the ones trying hardest to train a frontier model from scratch; they are the ones funding compute infrastructure and letting capable local labs choose whether to build or to fine-tune. Singapore building on Gemma and Qwen, and Indonesia's Sahabat-AI building on SEA-LION, will likely deliver more usable national AI per dollar than any from-scratch sovereign run, precisely because they spend their scarce resources on the layers they can actually control. Korea and Japan, with deep corporate AI labs already, can afford a more ambitious build; smaller nations almost always cannot, and the smartest of them have stopped pretending otherwise.
7. China and the Open-Weight Shortcut
China reframes the entire sovereign-AI debate, and ignoring it produces a badly distorted picture. China's labs are state-aligned but commercially driven, and they have done something no Western sovereign program has managed: ship open-weight models that sit within a handful of points of the best closed American systems, cheaply, and under export controls designed to stop exactly this. DeepSeek-V3 reported a final training-run cost of about $5.6 million, and its V4 generation, released in April 2026 with a 1.6-trillion-parameter flagship, scores roughly 90% on MMLU - DeepSeek. Alibaba's Qwen has become the most-downloaded open model family in the world, with Chinese open models collectively overtaking US ones in global download share for the first time in 2025 - MIT Technology Review.
The breadth of the Chinese open ecosystem is what makes it so consequential for other nations. Beyond DeepSeek and Qwen, Moonshot's Kimi K2 reached roughly a trillion parameters and topped some agentic benchmarks against the best closed models, while Zhipu's GLM and MiniMax added further strong open releases, all under permissive licenses - VentureBeat. By mid-2025, Qwen derivatives alone made up more than 40% of new open language-model variants on Hugging Face. The irony for policymakers is sharp: US export controls meant to slow Chinese AI instead pushed Chinese labs toward efficiency and open release, and the open weights they produced became the default substrate other nations now use to build their own "sovereign" systems. A control regime designed to contain Chinese capability ended up distributing it globally for free.
The strategic consequence is that "sovereignty" for most nations is being quietly redefined. The old definition was "train a frontier model from scratch." The new, achievable one is "download a strong open-weight model, fine-tune it on your national language and data, and host it domestically so you depend on no foreign company's interface." A government that runs Qwen or DeepSeek or Llama on its own servers has operational and data sovereignty over that deployment, even though it did not train the base model. Bulgaria's INSAIT did this for Balkan languages; Singapore and Indonesia did it for Southeast Asian ones - Tony Blair Institute. This is far cheaper, far faster, and produces a far more capable result than any sub-100-million-dollar from-scratch run.
This is the awkward truth the from-scratch sovereign programs rarely state plainly: in pure capability-per-dollar terms, a nation is almost always better off fine-tuning a top open-weight model than training its own. The open-weight ecosystem has become the realistic substrate of sovereign AI, and most of the strongest links in it are now Chinese. Our DeepSeek V4 deep dive covers the technical and geopolitical dimensions of that shift, and the May 2026 model benchmarks place the open and closed leaders side by side.
There is a real catch, and it is the reason from-scratch programs still exist. A nation that builds its sovereignty on a Chinese open model has swapped one foreign dependency for another, and an open weight can carry values, refusals, and blind spots baked in at training time that fine-tuning cannot fully remove. For a municipal chatbot this is a non-issue; for a defense or intelligence system it may be disqualifying. That is precisely why a country like the Netherlands, with the NFI's forensic work in mind, chose the slow, expensive, fully-controlled path of GPT-NL despite the capability penalty. The open-weight shortcut is the right answer for most workloads and the wrong answer for the most sensitive ones, and a serious national strategy needs both.
8. The Real Driver: Government, Defense, and the Kill Switch
Strip away the economic competitiveness rhetoric and the hardest driver of sovereign AI is legal and military, not commercial. The structural root cause on the non-US side is a single, well-documented problem: the US CLOUD Act compels American-controlled providers to hand data to US authorities regardless of where it is stored, and FISA Section 702 permits warrantless surveillance of non-US persons. This collides directly with European law, and it is not theoretical. On 10 June 2025, Microsoft France's own legal counsel testified under oath to the French Senate that he could not guarantee French citizens' data would never be transferred to US authorities - The Register.
The legal conflict has no clean resolution, which is the entire point. The CLOUD Act reaches any provider under US jurisdiction, while the EU's Data Act, applicable since September 2025, requires providers to prevent unlawful third-country access to European data, so an American hyperscaler operating in Europe faces two legal regimes that contradict each other - Cybervize. No contract, data-residency promise, or "European cloud" branding fully escapes this, because the obligation follows the parent company's nationality, not the server's location. That is the precise gap a sovereign model fills: not because it is more capable, but because it is the one deployment a foreign government has no legal hook into. For the most sensitive workloads, that property is the entire value proposition, and it is why GPT-NL's modest capability is beside the point for the institutions piloting it.
The abstract risk became a concrete "kill switch" the same year. After the US sanctioned the International Criminal Court's chief prosecutor, he lost access to his Microsoft-hosted email, and by October 2025 the ICC was moving off Microsoft 365 to a German open-source stack - The Register. For every European official watching, the lesson was unmistakable: a foreign-controlled system is a system that can be switched off by a foreign government during a dispute. No amount of contractual assurance survives that demonstration, and it is the single most powerful argument for sovereign infrastructure that exists.
The US itself proves the point by acting exactly the same way in reverse. It does not put classified work on foreign systems; it builds dedicated sovereign stacks at home. In 2025 the Pentagon's digital and AI office awarded contracts with a $200 million ceiling each to Anthropic, Google, OpenAI, and xAI for agentic AI, alongside OpenAI for Government and Anthropic's classified-cleared Claude Gov models - Breaking Defense. And the US system showed its own sovereignty politics when, in February 2026, the administration directed agencies to stop using Anthropic over the company's refusal to permit mass surveillance and autonomous lethal weapons, prompting Anthropic to sue - Euronews. Sovereignty, control, and dependency are live political questions even for the country that owns the whole stack.
The American sovereign stack is more built-out than anything in Europe, which is itself instructive. Anthropic, Palantir, and AWS made Claude available inside an accredited Impact Level 6 environment supporting data up to the SECRET classification as far back as 2024, and OpenAI later secured federal authorizations and self-hosting paths reaching the highest controlled tiers - Businesswire. The United States solved its own sovereignty problem not by forcing the government to train a national model, but by requiring frontier labs to deploy inside government-controlled, classification-cleared enclaves. That is a path others could copy more cheaply than building from scratch: insist that any model touching sensitive data run inside a sovereign-controlled boundary, regardless of who trained it. Sovereignty of deployment, it turns out, is often more attainable than sovereignty of creation.
Europe's defense response is to build sovereign-by-construction systems. France notified Mistral of a Ministry of the Armed Forces framework agreement in December 2025, giving the military access to foundation models hosted entirely on French infrastructure - Army Recognition. The defense-AI startup Helsing, with thousands of strike drones deployed in Ukraine, raised at a EUR 12 billion valuation and then higher - Helsing. And GPT-NL, with its forensic NFI use cases and "use the foreign copilot, unless the data is too sensitive" posture, is the civilian-government version of the same logic. This is the part of sovereign AI that is not a vanity project. When the workload is classified case files or military planning, a GPT-3.5-level model you fully control is worth more than a frontier model you do not, and that calculus is what keeps the expensive from-scratch programs alive.
9. The Economics: NVIDIA's Windfall and the Funding Mirage
Follow the money and the structure of sovereign AI becomes obvious: governments are the buyers, and the seller's market belongs to one company. NVIDIA broke out "sovereign AI" as a distinct revenue line only recently, and it has exploded. The company's sovereign-AI revenue more than tripled year over year to over $30 billion in its fiscal 2026 (ended January 2026), up from roughly $10 billion the prior year, driven by government-backed customers in countries like Canada, France, the Netherlands, Singapore, and the UK - Dealroom. McKinsey projects the broader sovereign-AI market could reach $500-600 billion by 2030 - McKinsey.
The national investment numbers sound enormous until you compare them to a single US buildout. The EU's InvestAI initiative aims to mobilize EUR 200 billion, including a EUR 20 billion public fund for AI gigafactories - European Commission. France touted EUR 109 billion at its 2025 summit, much of it private and including EUR 50 billion from the UAE - CNBC. Saudi Arabia's HUMAIN is pursuing a $77 billion infrastructure plan - CNBC. And yet a single US project, Stargate, plans $500 billion over four years - OpenAI. The chart below puts these on one axis, and the takeaway is stark: the largest national commitment on earth is a fraction of one American buildout.
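The comparison above is simple arithmetic, and a short sketch makes it easy to rerun as figures change. The example below takes only the headline numbers quoted in this section and expresses each as a share of Stargate's $500 billion. The EUR-to-USD rate is an assumption (a placeholder of 1.10, not a sourced figure), and the commitments mix public money, private money, and different time horizons, so treat the output as an order-of-magnitude check rather than a like-for-like ranking.

```python
# Headline commitments quoted in this section, in billions.
# Currency codes follow the figures as cited; nothing here is new data.
COMMITMENTS = {
    "EU InvestAI": (200.0, "EUR"),
    "France (2025 summit)": (109.0, "EUR"),
    "Saudi HUMAIN": (77.0, "USD"),
}
STARGATE_USD_BN = 500.0

# ASSUMPTION: illustrative exchange rate, not taken from the article.
ASSUMED_EUR_TO_USD = 1.10


def to_usd_bn(amount_bn: float, currency: str,
              eur_to_usd: float = ASSUMED_EUR_TO_USD) -> float:
    """Convert a figure in billions to USD billions using the assumed rate."""
    if currency == "USD":
        return amount_bn
    if currency == "EUR":
        return amount_bn * eur_to_usd
    raise ValueError(f"unsupported currency: {currency}")


def share_of_stargate(eur_to_usd: float = ASSUMED_EUR_TO_USD) -> dict[str, float]:
    """Return each commitment as a fraction of the Stargate figure."""
    return {
        name: to_usd_bn(amount, currency, eur_to_usd) / STARGATE_USD_BN
        for name, (amount, currency) in COMMITMENTS.items()
    }


if __name__ == "__main__":
    for name, share in share_of_stargate().items():
        print(f"{name}: {share:.0%} of Stargate")
```

Passing a different `eur_to_usd` value changes the euro-denominated shares only; the ordering of the three commitments relative to Stargate does not depend on any plausible rate.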
There is a deflationary force cutting the other way, and it complicates the whole picture. The cost of training a capable model is collapsing even as the cost of frontier infrastructure explodes. DeepSeek's reported $5.6 million final run, whatever its true all-in cost, proved that strong capability no longer requires a nine-figure budget, and that fact undercuts the rationale for many bespoke national training projects - DeepSeek. The two curves point in opposite directions: building the model gets cheaper every year, while building the compute to run frontier workloads at national scale gets more expensive. A government reading only the second curve concludes it must spend hundreds of billions; a government reading the first concludes it should spend almost nothing on the model and fine-tune an open one instead. Both readings are correct, which is why national strategies look so incoherent: they are responding to two genuine but opposite trends at once.
The funding mirage is the gap between two numbers: the headline investment figure and the model budget inside it. National "AI investment" figures are dominated by infrastructure and private capital, and the slice that actually funds the model (the thing branded as "sovereign") is tiny: India's foundation-model pillar is about $235 million, and most European models cost tens of millions. The hundreds of billions flow to data centers and the chips inside them, which means most "sovereign AI" spending ends up as revenue for NVIDIA and the American clouds. A government can announce a EUR 200 billion strategy and still find that its actual national model runs on 88 imported GPUs. Reconciling the headline numbers with the model-line reality is the single most clarifying exercise in this whole field, and our EU AI investment analysis traces exactly where the money lands.
10. How a Nation Actually Gets Sovereignty
The practical question every government should ask is not "how do we build a frontier model" but "which kind of sovereignty does each workload actually need, and what is the cheapest way to get it." There are three routes, and the mistake almost every national program makes is treating them as a ranking (with from-scratch at the top) rather than as a portfolio matched to use cases. The right answer is usually all three at once, applied to different workloads.
Route one, training from scratch, makes sense only for the small set of workloads where data provenance and full control are non-negotiable: classified government work, forensics, anything where an inherited bias or a foreign refusal in the base model is a real risk. This is GPT-NL's and Apertus's lane, and it is worth the capability penalty precisely there and almost nowhere else. Route two, fine-tuning an open-weight model on national language and data, is the workhorse for the vast majority of public-sector needs, because it captures most of the sovereignty (you control the deployment, the data, and the fine-tune) at a fraction of the cost. Route three, sovereign-hosting a strong open model like Qwen, DeepSeek, or Llama on domestic infrastructure, delivers near-frontier capability for general productivity work where the base model's origin is acceptable.
A concrete comparison makes the trade-off legible. Spain spent over EUR 240 million on a program whose flagship ALIA-40B underperforms a two-year-old Llama model on basic reasoning, while Singapore's SEA-LION, built by fine-tuning open Gemma and Qwen bases for a fraction of that, delivers genuinely useful Southeast-Asian-language capability in production. The difference is not talent or effort; both teams are strong. It is that one chose route one for a workload that did not require it, and the other chose route two and spent its budget on the layers it could actually control. Multiply that single decision across a national portfolio and the cost of picking the wrong route for each workload runs into the hundreds of millions, which is why the framework matters more than the engineering.
There is also a layer above all three that is easy to overlook: the orchestration that turns a model into useful work. A sovereign model on its own does nothing; value comes from connecting it to data, tools, and workflows, and that orchestration layer is portable across models in a way the model itself is not. This matters strategically because it means a nation or organization can keep its agent and workflow layer constant while swapping the model underneath as sovereignty requirements change. Platforms in this space, including model-agnostic orchestration systems like O-mega, let an organization run an autonomous workforce of AI agents on top of whichever model it controls, so the sovereignty decision collapses to a single question (which model sits underneath) while the agents, tools, and processes stay the same. Our guide to multi-agent orchestration explains why that layer increasingly carries more of the practical value than the raw model does.
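To make the portability claim concrete, here is a minimal sketch of what "keep the agent layer constant, swap the model underneath" can look like in code. Every name in it (`ModelBackend`, `EchoBackend`, `SummaryAgent`) is hypothetical and invented for illustration; it is not the API of O-mega or of any real orchestration platform, and the stand-in backend only echoes its input instead of calling a model.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical minimal interface every model deployment must satisfy."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for illustration; a real one would call a hosted model."""

    name: str

    def complete(self, prompt: str) -> str:
        # Echo the prompt, tagged with the backend name, instead of inference.
        return f"[{self.name}] {prompt}"


@dataclass
class SummaryAgent:
    """A workflow step that depends only on the interface, not on a vendor."""

    backend: ModelBackend

    def run(self, document: str) -> str:
        return self.backend.complete(f"Summarize: {document}")


if __name__ == "__main__":
    agent = SummaryAgent(backend=EchoBackend("open-weight-finetune"))
    print(agent.run("council minutes"))
    # Swapping the model is a one-line change; the agent logic is untouched.
    agent.backend = EchoBackend("from-scratch-national-model")
    print(agent.run("council minutes"))
```

The design choice worth noting is that the agent never imports a vendor SDK directly. Under that assumption, changing the sovereignty posture of a workload means replacing one backend object, not rewriting the workflow.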
The decision rule that falls out of this is clean. Reserve from-scratch builds for the few workloads that legally or operationally demand them, fine-tune open weights for the broad middle, sovereign-host a frontier-open model for general capability, and standardize the orchestration layer so you can move between all three. A nation that does this gets more real sovereignty, and far more usable AI, than one that sinks its entire budget into a single from-scratch model that lands at GPT-3.5 and serves one language. For organizations rather than nations, the same logic applies, and our guide to open-source personal AI walks through the self-hosting end of it.
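The decision rule can be written down as a small routing function, which is a useful way to check that it really is a portfolio and not a ranking. The sketch below is one possible encoding under simplifying assumptions: the two boolean fields on `Workload` are hypothetical stand-ins for what would in practice be a legal and operational assessment, and the three routes are the ones described in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FROM_SCRATCH = "train or fully control a small model"
    FINE_TUNE_OPEN = "fine-tune an open-weight model and host it domestically"
    SOVEREIGN_HOST = "sovereign-host a strong open model"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""

    name: str
    requires_full_provenance: bool   # e.g. classified or forensic data
    needs_frontier_capability: bool  # general work where base origin is acceptable


def choose_route(workload: Workload) -> Route:
    """Map a workload to one of the three routes described in the text."""
    # Provenance and control requirements override everything else.
    if workload.requires_full_provenance:
        return Route.FROM_SCRATCH
    # Non-sensitive work that needs top capability: host a strong open model.
    if workload.needs_frontier_capability:
        return Route.SOVEREIGN_HOST
    # The broad middle of public-sector needs.
    return Route.FINE_TUNE_OPEN


if __name__ == "__main__":
    portfolio = [
        Workload("forensic case files", True, False),
        Workload("municipal chatbot", False, False),
        Workload("general productivity assistant", False, True),
    ]
    for item in portfolio:
        print(f"{item.name}: {choose_route(item).value}")
```

Applied to a whole portfolio, the function returns a mix of routes, which is the point of the framework: the sensitive minority of workloads goes to a fully controlled model and everything else goes to cheaper options.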
11. The Future Outlook: Agents, GPT-EU, and What Matters
Three forces will reshape sovereign AI over the next two years, and they push in different directions. The first is the rise of strong open-weight models, which keeps lowering the capability floor a nation can reach without training anything. As DeepSeek, Qwen, Mistral, and Apertus improve, the marginal value of a from-scratch national model keeps falling, because the gap between "what you can download for free" and "what you can build for fifteen million euros" keeps widening in favor of downloading. This is the single biggest reason the from-scratch model, GPT-NL included, looks increasingly like a transitional artifact rather than a long-term strategy for most countries.
The second force is consolidation into multinational efforts. GPT-NL's own leadership has said the realistic next step is not a bigger Dutch model but a multilingual European one, a possible "GPT-EU," and the EU is funding exactly this through OpenEuroLLM and the EuroHPC AI Factories - GPT-NL. Pooling compute and data across a bloc is the only way the structural gaps in section 3 get smaller, because it is the only way the numbers approach frontier scale. Expect the national model to give way to the bloc model, with individual countries contributing language data and compute rather than each building a full model alone. The same pooling logic underlies Europe's broader institutional push, which our EU Inc guide covers from the corporate-structure angle.
The third force is the shift of value from models to agents. As frontier capability commoditizes, the differentiator becomes how well an organization orchestrates models into reliable, auditable work, and that is a layer where sovereignty is both achievable and underexploited. This is the throughline of the work that Yuma Heymans (@yumahey), the founder of O-mega and co-founder of the recruitment platform HeroHunt.ai, has been writing about: as autonomous AI agents take on more real workforce tasks, the strategic question stops being "which lab made the smartest model" and becomes "do you control the model your agents actually run on." Sovereignty, in that frame, is less about training and more about owning the operational layer that turns intelligence into outcomes. Anyone planning a national or enterprise AI strategy should weigh the latest frontier capabilities, captured in resources like the Claude Opus 4.8 benchmarks, against what they can sovereignly control, because the gap between those two numbers is the real budget decision.
The honest forecast is that pure from-scratch sovereign models will become rarer, not more common, outside the handful of countries (and the handful of classified workloads) that genuinely need them. The future of national AI is a layered one: imported chips at the bottom, open-weight models in the middle, fine-tunes and sovereign hosting for control, and a portable agent layer on top where most of the value and most of the achievable sovereignty actually live. The nations that understand this will quietly get more independence than the ones still chasing a homegrown ChatGPT, and they will spend a tenth as much doing it.
12. Conclusion: A Decision Framework
The hard lesson of GPT-NL, and of every national model in the scorecard, is that "build your own frontier model" was the wrong goal disguised as the obvious one. The Netherlands did not fail by landing near GPT-3.5; it succeeded at a deliberately modest, fully-controlled, GDPR-clean public-sector model, and the only failure was the gap between that real achievement and the "Dutch ChatGPT" the public imagined. Across the field, the pattern holds: the most capable "sovereign" option (China's open ecosystem) is sovereign only for China, the most sovereign option (Apertus) sits well below the frontier, and the largest national budget on earth is a fraction of one American buildout. Sovereignty and frontier capability are in genuine tension, and no amount of money fully resolves it.
So the decision framework is about matching the route to the need. If your workload is classified or forensic and a foreign model's biases or kill-switch risk are unacceptable, train or fully control a small model and accept the capability cost, the way GPT-NL and France's military stack do. If your workload is general public-sector productivity, fine-tune a strong open-weight model on your language and host it domestically, the way Singapore and Bulgaria do, and capture most of the sovereignty for a tenth of the cost. If you need frontier capability for non-sensitive work, sovereign-host a top open model rather than building one. And in every case, standardize the agent and orchestration layer so you can move between models as your sovereignty requirements and the open-weight frontier both keep shifting.
The phrase "sovereign AI" will keep being sold by the companies that benefit most from selling it, and the line between the wrapper and the contents will keep being blurred on purpose. The antidote is to ask, for every workload, which specific kind of control you actually need and what the cheapest path to it is. A nation that does that will end up more independent than one that spends a fortune to land at GPT-3.5 in a single language, and it will have learned the real lesson of the Dutch experiment: owning your AI is not about training the biggest model. It is about controlling the layers that matter for the work you actually do.
This guide reflects the sovereign AI landscape as of June 2026. Model versions, funding figures, and national programs change rapidly, so verify current details before making strategic or procurement decisions.