The first-principles guide to whether 3D worlds replace the web, who is actually building them, and what you can truly own in 2026.
Meta burned roughly $83.6 billion building a metaverse almost nobody visits, while a 3D world that 132 million people enter every single day quietly runs in a browser tab. That single contrast is the whole question. The headline version, "virtual worlds are the new websites," has been promised at every tech conference since Facebook renamed itself in 2021. It was wrong then. The interesting part is that a much narrower version of it became quietly, structurally true in 2026, and almost nobody is framing it correctly.
Here is the problem with the question as usually asked. People hear "virtual world" and picture a headset, an avatar, and a walled corporate metaverse. They hear "website" and picture a flat page of text and buttons. Then they argue about whether the first replaces the second, as if it were a single yes-or-no event. That framing is why the debate keeps producing bad predictions. A website is not a flat page. It is an open, linkable, install-free, addressable thing that anyone can publish and anyone can visit at near-zero cost. The right question is not "will 3D replace 2D." It is "what part of that definition can a generated, explorable 3D space now inherit, and what part can it not."
This guide answers that from first principles, then layers on the 2026 data. It covers what a virtual world actually is when you strip it to bedrock, why the $83 billion metaverse failed while screen-based worlds thrive, what changed under the hood (WebGPU, Gaussian splatting, a real GPU in every browser), the generative world models that build a world from a sentence, the web 3D engines AI now writes against, the asset and capture supply chain, how AI agents collapsed who can build a world, and a concrete 2026 playbook for building one you actually own. Assume no prior 3D knowledge. Every platform, price, and limitation below is from late 2025 or 2026, because in this field anything older is already history.
Contents
- The 2026 world-building scorecard
- The question, restated from first principles
- The anatomy of a virtual world
- Why the $83 billion metaverse failed
- The worlds that actually won
- What changed under the hood
- Generative world models: ownable versus streamed
- The web 3D engines AI writes against
- The asset and capture supply chain
- How AI agents changed who can build a world
- The verdict: complement, not replacement
- The 2026 playbook for an ownable web world
1. The 2026 world-building scorecard
Before the deep dives, here is the whole landscape scored on a single axis: how well does each approach let you build an explorable world that behaves the way a website should? That means a world that you can own as a file, that you can serve on the open web without an install or a headset, that is real and shipping today, that is affordable to start, and that stays consistent and persistent rather than dissolving the moment you stop looking. Those five properties are exactly what made the document web win, so they are the honest test for any successor.
This is deliberately not a "which world looks most impressive" ranking. By that measure Google's streamed worlds would top the list. It is a ranking by the web's own founding properties, and the result is counterintuitive on purpose: the boring open-source web stack outscores the most hyped generative model, because the hyped model gives you nothing to keep. Each cell carries the score and the specific reason for it. The criteria and weights are explained directly below the table, and the rows are sorted by final score, highest first.
| # | Approach | Category | Ownership (25%) | Web-native (25%) | Maturity (20%) | Accessibility (15%) | Persistence (15%) | Final |
|---|---|---|---|---|---|---|---|---|
| 1 | Three.js + React Three Fiber | Web engine / code | 10 - MIT source, plain JS/TS files you own | 10 - the canonical web 3D, ~98% device reach | 10 - r184, ~113k stars, production standard | 7 - free, but you must write or generate code | 8 - you own all state and storage | 9.3 |
| 2 | PlayCanvas + SuperSplat | Web engine + splat | 9 - MIT engine, MIT editor, files you own | 9 - WebGPU splat renderer + WebGL2 fallback | 8 - v2.19, shipping, smaller corpus | 8 - editor-driven, SuperSplat is no-code | 8 - self-host your own files | 8.5 |
| 3 | World Labs Marble | Generative (ownable) | 9 - exports SPZ/PLY splat + GLB mesh on paid tiers | 9 - browser-based, Spark renderer on Three.js | 8 - GA Nov 2025, real API, $1B funded | 7 - free 4 worlds, then $20 to $95/mo | 8 - the world is a durable exported artifact | 8.4 |
| 4 | Babylon.js | Web engine | 9 - Apache-2.0, owned files | 9 - WebGPU + WebGL2, Gaussian splat support | 8 - v9, Microsoft-backed | 6 - bigger API, thinner AI training corpus | 8 - self-host | 8.2 |
| 5 | A-Frame + model-viewer | Declarative web | 10 - MIT, just HTML you own | 9 - runs everywhere, AR via WebXR | 7 - stable but slower-moving | 8 - HTML tags, lowest barrier | 5 - lower ceiling for stateful worlds | 8.1 |
| 6 | NVIDIA Cosmos 3 | Open world model | 8 - open weights, OpenMDW license | 5 - OpenUSD output, convert for web | 8 - launched June 2026, NVIDIA-backed | 5 - needs heavy GPU compute | 8 - explicit USD artifacts | 6.8 |
| 7 | O-mega | AI workforce / code | 8 - emits code and files you own | 7 - targets the web stack | 5 - general agent platform, not a 3D tool | 8 - prompt-driven, non-technical | 5 - depends what the agents build | 6.7 |
| 8 | Rosebud AI | Vibe-coded web 3D | 4 - inspectable Three.js but hosted, lock-in | 8 - browser-native, instant deploy | 6 - prototype-grade | 10 - prompt to playable, zero code | 5 - lives in the platform runtime | 6.5 |
| 9 | Roblox | Platform world | 2 - tenant on Roblox, no export | 4 - proprietary client, not the open web | 10 - 132M daily users, real economy | 9 - free, Roblox Studio | 9 - persistent worlds and economy | 6.2 |
| 10 | Tencent HunyuanWorld | Open world model | 7 - open weights, but EU/UK/Korea excluded | 4 - engine output, not web | 7 - open, capable, self-hostable | 4 - needs 40GB+ VRAM | 8 - explicit mesh and splat output | 6.0 |
| 11 | Fortnite + UEFN | Platform world | 2 - tenant, creator revenue share only | 2 - native client, not the web | 10 - ~110M monthly users | 7 - UEFN is harder than Roblox Studio | 9 - persistent islands and economy | 5.4 |
| 12 | Decart Oasis 3 | Streamed sim | 1 - nothing to keep, frames only | 6 - API stream you can embed | 6 - launched June 2026 | 5 - $0.02/second API | 2 - streamed, drifts fast | 4.0 |
| 13 | Google Genie 3 | Streamed world | 1 - no export, nothing to own | 5 - in-browser but gated, no embed | 6 - dazzling, but research preview | 3 - $200/mo AI Ultra, US only | 2 - roughly one minute of memory | 3.5 |
The criteria, and why each carries the weight it does. Ownership (25%) and web-native delivery (25%) are the two heaviest because they are the two properties that literally defined the web: a thing you control and can publish, reachable by a plain link with no install. Maturity (20%) separates the shipping from the promised, which matters enormously in a field this hyped. Accessibility (15%) captures how many people can actually start, the property that turned the web from a physics-lab tool into a planetary medium. Persistence and consistency (15%) measures whether the world remembers itself, because a space that forgets your cart, your edits, and where the door was ninety seconds ago cannot host commerce, identity, or return visits.
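The final column can be reproduced from the five cell scores and the weights in the header. The sketch below is a minimal illustration, assuming the table rounds half-up to one decimal (which matches every row); the names `WEIGHTS`, `Scores`, and `finalScoreTenths` are hypothetical and exist only for this example.

```typescript
// Weights as whole percentages, in the table's column order.
const WEIGHTS = {
  ownership: 25,
  webNative: 25,
  maturity: 20,
  accessibility: 15,
  persistence: 15,
} as const;

type Criterion = keyof typeof WEIGHTS;
type Scores = Record<Criterion, number>; // each cell score is an integer from 0 to 10

// Returns the final score in tenths (93 means 9.3).
// Integer math keeps 9.25 rounding half-up to 9.3 without floating-point surprises.
function finalScoreTenths(s: Scores): number {
  let hundredths = 0;
  for (const key of Object.keys(WEIGHTS) as Criterion[]) {
    hundredths += s[key] * WEIGHTS[key]; // e.g. 10 * 25 = 250, meaning 2.50 points
  }
  return Math.floor((hundredths + 5) / 10);
}

// Cell scores copied from the scorecard rows.
const threeJs: Scores = { ownership: 10, webNative: 10, maturity: 10, accessibility: 7, persistence: 8 };
const genie3: Scores = { ownership: 1, webNative: 5, maturity: 6, accessibility: 3, persistence: 2 };

console.log(finalScoreTenths(threeJs) / 10); // 9.3
console.log(finalScoreTenths(genie3) / 10);  // 3.5
```

Because ownership and web-native delivery together carry half the weight, a system scoring 1 on ownership cannot climb much past the middle of the board even with perfect marks elsewhere.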
What the ranking actually says. The top of the board is the unglamorous open web: Three.js, PlayCanvas, Babylon.js, and the declarative A-Frame, joined by Marble as the one generative model that hands you a file. The bottom is occupied by the most technically astonishing systems in the entire field, Genie 3 and Oasis 3, precisely because they score near zero on ownership and persistence. That is not a knock on their brilliance. It is the central finding of this guide expressed as a number: in 2026 the part of "virtual worlds" that can genuinely behave like websites is the ownable, web-served, persistent part, and the part that cannot is the streamed, gated, forgetful part, no matter how good it looks. Hold that result in mind, because the rest of the guide explains why it is structural and not a temporary state of the art.
2. The question, restated from first principles
To ask whether virtual worlds are the new websites, you first have to say what a website fundamentally is, with the marketing stripped off. A website is not "a page." It is addressable state served over an open protocol. Someone, anyone, can publish a thing, give it a stable address (a URL), and any other person can reach that exact thing from any device by following a link, with no install, no gatekeeper, and near-zero marginal cost to visit. The fact that it usually renders as 2D text is incidental. The web's power was never the flatness. It was the open, linkable, permissionless addressability. That is the property a successor has to inherit, and it is the property the 2021 metaverse never even tried to copy.
Tim Berners-Lee made the first website live on August 6, 1991, at CERN, a single page describing the project itself - CERN. By 2026 that one node had grown to roughly 1.43 billion hostnames counted by Netcraft, though only about 15% are genuinely active, which still leaves on the order of 200 million live sites run by independent creators - Netcraft. That growth curve is the real yardstick. The bar for "the new websites" is not "can you build one impressive world." It is "can hundreds of millions of independent people each run a persistent, visited place." The web cleared that bar by being cheap, open, and linkable. Any challenger is measured against the same clearance height, and most of the contenders below do not clear it.
Now define a virtual world with the same discipline, because the loose definition is where every bad prediction comes from. A virtual world is a place with state you can move through and act on, observed as imagery and changed by your input. The headset is not in that definition. The avatar is not in that definition. Photoreal graphics are not in that definition. Those are implementation choices that the metaverse era mistook for the essence. Once you separate the medium (an explorable, persistent, social 3D space) from the form factor (a face-computer you have to buy), the 2026 evidence snaps into focus: the medium is thriving at massive scale, and the specific form factor that 2021 bet everything on is the part that collapsed.
The image above is the question made concrete. A website is a page you write. That living room is a navigable 3D place that World Labs' Marble generated from a single input image, and you can walk through it in a browser - World Labs. The gap between "write a page" and "generate a place" is the gap this entire guide is mapping. The point to hold onto is that the generation of the place is now the easy part. Whether that place behaves like a website (ownable, linkable, persistent) is the hard part, and it is decided not by the graphics but by the four structural pieces every world is made of. So look at those next.
3. The anatomy of a virtual world
Reasoning about world technology by listing products is how people get confused, because a Roblox game, a Gaussian splat of your kitchen, and a Genie 3 stream look like the same category and are not. The way to keep them straight is to reduce a world to the parts that cannot be removed. Strip any virtual world to bedrock and exactly four pieces remain, and every tool in this guide is really a choice about which of those four pieces it provides and which it leaves to you. Get this framework and the entire market becomes legible.
The four irreducible parts are state, the update rule, the observe rule, and the input channel. State is just the numbers describing everything right now: positions, properties, who owns what. The update rule is the function that takes the current state plus your input and produces the next state: physics, game logic, the consequences of your actions. The observe rule is the function from state to what you see: rendering. The input channel is how your actions get back into the state: agency. A world is state, a rule to evolve it, a rule to observe it, and a way to act on it. Where the contenders differ is which of these they hand you as something durable and which they only simulate transiently.
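To make the four parts concrete, here is a deliberately tiny sketch of a one-dimensional world. Everything in it (the `State` shape, the three inputs, the text "renderer") is an assumption invented for illustration, not how any engine in this guide is implemented.

```typescript
// State: the numbers describing everything right now.
interface State {
  x: number;
  coins: number;
}

// Input channel: the actions a visitor can feed back into the state.
type Input = "left" | "right" | "collect";

// Update rule: current state plus input produces the next state.
function update(s: State, input: Input): State {
  switch (input) {
    case "left":
      return { ...s, x: s.x - 1 };
    case "right":
      return { ...s, x: s.x + 1 };
    case "collect":
      return { ...s, coins: s.coins + 1 };
  }
}

// Observe rule: state to what you see. A line of text stands in for rendering.
function observe(s: State): string {
  return `player at x=${s.x}, coins=${s.coins}`;
}

// Durable state is the part you can own: serialize it and the world remembers itself.
function save(s: State): string {
  return JSON.stringify(s);
}
function load(raw: string): State {
  return JSON.parse(raw) as State;
}

let world: State = { x: 0, coins: 0 };
for (const input of ["right", "right", "collect"] as Input[]) {
  world = update(world, input);
}
console.log(observe(load(save(world)))); // player at x=2, coins=1
```

Swapping `observe` for a photoreal renderer changes nothing about what the world remembers. Only `save` and `load` decide that.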
This diagram explains why rendering is the wrong thing to obsess over, even though it gets all the attention. Rendering is only the observe rule, one of four parts, and it answers a genuinely simple question: for each pixel, what color is it. Photoreal rendering is, for practical purposes, solved. A modern Gaussian splat or a path-traced WebGPU scene looks indistinguishable from a photo. If graphics were the bottleneck, the metaverse would have won. It was never the bottleneck. The bottleneck lives in the other three parts: keeping state persistent, making the update rule meaningful (real consequences, real stakes), and making the inhabitants real rather than scripted. A world feels real in proportion to how persistent its state is and how real the others in it are, not how shiny its pixels are.
There is one more first-principles constraint that governs every large world, and it is worth internalizing because it explains why streamed worlds forget. You cannot fully simulate something bigger than the machine simulating it. A full-fidelity copy of a world needs at least as much information as the world. So every large virtual world must do one of two things: be smaller and simpler than reality, or compute lazily, only the slice you are currently observing, faking the rest until you look. This is not an optimization, it is a law. It is why a game world can feel infinite while running in a few gigabytes, and it is exactly the trick streamed world models use. The catch is that if you only ever compute the observed slice and never store it, the world has no memory. That single trade, store-the-slice versus discard-the-slice, is the fault line that splits the entire generative-worlds market in two, and it maps directly onto the ownership question that decides whether any of this can be "the new websites."
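The store-versus-discard trade can be shown in a few lines. This is a toy sketch under loud assumptions: `generateChunk` is a hypothetical stand-in for any generator whose output is not exactly repeatable, and both world classes are invented for illustration.

```typescript
type Chunk = { id: number; items: string[] };

// Hypothetical generator. The call counter makes each result slightly different,
// standing in for a generative model that never produces the same slice twice.
let calls = 0;
function generateChunk(id: number): Chunk {
  calls += 1;
  return { id, items: [`tree-${id}-${calls}`] };
}

// Store-the-slice: compute a chunk the first time it is observed, then keep it.
class StoredWorld {
  private chunks: Record<number, Chunk> = {};

  visit(id: number): Chunk {
    let chunk: Chunk | undefined = this.chunks[id];
    if (!chunk) {
      chunk = generateChunk(id); // lazy: nothing is computed until someone looks
      this.chunks[id] = chunk;
    }
    return chunk;
  }
}

// Discard-the-slice: compute on every look and keep nothing.
class StreamedWorld {
  visit(id: number): Chunk {
    return generateChunk(id);
  }
}

const stored = new StoredWorld();
stored.visit(3).items.push("my-chair"); // an edit made by a visitor

const streamed = new StreamedWorld();
streamed.visit(3).items.push("my-chair");

console.log(stored.visit(3).items.indexOf("my-chair") !== -1);   // true
console.log(streamed.visit(3).items.indexOf("my-chair") !== -1); // false
```

Both worlds are lazy, and both can feel unbounded. Only the first has anything to come back to.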
4. Why the $83 billion metaverse failed
Any honest 2026 guide has to start the "worlds versus websites" debate at the scene of the largest corporate misadventure in the history of consumer technology, because the lesson is structural, not a matter of taste. On October 28, 2021, Facebook renamed itself Meta and bet the company on a headset-based metaverse, predicting it would reach a billion people within a decade - PBS NewsHour. That bet has so far produced one of the great money fires of all time. Meta's Reality Labs division lost $19.2 billion in 2025 alone, on just $2.2 billion of revenue - Meta investor relations. By the first quarter of 2026 the cumulative operating loss had reached roughly $83.6 billion since the segment was broken out in late 2020 - CNBC.
It is tempting to read that as "people just do not like 3D," but that is exactly the wrong lesson, and getting it wrong is how you make the same mistake again. The failure was not a failure of graphics or even of vision. It was a stack of concrete structural frictions, and naming them precisely tells you what a successor must avoid. The first and largest is the hardware tax. The web's defining genius was that visiting a new site costs nothing: you follow a URL. The metaverse reintroduced a physical purchase as the price of entry: a headset costing anywhere from a few hundred dollars to $3,499 that is heavy, isolating, and uncomfortable to wear for long. That single requirement collapsed the addressable audience from "everyone with a screen" to "people willing to strap a computer to their face," which turned out to be a small and shrinking group.
The second friction was a content desert and a chicken-and-egg trap. Meta's flagship social world, Horizon Worlds, never crossed even a downgraded internal target of 500,000 monthly users, sitting under 200,000 at its 2022 measurement, with most people never returning after the first month - CNBC. You cannot bootstrap a place worth revisiting without creators, and creators will not build for an empty place. The third friction was that capital alone could not buy the flywheel: $83.6 billion bought headsets and worlds but not the network effect, because network effects are not for sale. The clearest proof is Apple. The Vision Pro, the best-engineered headset ever shipped, sold an estimated 600,000 units total, its M5 refresh in late 2025 failed to revive demand, and by 2026 reports indicate Apple has effectively shelved next-generation development - MacRumors. When the two richest companies on earth cannot brute-force a category, the obstacle is structural.
The market itself is now voting with its shipments, and the direction is unambiguous. Total XR device shipments grew 44.4% in 2025, which sounds like a metaverse boom until you look inside the number: Meta Quest VR headset shipments fell 42.3%, while the entire growth came from lightweight smart glasses, many without any display at all - IDC. People want a discreet heads-up layer over their real life, not a replacement world to live inside. Meta read the same data and quietly pivoted, cutting Reality Labs staff and redirecting budget toward AI glasses and AI infrastructure, a tacit admission documented in its own restructuring - GamesBeat. We traced this pivot in our coverage of Meta's broader AI turn in the Meta Muse Spark guide, where the same company that bet $83 billion on headsets is now racing to ship frontier AI at half the compute cost. The practical takeaway is blunt: any successor to the website that requires a hardware purchase to participate is starting from a structural disadvantage the web never had. Which is precisely why the worlds that actually won did the opposite.
5. The worlds that actually won
Here is the twist that the metaverse obituaries miss, and it is the single most important fact in this entire debate: the virtual world did not fail, it just shed the headset and the "Meta" branding and went to the screen you already own. While Reality Labs was setting fire to tens of billions of dollars, explorable, social, user-built 3D worlds were reaching hundreds of millions of people on ordinary phones and PCs. The medium is a runaway success. The form factor that 2021 bet on is the only part that failed. Once you separate those two, the "are virtual worlds the new websites" question stops being speculative and becomes a question about platforms that already exist at planetary scale.
The existence proof is Roblox. In the first quarter of 2026 it reached 132 million average daily active users, up 35% year over year, who collectively spent 31 billion hours in its worlds - Roblox shareholder letter. Those are not passive viewers, they are people inhabiting and building persistent 3D spaces, and crucially it is a real economy: Roblox booked $1.7 billion in the quarter, up 43%, almost entirely from worlds that its users created - Roblox. Compare that to Horizon Worlds' sub-200,000 users. The difference is not graphics, both are stylized and cartoonish. The difference is that Roblox requires no headset and no purchase to enter, and it gave millions of ordinary creators a free tool to build worlds other people actually visit. That is the website pattern (cheap creation, free visiting, millions of independent publishers) reproduced in 3D.
Fortnite is converting from a game into something even closer to a creator platform, and the resemblance to the web's structure is striking. It runs around 110 million monthly users, and roughly 40% of all playtime now happens in third-party creator content, not Epic's own mode - SCCG Management. Epic built the supporting machinery to match: the Unreal Editor for Fortnite plus Creator Economy 2.0 pays island creators 40% of the net real-money revenue their worlds generate - Epic Games. That is a creation tool, a hosting platform, a discovery layer, and a monetization split: the four pillars that made the web a place where independent publishers could earn a living, rebuilt for 3D worlds.
The lock-in that comes with these platforms is easy to underrate until you price it out. A creator who builds a hit experience on Roblox or Fortnite cannot take their audience, their world file, or their economy anywhere else, and the platform sets the revenue split unilaterally. Epic's headline-grabbing 40% share is generous only at Epic's discretion, because Epic can change it tomorrow. Tim Sweeney, Epic's chief executive, has spent years publicly championing an open, interoperable metaverse where worlds and items move freely between platforms, and the fact that this remains unrealized in 2026 is the tell: the incentives of every successful world platform push toward enclosure, not interoperability. The 2D web avoided that fate only because its founding protocol was open before any single company could capture it. No equivalent open world protocol has reached scale, so today's worlds are giant private kingdoms that happen to look like the future.
So is the question settled? Are worlds the new websites? Not so fast, and the reason is the ownership clause that the scorecard scored so harshly. Roblox and Fortnite prove the medium works at scale, but they are walled gardens, not an open protocol. A world you build on Roblox is a tenant on Roblox's land. You cannot export it, you cannot link to a raw scene, you cannot move it to another host, and Roblox can change the rules or take its cut at will. The 2D web's deepest property was that your site was yours on an open standard, reachable by a universal link, hostable anywhere. Roblox and Fortnite are closer to the AOL and app-store model than to HTTP: enormously successful, but enclosed. Until there is an open, interoperable world protocol, the screen-based metaverse will look less like the open web and more like a handful of giant private kingdoms. That gap, between a thriving-but-enclosed medium and an open one, is exactly what the technical shifts of 2026 started to close.
6. What changed under the hood
If the medium already works on screens, the obvious next question is why 2026 specifically is the moment people started taking "ownable, open, web-native 3D worlds" seriously rather than treating it as a perpetual someday. The answer is not a single breakthrough. It is a stack of concrete technical primitives that all matured in parallel, turning the browser from a place that could barely show 3D into a genuine 3D runtime. The 2021 metaverse failed partly because the hard problems were assumed to be social and economic. The 2026 reality is that the hard problems were technical primitives, and those got solved on an open, linkable, install-free substrate.
The first primitive is a real GPU in the browser. WebGPU, the modern graphics and compute interface for the web, went from experimental to roughly 82% global browser support by mid-2026, and the decisive moment was Safari finally shipping it across the entire Apple ecosystem in Safari 26 (macOS, iOS, iPadOS, and visionOS) - web.dev. iOS Safari was the holdout that had gated "works for everyone," and once it shipped, compute shaders (GPU-accelerated sorting, splat rendering, even machine-learning inference) became something a website could run in a tab. The engineering reality of 2026 is that every serious engine ships a WebGPU fast path with a WebGL2 fallback, because Firefox is still catching up and you cannot yet assume WebGPU everywhere. That dual-path discipline is the quiet signature of production web 3D this year.
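The dual-path discipline comes down to a small capability check at startup. The sketch below assumes the standard browser entry points (`navigator.gpu.requestAdapter()` for WebGPU and `canvas.getContext("webgl2")` for the fallback); the `GpuLike` and `CanvasLike` interfaces and both function names are hypothetical, introduced so the logic can be read and tested without a browser.

```typescript
type Backend = "webgpu" | "webgl2" | "none";

// Minimal shapes of the two browser objects the probe touches (illustrative only).
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface CanvasLike {
  getContext(kind: string): unknown;
}

// Pure decision: prefer the WebGPU fast path, fall back to WebGL2.
function pickBackend(hasWebGPU: boolean, hasWebGL2: boolean): Backend {
  if (hasWebGPU) return "webgpu";
  if (hasWebGL2) return "webgl2";
  return "none";
}

// Probe: `gpu` is navigator.gpu (undefined where WebGPU is missing) and `canvas`
// should be a throwaway element, because asking for a context binds the canvas to it.
async function detectBackend(gpu: GpuLike | undefined, canvas: CanvasLike): Promise<Backend> {
  let hasWebGPU = false;
  if (gpu) {
    try {
      // The API can exist while no usable adapter does, so check the result too.
      const adapter = await gpu.requestAdapter();
      hasWebGPU = adapter !== null && adapter !== undefined;
    } catch {
      hasWebGPU = false;
    }
  }
  const context = canvas.getContext("webgl2");
  const hasWebGL2 = context !== null && context !== undefined;
  return pickBackend(hasWebGPU, hasWebGL2);
}

console.log(pickBackend(false, true)); // webgl2
```

In a page you would call something like `detectBackend((navigator as any).gpu, document.createElement("canvas"))` and hand the result to your engine. Most engines run an equivalent check internally, so treat this as a picture of the logic rather than code you must ship.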
The second and deeper shift is Gaussian splatting, the technique that replaced hand-built polygon meshes as the way to get photoreal reality onto the web. Instead of an artist modeling a scene triangle by triangle, you capture a real place with a phone and an algorithm fits millions of soft, colored, oriented 3D blobs to your photos, which then render at 60 frames per second in a browser. The visual below, a garden rendered from the original 2023 research, shows the fidelity: this is not a model someone built, it is reality captured and replayed - Inria GraphDeco. The reason splatting matters for the "new websites" thesis is subtle: a splat of a real place is inherently consistent and persistent, because it captured something that actually exists and stays put, which is exactly the property the flashy generative streams lack.
The third primitive is standardization and compression, the unglamorous work that turns a research demo into infrastructure. Niantic open-sourced a format called SPZ, effectively the "JPEG for 3D splats," under an MIT license, roughly ten times smaller than the older PLY files with no visible quality loss - Niantic. More importantly, Khronos (the standards body behind glTF, the "JPEG of 3D") put Gaussian splatting directly into the glTF format via the KHR_gaussian_splatting extension. The precise 2026 status matters and is widely overstated: it is a Release Candidate as of February 2026, not yet ratified, with formal ratification targeted for the second quarter - Khronos. When it ratifies, splats inherit the entire glTF ecosystem (every engine, every viewer, every AR pipeline) and become a first-class, web-loadable, ownable file. We covered the upstream version of this capture-to-3D pipeline in our AI Image to 3D guide, which maps how single images become editable 3D objects.
The clearest sign that this stopped being a research toy is that the largest companies on earth are shipping it into mainstream products. At WWDC 2026 on June 9, Apple announced an AI-enhanced Maps Flyover built on radiance-field representation across more than 300 cities, shipping in iOS 27 in late 2026, in what commentators called possibly the largest deployment of the technique to date, though Apple pointedly did not say "Gaussian splatting" by name - Radiance Fields. The strategic tell is even louder on the open-web side: World Labs, a frontier AI lab, open-sourced its splat renderer Spark under MIT and built it on plain Three.js and WebGL2 with roughly 98% device reach, rather than locking generated worlds inside a proprietary viewer - Spark docs. When a frontier lab treats the open web as the default delivery surface for AI-generated 3D, "the web is the substrate" has gone from contrarian bet to consensus. The question then becomes what those labs are actually generating, which splits cleanly into the most important distinction in the field.
7. Generative world models: ownable versus streamed
The phrase "AI generates a 3D world" hides the single most important split in this entire field, and confusing the two halves is the number-one error in the 2026 discourse. There are two genuinely different things that both get called world generation, and they sit on opposite sides of the store-the-slice-versus-discard-the-slice fault line from the anatomy section. One produces a persistent artifact you can download and own. The other produces a stream of frames that exists only while you interact and vanishes when you stop. They imply different products, different business models, different moats, and crucially, completely different answers to "is this the new website." Get this distinction and the dozen models in the market organize themselves.
Start with the ownable side, because that is the side that can behave like a website. The archetype is Marble, from Fei-Fei Li's World Labs, which became generally available on November 12, 2025 and was the first commercial product of its kind - TechCrunch. You give it text, an image, a video, or a coarse 3D layout, and it returns a stable, explorable scene you can export as Gaussian splats (SPZ or PLY) plus a collider mesh in GLB for physics. That dual export is the clever part: splats have no built-in collision, so the mesh is what makes the world walkable in a game engine. Because you are buying a durable artifact, the economics are subscriptions and credits, not compute-seconds. The investor enthusiasm is real and verifiable: World Labs closed a $1 billion round on February 18, 2026 (including $200 million from Autodesk) at a reported $5 billion valuation - TechCrunch.
The launch film above shows the whole pitch in motion: a sentence or a photo becomes a place you move through, and then you download it. That download is the entire reason Marble scored an 8.4 on the scorecard while the more dazzling streamed models scored 4.0 and 3.5. Note the ownership trap, though, because it is exactly the kind of detail that turns "free" into a liability. On Marble's free tier you do not own your output (it carries a non-commercial license), and ownership only attaches on the paid tiers - World Labs. Marble's consumer pricing runs Free for 4 worlds, then Standard at $20, Pro at $35, and Max at $95 per month, with commercial rights starting at Pro - TechCrunch. For programmatic use, the World API prices at $1.00 per 1,250 credits, and a standard world generation costs 1,500 credits - World Labs API docs.
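Those API numbers work out to about $1.20 per standard world. The sketch below does the arithmetic in whole cents and, for contrast, prices a per-second stream at the $0.02 rate listed for Oasis 3 in the scorecard. It assumes credits can be bought pro rata rather than only in fixed bundles, which the quoted pricing does not settle; the function and constant names are hypothetical.

```typescript
// Figures quoted in this guide.
const CENTS_PER_BUNDLE = 100;            // $1.00
const CREDITS_PER_BUNDLE = 1250;         // credits bought per $1.00
const CREDITS_PER_STANDARD_WORLD = 1500; // cost of one standard generation
const STREAM_CENTS_PER_SECOND = 2;       // $0.02 per second (Oasis 3 row)

// One-off cost, in cents, of generating n standard worlds you then keep as files.
function worldApiCostCents(worlds: number): number {
  const credits = worlds * CREDITS_PER_STANDARD_WORLD;
  return Math.ceil((credits * CENTS_PER_BUNDLE) / CREDITS_PER_BUNDLE);
}

// Running cost, in cents, of a streamed world for a given session length.
function streamCostCents(seconds: number): number {
  return seconds * STREAM_CENTS_PER_SECOND;
}

console.log(worldApiCostCents(1)); // 120  -> $1.20, paid once
console.log(streamCostCents(3600)); // 7200 -> $72.00 for one visitor-hour
```

The point is less the exact figures than the shape of the bill: an exported artifact is paid for once and then served like any static file, while a stream is metered for every second anyone spends inside it.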
It helps to walk one concrete build through the ownable path, because the abstract "you get a file" only lands when you see what the file does. Imagine you want a furnished apartment world for a real-estate site. You send Marble a single listing photo, it returns a navigable scene, and you export the SPZ splat for the photoreal look plus the GLB collider mesh for walkability. You drop the splat into a Three.js page so visitors explore it in a browser with no app, and you use the collider mesh to stop them walking through walls and to place clickable hotspots on the appliances. Nothing in that chain phones home to World Labs after generation: the splat and the mesh are files on your own server, versioned in your own repository, served from your own domain. That is the difference between a world and a stream stated as an engineering fact, and it is why the ownable side is the only side that composes cleanly with the rest of the web.
Now the streamed side, which is where the most jaw-dropping demos live and also where the "new website" thesis breaks. The archetype is Google DeepMind's Genie 3, introduced on August 5, 2025 and released to the public as Project Genie for Google AI Ultra subscribers on January 29, 2026 - Google DeepMind. It generates a navigable world from a prompt in real time at 720p and roughly 24 frames per second, and it is genuinely magical to explore. But it is autoregressive: it predicts each next frame from the previous frames plus your input, storing nothing. The consequence is the defining limitation of the streamed category, straight from DeepMind's own materials: visual memory reaches back only about one minute, consistency holds for several minutes, and then the world drifts and forgets - Google DeepMind. We first tracked this lineage in our coverage of Genie 2 generating interactive worlds, and Genie 3 is a leap in fidelity but sits on the same one-minute wall.
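A toy model shows why a one-minute memory is a hard wall rather than a tuning detail. This sketch assumes a fixed-size sliding window of past frames, which is a simplification chosen for illustration and not a description of Genie 3's actual architecture; `FrameContext` is a hypothetical name, and frame indices stand in for image data.

```typescript
const FPS = 24;                      // frame rate quoted above
const MEMORY_SECONDS = 60;           // "about one minute" of visual memory
const WINDOW = FPS * MEMORY_SECONDS; // 1,440 frames of context

// Each new frame may only be conditioned on what is still inside the window.
class FrameContext {
  private frames: number[] = [];

  push(frameIndex: number): void {
    this.frames.push(frameIndex);
    if (this.frames.length > WINDOW) {
      this.frames.shift(); // the oldest frame is dropped and never stored anywhere
    }
  }

  remembers(frameIndex: number): boolean {
    return this.frames.indexOf(frameIndex) !== -1;
  }
}

const ctx = new FrameContext();
for (let i = 0; i < 90 * FPS; i++) {
  ctx.push(i); // walk around for ninety seconds
}

console.log(ctx.remembers(0));            // false: the first thing you saw is gone
console.log(ctx.remembers(90 * FPS - 1)); // true: the latest frame is still in context
```

Ninety seconds in, nothing from the first thirty seconds can influence the next frame, so a room you left at the start has to be invented again if you walk back to it.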
The demo below makes the appeal obvious, and it is worth watching to feel why people get carried away. You type a place and you are instantly inside it, walking around. The reason it cannot be the new website is not aesthetic, it is structural and it traces straight back to the information wall from section 3: a world that only computes the observed slice and never stores it has no persistent state. It literally cannot remember your shopping cart, your form inputs, or where the checkout button was ninety seconds ago. The web's killer feature, addressable state you can link to, bookmark, and return to, is precisely what a streamed model forgets. That is the engineering core of the "screensaver" critique: an infinite, effortless world with no persistence has no stakes, no return path, and nothing to link to.
The rest of the field sorts neatly onto these two sides, and the funding tells you the stakes. On the streamed side, Decart's Oasis 3 streams photoreal driving simulation via API at $0.02 per second, and Decart raised $300 million at a roughly $4 billion valuation alongside the June 2026 launch - TechCrunch. Runway's GWM-1 forks explicitly into Worlds, Robotics, and Avatars variants - Runway. On the ownable side, NVIDIA Cosmos 3 launched June 1, 2026 as an open physical-AI foundation model under a permissive license - NVIDIA, and Tencent's HunyuanWorld ships open weights but with a license that pointedly excludes the EU, UK, and South Korea, a hard blocker for any European operator - Tencent. Capital is pouring into both sides, well past a billion dollars across the recent rounds.
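The per-second rate quoted above for Oasis 3 is worth turning into session costs, because it shows why a streamed world behaves like a metered runtime rather than a hosted file. This is a rough sketch that assumes billing is strictly linear per second with no minimums, discounts, or idle time, none of which the source specifies. The function name `streamCostUsd` is made up for the example.

```typescript
// $0.02 per second, held as integer cents to avoid floating-point drift.
const CENTS_PER_SECOND = 2;

// Cost in US dollars of streaming for the given number of seconds,
// assuming purely linear per-second billing (an assumption, not a quoted term).
function streamCostUsd(seconds: number): number {
  return (CENTS_PER_SECOND * seconds) / 100;
}

const tenMinuteVisit = streamCostUsd(10 * 60); // 12 dollars
const oneHourSession = streamCostUsd(60 * 60); // 72 dollars
```

On those assumptions a single ten-minute visit costs $12 and an hour costs $72, per visitor, every time. An exported splat served as a static file has no comparable per-second charge.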
The synthesis to carry forward is sharp: explicit models win for anything that needs persistence (asset pipelines, robotics training, an ownable web world), while streamed models win for ephemeral exploration (ideation, simulation, training environments). For the "new websites" question specifically, only the explicit, exportable side qualifies, because only it produces something with the durability and addressability a website requires. And the most durable, most ownable way to produce a world is also the oldest and least hyped: write the code yourself, or have an agent write it for you.
8. The web 3D engines AI writes against
Once you accept that the ownable path is the one that behaves like a website, the most ownable path of all comes into focus, and it is not a generative model at all. It is code. A virtual world expressed as plain JavaScript and standard 3D files is a thing you own end to end, hostable anywhere, with no vendor able to reprice it, deprecate it, or revoke your license. This is why the top of the scorecard is dominated by code engines rather than AI world models. And the reason this matters more in 2026 than ever before is that you increasingly do not have to write that code by hand: frontier models have become genuinely good at emitting it, which turns "code your own world" from a specialist skill into a prompt.
The anchor of this entire category is Three.js, the open-source library that has been the default way to do 3D on the web for over a decade. It reached revision r184 in April 2026, has roughly 113,000 GitHub stars, and is pure MIT-licensed JavaScript that tree-shakes down to files you fully own - Three.js. Its importance for the AI era is specific and underappreciated: because Three.js has the largest training corpus in this category and gives instant in-browser feedback, frontier coding models are unusually reliable at generating it. For React and Next.js codebases, the dominant layer on top is React Three Fiber, which wraps Three.js in declarative JSX and has about 31,000 stars and 700,000 weekly npm downloads, a signal that 3D has gone mainstream in ordinary product UIs, not just art demos - React Three Fiber. That declarative JSX maps cleanly onto how language models write code, which is exactly why it tops the scorecard.
The rest of the category trades reach for convenience in instructive ways, and the right pick depends on what you value. The contenders differ along three axes: how large a corpus AI models have to draw on, how much of the output you own, and how much the engine does for you versus how much you control.
- PlayCanvas + SuperSplat - MIT engine with a compute-based WebGPU splat renderer and a free, browser-based splat editor; the strongest owned path for splat-heavy worlds - PlayCanvas
- Babylon.js - batteries-included Apache-2.0 engine from Microsoft, now at v9 with built-in physics and Gaussian splat support, more structured but a thinner AI corpus - Babylon.js
- A-Frame + model-viewer - HTML-declarative layers on top of Three.js; the lowest barrier to entry and the simplest reliable path to an owned glTF plus web AR
- Rosebud AI - browser-based vibe coding that turns prompts into playable 3D, lowest friction but with the hosted-platform lock-in that drops its ownership score - Rosebud
What ties this category to the AI-codegen story is a small but decisive workflow detail: the path from a generated asset to editable web code is now nearly frictionless. A tool called gltfjsx takes any glTF model and emits a clean React Three Fiber component, so an agent can generate a mesh, convert it to JSX, and wire it into a scene without a human touching the geometry. Combined with Three.js being one of the most reliably generated libraries (its enormous training corpus means models rarely hallucinate its API), this is why "describe a scene, get owned code" works in practice rather than just in principle. The trade-off against the convenience engines is real and worth naming: Babylon's built-in physics and inspector save you setup time but give an AI a thinner corpus to draw on, while A-Frame's HTML simplicity caps how stateful your world can become. You are choosing, in effect, between maximum AI reliability (Three.js) and maximum batteries-included structure (Babylon), and for an ownable web world the reliability usually wins.
The reason this list does not include Unity or Unreal is itself a first-principles point about the web. Those engines generate superb native games, and AI writes Unity C# very well, but their web export is heavy (multi-megabyte downloads and long load times) compared to Three.js shipping in well under a megabyte, and the output is proprietary rather than plain owned files. For a world that is supposed to behave like a website (instant to load, linkable, ownable), the open web stack wins decisively, which is why it occupies four of the scorecard's top five rows. The defining fact of this whole category, though, is a gap, not a feature: there is no official Three.js, pmndrs, Vercel, or Anthropic 3D agent skill. The big-platform agent tooling for web 3D is conspicuously thin, which is both a surprise and an opportunity, and it connects directly to the broader story of how AI changed who can build any of this. Before that, the worlds still need things to put inside them.
9. The asset and capture supply chain
A world is empty until you fill it, and the supply of 3D content (the props, the characters, the captured real places) is where the ownership question gets sharpest and most practical. This is the layer most builders touch first, because even a hand-coded Three.js world needs a chair, a tree, a building, or a scan of a real room. In 2026 that supply splits into two halves: generation (AI makes a 3D object from text or an image) and capture (you scan something real). Both have matured enormously, and both hide a licensing structure that determines whether the file you get is actually yours. The single most important rule in this section: in 3D generation, the dividing line is not model quality, it is the license attached to the output.
The generation market is led by two services whose free tiers contain the same trap. Meshy, now on its sixth model and generally available since January 2026, and Tripo, which has moved well past its v3 line, both produce excellent meshes from a prompt or image - Meshy. But on the free tiers of both, your output is published under CC BY 4.0, meaning it is a public asset that anyone can reuse, and you must credit the tool when you use it, including commercially - Tripo. Private ownership comes only with a paid tier. This is the modern equivalent of stock-photo watermarking: free to generate, but the file belongs to the commons. The cleaner posture comes from Rodin (now at Gen-2.5, made by Deemos with ByteDance as an investor and customer), which lets you preview free and pay only on download, with full commercial rights on paid output - Hyper3D.
| Tool | Free tier | Entry paid tier | Free-tier ownership |
|---|---|---|---|
| Meshy (Meshy-6) | $0, 100 credits/mo | $20/mo Pro | CC BY 4.0 (public, attribution) |
| Tripo | $0, 200 credits/mo | $19.90/mo Pro | CC BY 4.0 (public, attribution) |
| Rodin / Hyper3D | $0, pay per credit | $30/mo Creator | Pay-per-download, full rights |
| Stability SF3D / SPAR3D | Open weights, self-host | Enterprise above $1M revenue | Free commercial under $1M revenue |
The genuinely different model, and the one that matters most for anyone thinking in terms of ownership, is open weights. Stability AI's SF3D and SPAR3D ship as downloadable models under a community license that is free for commercial use by any individual or company under $1 million in annual revenue - Stability AI. That is categorically different from renting an API: an open-weights model is a file you own and can embed in your pipeline forever, immune to a vendor changing prices, deprecating access, or getting acquired. The cautionary tale arrived on schedule: CSM, maker of the popular Cube image-to-3D API, was acquired by Google and folded into DeepMind, closing in January 2026 - 3D Printing Industry. Anyone who built on the CSM API now has a vendor-risk problem. Anyone who had downloaded an open-weights equivalent does not. We go deeper on this whole generation layer in our AI Image to 3D guide.
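The table and the open-weights rule above reduce to a small lookup, which is handy as a pre-flight check before an asset enters a commercial build. The sketch below encodes only what this section states, and the real license texts should be read before relying on it. The tool identifiers, the result strings, and the function name `outputTerms` are all hypothetical. The behavior at exactly $1 million in revenue is my assumption, since the source says only "under" and "above".

```typescript
type Tool = "meshy" | "tripo" | "rodin" | "stability-open-weights";

// Summarises the ownership terms described in this section for one tool.
// `paid` means a paid tier or a paid download; revenue is in US dollars.
function outputTerms(tool: Tool, paid: boolean, annualRevenueUsd: number): string {
  switch (tool) {
    case "meshy":
    case "tripo":
      // Free-tier output is published to the commons under CC BY 4.0.
      return paid ? "private, yours" : "CC BY 4.0: public, attribution required";
    case "rodin":
      // Preview is free; rights attach when you pay to download.
      return paid ? "full commercial rights" : "preview only, pay on download";
    case "stability-open-weights":
      // Community license: free commercial use under $1M annual revenue.
      return annualRevenueUsd < 1000000
        ? "free commercial use, self-hosted"
        : "enterprise license required";
  }
}

const hobbyistOnFreeTier = outputTerms("tripo", false, 0);
const smallStudioSelfHosting = outputTerms("stability-open-weights", false, 250000);
```

Note that the only branch that never asks whether you paid a vendor is the open-weights one. It depends on a property of your company rather than on an account staying active.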
The capture side is where "you own a file" is already a settled victory, and it is the most web-native part of the entire stack. Scaniverse, free from Niantic, turns your phone into a Gaussian-splat scanner that processes on-device in 60 to 90 seconds with no upload and exports open formats including SPZ, PLY, GLB, and USDZ - Scaniverse. SuperSplat, from PlayCanvas, is a free, MIT-licensed, browser-based splat editor, so the entire capture-edit-publish loop now has a zero-cost, open, no-account path. This is the web's historical pattern repeating exactly: just as JPG, PNG, and SVG made images into ownable files rather than service-gated content, SPZ, glTF, and USDZ are doing the same for 3D. The strategic lesson is clean: the generation layer is fragmenting into rent-versus-own, but the output layer is consolidating onto open, ownable, web-deliverable file formats. Treat your 3D as files you host yourself, and you are immune to the vendor churn that just swallowed CSM. The question of who can actually assemble all this into a world is the last piece, and it is where AI agents changed everything.
10. How AI agents changed who can build a world
Every layer described so far (the engines, the generators, the capture tools) used to require a credentialed specialist to operate. The single biggest shift of 2026 is that AI agents collapsed the gap between intent and 3D artifact, turning world-building from a years-of-training skill into something closer to a conversation. This is the part of the story that actually makes "worlds as the new websites" plausible at scale, because the web only became planetary when ordinary people could publish without learning to code. The same democratization is now happening to 3D, and it is splitting into two distinct layers with opposite consequences for ownership.
The first layer is agents driving professional tools through the Model Context Protocol, which lets an AI operate software the way a human would. The pattern began as a viral community project, Blender MCP, which sockets an AI into Blender's full Python API and has gathered around 22,700 GitHub stars - Blender MCP. It got formalized when Anthropic shipped "Claude for Creative Work" on April 28, 2026, a set of nine official connectors including an official Blender connector built by the Blender Foundation's own lab - Anthropic. Because these connectors are MCP-based and open, the Blender integration is reachable by any model, not just one vendor's, which is the anti-lock-in counterweight. The same shape repeats in Unity, which now ships an official first-party MCP integration alongside a community server with over ten thousand stars - Unity. Anything a Python or C# developer could script, an agent can now do in plain language.
The second layer is agents generating the 3D code itself, writing Three.js, WebGPU shaders, and React Three Fiber from a description, and this is where it matters most for the web. The ecosystem here is overwhelmingly Claude-skewed: community skill packs like CloudAI-X/threejs-skills (around 2,400 stars) and dgreenheck/webgpu-claude-skill exist specifically to scaffold Claude Code's 3D output - CloudAI-X. The strongest qualitative data point comes from the creative studio behind ShaderGPT, which turns English into compiling GLSL shaders and reports Claude as the most consistent of Claude, GPT-5.5, and Gemini for the task - 14islands. That tracks with the broader leaderboard reality: as of June 2026, Claude Fable 5 leads LMArena's WebDev arena, the exact arena that measures front-end and 3D code generation, by nearly 100 points over the next model - Crypto Briefing. We break down that model's coding strengths in our Claude Fable 5 guide.
This is where the ownership story reaches its sharpest fork, and it is the tension a builder must navigate deliberately. The two agent paths produce opposite ownership outcomes from the same underlying capability. The MCP-and-skills path generates portable artifacts you own (a .blend file, a Three.js repository, raw GLSL), with the agent as a replaceable driver, which is high-ownership and low-lock-in. The vibe-coding-platform path (Rosebud, Websim, and peers) offers radically lower friction ("the browser is the new game engine," prompt to playable in minutes) but the world tends to live inside the platform's runtime and account, which is low-friction but higher-lock-in. The same capability, an AI writing the 3D, yields a world you own or a world you rent depending entirely on whether the output is an open file or a hosted experience. This is precisely the abstraction shift that Yuma Heymans (@yumahey) has built his last two companies around: from HeroHunt.ai's autopilot recruiter to O-mega's autonomous agents, the throughline is collapsing the gap between a prompt and a shipped, owned artifact, and worlds replacing websites is that same collapse reaching the web's front door.
This is also where an AI workforce fits as one option among the alternatives. Rather than operating a single tool, a platform like O-mega points a team of agents at the whole job: describe the world, and the agents write the Three.js, call the Marble API for the scene, pull captured splats from Scaniverse, wire up the storage and the auth, and hand you the repository. The differentiator versus the hosted vibe-coding tools is the same one that runs through this entire guide: the output is portable code and files you own, not an experience locked to a platform. For non-technical builders weighing the trade, our guides on building software with AI and how to build an app with AI cover the same prompt-to-owned-artifact pattern applied to ordinary apps. The capability is now real and widely available. What it cannot do is change the structural verdict, which is where everything converges.
11. The verdict: complement, not replacement
After all the platforms, prices, and capabilities, the honest answer to "are virtual worlds the new websites" has to be built from the structure, not from the hype, and the structure points to a precise and slightly deflating conclusion. Virtual worlds are a genuine, large-scale new medium, but in 2026 they are a complement and a new layer, not a replacement for the document web. The headset-centric metaverse specifically was an $83-billion category error in form factor. The screen-based world is a runaway success but mostly enclosed. And the generative "infinite world" is blocked at a memory wall. Each of those is a structural fact, not a temporary limitation, and together they define what worlds can and cannot inherit from websites.
The strongest argument against wholesale replacement is the one from web history itself: the substrate persists under every new layer. The web went from static read-only pages, to interactive social platforms, to mobile-first, and at no step did the new layer delete the old substrate. The document, the URL, and the open protocol survived every transition and got a new interaction layer on top. Virtual worlds are almost certainly another such layer (for play, social presence, simulation, commerce experiences) riding on the same web substrate, not a deletion of it. This is reinforced by the brute fact that most information tasks are actively worse in 3D: reading a document, checking a fact, buying a thing, and filling a form are all faster and more precise as text and buttons. The web's "friction" of reading and clicking is also its precision and its near-zero cost.
The counter-narrative is grounded in hard data, not taste, and three numbers carry it. First, the consistency wall: the best streamed world model on earth, Genie 3, holds memory for only about a minute by DeepMind's own admission, which means a "website you walk through" literally cannot remember your state - Google DeepMind. Second, accessibility: the 2026 WebAIM Million report found that 95.9% of the top million home pages have detectable accessibility failures, with low-contrast text the number-one issue on 83.9% of them, evidence that the web is still fundamentally a text-and-semantic-structure medium that screen readers and crawlers depend on - WebAIM. A radiance field is invisible to a screen reader and to a search crawler. Third, scale: roughly 1.43 billion websites serve about 6 billion users on every device down to a 2G phone - Siteefy, a surface no 3D pipeline comes close to reaching. The image below, three frames of Genie 3 holding a temple together as you move, is impressive precisely because consistency is the hard part, and it only holds for a minute.
There is one pro-3D signal that deserves equal honesty, because dismissing it would be the mirror-image error of the hype. Gaussian splatting as a capture-and-view layer is crossing from research into infrastructure, and it succeeds precisely because it solves the consistency problem from the other direction. A splat captures a real, persistent place, so it is inherently consistent in the way the generative streams are not. That is why Apple is putting it into Maps across 300+ cities, why Zillow and real-estate platforms are shipping it, and why Khronos is standardizing it into glTF. The nuance to hold is that this is 3D as a photoreal capture layer embedded inside otherwise-2D apps and maps, not 3D as a replacement for documents or as live-generated infinite worlds.
There is also a forward-looking reason the verdict is "complement now" rather than "complement forever," and it is the part of the frontier that AI agents directly address: inhabitation. A world feels real in proportion to whether the other things in it are real minds rather than scripts. The screen-based platforms are full of people, which is why they thrive, but generated and captured worlds are mostly empty. The piece that could change the equation is the same one transforming every other layer: agents. A persistent, ownable web world populated not by canned dialogue trees but by autonomous agents that remember, act, and have goals would clear the inhabitation bar the metaverse never reached. That is speculative today, but it is the credible path by which a complement could grow into something more, and it is why the ownable, persistent branch (the one that can actually hold state for agents to act on) is the branch worth building. We explore the agent side of this in our guide to building AI agents.
The first-principles reminder that governs all of it is the one from section 3: rendering is solved, and the real frontier is consistency, inhabitation, and meaning. In concrete terms, that is whether a world stays consistent without holding all of it, whether the others in it are real, and whether anything has stakes. A world that nails those, on an open, ownable substrate, is the only version that can genuinely be the new website. Which is exactly what you can build today.
12. The 2026 playbook for an ownable web world
The deflating verdict of the last section is also liberating, because it tells you precisely what to build and what to ignore. If worlds are a new layer on the open web rather than a headset replacement for it, then the winning move is obvious from first principles: build explorable 3D spaces you own as files and serve on the open web, populated by real logic, with real persistence and stakes, and treat the streamed generative models as ideation toys you rebuild from rather than runtimes you depend on. This section turns that into a concrete, primary-source-verified pipeline you can start this week, every step of which produces a portable artifact you keep.
The spine of the ownable pipeline is code plus assets plus the explicit slice of generation, served on the web, and the sequence matters because each step hands the next a file you own. The pattern is to generate or capture the world as a portable artifact, store it in open formats, and render it with an MIT-licensed web engine, so that no single vendor sits in your critical path. Before listing the steps, the governing principle is worth stating: at every stage, prefer the option that gives you a downloadable file under a clear commercial license, and pay for ownership rather than accepting a free tier that publishes your work to the commons.
- Generate the world cleanly with World Labs Marble on a paid tier (free output is not yours), exporting SPZ and PLY splats plus a GLB collider mesh - World Labs
- Add discrete assets from paid Meshy or Tripo tiers, or self-host open-weights Stability SF3D under the $1M-revenue free commercial license - Stability AI
- Capture real places for free with Scaniverse to SPZ, and clean them in the free MIT-licensed SuperSplat
- Store everything in open formats (SPZ and PLY for splats, glTF/GLB for meshes), treating the glTF splat extension as future-proofing since it is not yet ratified
- Serve on the open web with Three.js plus Spark (WebGL2 baseline, roughly 98% device reach) and a WebGPU fast path via PlayCanvas or Babylon for capable hardware
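The last step, a WebGL2 baseline with a WebGPU fast path, comes down to one capability check at load time. The sketch below keeps that decision as a pure function so it can be read and tested outside a browser. In a real page the two flags would typically come from checking for `navigator.gpu` and from whether `canvas.getContext("webgl2")` returns a context. The `Capabilities` shape, the `RenderPath` labels, and the static fallback (for example a poster image) are assumptions added for illustration, not part of any engine's API.

```typescript
// What the page detected about the visitor's device.
interface Capabilities {
  webgpu: boolean; // e.g. `navigator.gpu` is present and an adapter was granted
  webgl2: boolean; // e.g. `canvas.getContext("webgl2")` returned a context
}

// Hypothetical labels for the three ways the world can be delivered.
type RenderPath = "webgpu-fast-path" | "webgl2-baseline" | "static-fallback";

// Prefer the fast path on capable hardware, fall back to the broad baseline,
// and degrade to a non-interactive fallback when neither is available.
function pickRenderPath(caps: Capabilities): RenderPath {
  if (caps.webgpu) return "webgpu-fast-path";
  if (caps.webgl2) return "webgl2-baseline";
  return "static-fallback";
}

const newLaptop = pickRenderPath({ webgpu: true, webgl2: true });
const olderPhone = pickRenderPath({ webgpu: false, webgl2: true });
const legacyDevice = pickRenderPath({ webgpu: false, webgl2: false });
```

Keeping the choice in one place also keeps the assets shared: the same SPZ and GLB files feed whichever renderer is selected, so the fast path is an optimization rather than a second pipeline.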
That sequence is the whole defensible position, and it is worth being explicit about why each choice beats the alternative, because the temptations to deviate are exactly the traps the rest of the guide documented. Choosing Marble's paid tier over its free tier is the difference between owning your world and licensing it back under non-commercial terms. Choosing open formats over a proprietary viewer is the difference between a world you can host on any CDN and one trapped in a vendor's player. Choosing Three.js over Unity web export is the difference between a sub-megabyte instant-load scene and a multi-megabyte download. And choosing the MCP-and-code agent path over a hosted vibe-coding platform is the difference between a repository you own and an experience you rent. Every one of these is the same decision in a different costume: own the file, or rent the experience.
The thing that does not yet exist, and the clearest opportunity in the whole field, is the connective tissue. There is no official Three.js, pmndrs, Vercel, or Anthropic 3D agent skill that pins the current engine version, knows the WebGPU and shader-language migration gotchas, and gives an AI a render-and-inspect feedback loop so it can see its own output and self-correct. Building that, a version-pinned web-3D agent skill plus a visual feedback loop that emits portable code, is a genuine build-it-yourself moat, and it slots directly into the agent-driven, ownership-first model this guide has argued for throughout. For founders thinking about where that kind of durable, owned capability fits in a company, our founder's guide to starting a company in 2026 and our walkthrough of building products with AI fast cover the adjacent ground.
The decision framework, reduced to one line: if you want a world that behaves like a website, build it explicit, own it as a file, serve it on the open web, and make it persistent. Use the streamed world models (Genie 3, Oasis 3, Runway GWM-1) for ideation and simulation, never as your runtime, because their one-minute memory wall is structural and disqualifies them from anything that needs to be remembered, linked, or returned to. Use the screen-based platforms (Roblox, Fortnite) when you want instant scale and are willing to be a tenant, and avoid them when ownership and an open protocol matter. The medium is real, it is large, and it is finally web-native. It is not replacing the document web. It is becoming the next layer on top of it, and in 2026, for the first time, that layer is something an ordinary builder, or an ordinary builder's agents, can generate, own, and serve without asking anyone's permission. That, and not the headset, is what "virtual worlds are the new websites" actually means.
This guide reflects the AI-generated 3D worlds and web-3D landscape as of June 2026. Model versions, pricing, funding rounds, and standards status (including the glTF Gaussian-splatting extension, still a Release Candidate) change quickly in this field. Verify current details before committing to any platform.