For Second Life at least, it's hard to package stuff. World geometry, player models, lighting and props can all be edited in real time. You can't use most of the tricks that video game devs are used to: no simple way to prebake lighting or reflections, no occlusion culling or binary space partitioning. The asset base is enormous: hundreds or thousands of terabytes. You can bet that every player character will have a completely unique set of textures and models; very little is shared between characters. This is why long-time SL users tend to stick to one location: actually exploring is painful. Move 100 meters, then stand still for a few minutes as your framerate plummets and everything slowly loads in. Many clubs or hangout spaces are on islands or in boxes in the sky, high enough that nothing else loads in. This reduces the problem to just rendering avatars, which is hard enough by itself.
It's also very hard to deprecate old systems. The game is 18 years old, but there were real money transactions from day one. If a user bought something ten years ago, they expect to still be able to use it!
You can imagine a clean-sheet design that does away with all this. Make the world static, player avatars pre-baked. Most SL competitors do this (Facebook Spaces, Playstation Home, etc.). This gives you better FPS and a much more consistent aesthetic, since all the assets are made in-house. But now what? It's the classic Metaverse problem: there's just not that much to do. Cutting features from SL makes this worse, not better.
Which is why I'm working on a multi-threaded Second Life / Open Simulator client in Rust. The frame rate problem can be overcome. Another example of mine:
Over-complicated avatar clothing is a separate problem. This is a huge deal in Second Life, because it's the greatest dress-up virtual world ever built. Users expect a lot. The tattoo layer has to show through mesh stockings, for example. I have some ideas on speeding that up but haven't implemented anything. What's needed is an optimization step that takes place when an avatar changes clothes. All the layers of meshes need to be crunched down to a simplified game-type combined mesh. In a game, that would be done during asset building. In a world where you can change clothes, mixing and matching items, it has to be done somewhere near run time. But not on every frame, just at clothing changes.
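To make the "bake at clothing change, not every frame" idea concrete, here is a minimal Rust sketch. Everything in it is an assumption for illustration: `MeshLayer`, `Avatar`, `wear`, `remove` and `render_mesh` are hypothetical names, not part of any Second Life viewer, and the "bake" just concatenates the layers. A real optimizer would also drop hidden faces and simplify geometry. The point is only the caching: the combined mesh is rebuilt when the outfit changes and reused on every frame in between.

```rust
/// One wearable item, reduced to positions and triangle indices (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
pub struct MeshLayer {
    pub name: String,
    pub vertices: Vec<[f32; 3]>,
    pub indices: Vec<u32>,
}

/// An avatar that caches one combined mesh for its whole outfit (hypothetical type).
#[derive(Default)]
pub struct Avatar {
    layers: Vec<MeshLayer>,
    combined: Option<MeshLayer>, // None means the outfit changed since the last bake
    pub bake_count: usize,       // how many times the expensive step actually ran
}

impl Avatar {
    /// Putting something on invalidates the cached mesh.
    pub fn wear(&mut self, layer: MeshLayer) {
        self.layers.push(layer);
        self.combined = None;
    }

    /// Taking something off invalidates the cached mesh, but only if it was worn.
    pub fn remove(&mut self, name: &str) {
        let before = self.layers.len();
        self.layers.retain(|l| l.name != name);
        if self.layers.len() != before {
            self.combined = None;
        }
    }

    /// Called every frame; rebakes only if the outfit changed.
    pub fn render_mesh(&mut self) -> &MeshLayer {
        if self.combined.is_none() {
            self.combined = Some(bake(&self.layers));
            self.bake_count += 1;
        }
        self.combined.as_ref().unwrap()
    }
}

/// Stand-in for the real optimization step: merge all layers into one mesh,
/// offsetting each layer's indices by the vertices already emitted.
fn bake(layers: &[MeshLayer]) -> MeshLayer {
    let mut out = MeshLayer {
        name: "combined".to_string(),
        vertices: Vec::new(),
        indices: Vec::new(),
    };
    for layer in layers {
        let base = out.vertices.len() as u32;
        out.vertices.extend_from_slice(&layer.vertices);
        out.indices.extend(layer.indices.iter().map(|i| i + base));
    }
    out
}
```

With this shape, calling `render_mesh` a hundred frames in a row after two `wear` calls runs `bake` once; the next `remove` or `wear` triggers exactly one more bake.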
This is already done for textures; all the texture clothing layers of an avatar are baked down to one. Roblox does something like this in their experimental mesh avatar system.
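For the texture case, baking layers down to one can be sketched as repeated alpha compositing. This is a rough illustration, not the viewer's actual bake code: it assumes straight (non-premultiplied) RGBA pixels in the 0.0 to 1.0 range, layers ordered bottom to top and all the same size, and the standard "over" operator. `Rgba`, `over` and `bake_layers` are made-up names.

```rust
/// Straight-alpha RGBA pixel, each component in 0.0..=1.0 (assumed format).
type Rgba = [f32; 4];

/// Standard "over" operator: composite `top` onto `bottom`.
fn over(top: Rgba, bottom: Rgba) -> Rgba {
    let a = top[3] + bottom[3] * (1.0 - top[3]);
    if a == 0.0 {
        return [0.0; 4]; // both fully transparent
    }
    let mut out = [0.0; 4];
    for c in 0..3 {
        out[c] = (top[c] * top[3] + bottom[c] * bottom[3] * (1.0 - top[3])) / a;
    }
    out[3] = a;
    out
}

/// Flatten a stack of same-sized layers (bottom first) into one texture.
fn bake_layers(layers: &[Vec<Rgba>]) -> Vec<Rgba> {
    let len = layers.first().map_or(0, |l| l.len());
    let mut baked = vec![[0.0; 4]; len];
    for layer in layers {
        for (dst, &src) in baked.iter_mut().zip(layer.iter()) {
            *dst = over(src, *dst);
        }
    }
    baked
}
```

A half-transparent black stocking pixel over an opaque red skin pixel comes out as opaque dark red, so lower layers stay visible through the upper one. The renderer then samples one baked texture instead of the whole stack.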
There's a mindset that this is impossible, shared by the low end of metaverse developers. It's not shared by Roblox or Epic or IMVU, who are busy solving the problem. This is a moderately hard problem, but it's not impossible.
Linden Lab staff had convinced themselves that it was impossible to speed up the viewer. After some people in Linden Lab management saw what I'd done, somehow there suddenly was much more effort going into improving the viewer FPS in their C++ client.
I think that the open part of the metaverse needs to be code and legal before aesthetic, mostly because skinned mesh animation is the hardest part of game dev, due to pipeline costs and code complexity.
The Roblox solution looks terrible: bloated, bug-prone and poorly performing. I manage 2000 non-instanced characters (each with an interchangeable weapon) on a 1050Ti.
I'm going with royalty-free assets that I happen to have a custom agreement for, thanks to an early adopter advantage. It allows me to redistribute them not only in a game but also with my open-source engine.
Editable content creates headaches and slows down technical progress, since more time is wasted on supporting more and more data for zero gameplay benefit.
Physics is the gameplay changer for MMO games, and it needs to be implemented in the game and not in some external library to be efficient/specialized enough to scale in an MMO setting.
I'm actually really curious to see how streaming video might work for things like light maps. You could have a beast of a machine, or a cluster, performing real-time raytracing on light and environment maps that then get streamed to the user. Sort of a hybrid approach between on-device rendering and remote gaming systems like Stadia where all the rendering takes place remotely. I think Ben Nolan was working on something like this for CryptoVoxels, but I stopped following him when he went full NFT crazy.
How much to do remotely is a big issue. A big problem with local rendering is that you need more bandwidth to the client than you'd need for video. A big advantage of cloud gaming is that you're in a data center, close to the asset servers with many gigabits of bandwidth.
The big problem with remote rendering is that it costs too much. "Cloud gaming" startups have appeared and disappeared for years now. If they charge too little, they go broke, and if they charge too much, users leave. NVidia cloud gaming is currently $10/month for 6 hour sessions. So is Stadia, now. That's not too bad, but it may be a loss leader. NVidia already doubled their price once. Startups with similar offerings are charging around $45/month.