Hacker News
by Animats 1653 days ago
Since I'm writing a new client, in Rust, for Second Life/Open Simulator, I'm very aware of these issues.

A metaverse client for a high-detail virtual world has most of the problems of an MMO client plus many of the problems of a web browser. First, much of what you're doing is time-sensitive. You have a stream of high-priority events in each direction that have to be dealt with quickly but don't have a high data volume. Then you have a lot of stuff that's less time critical.

The event stream is usually over UDP in the game world. Since you might lose a packet, that's a problem. Most games have "unreliable" packets, which, if lost, are superseded by later packets. ("Where is avatar now" is a typical use.) You'd like to have that stream on a higher quality of service than the others, if only ISPs and routers actually paid attention to that.
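
A minimal sketch of the "superseded by later packets" idea, assuming each update carries a sender-assigned sequence number. The names `AvatarUpdate` and `LatestState` are made up for illustration and are not taken from any real client, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class AvatarUpdate:
    avatar_id: str   # hypothetical identifier
    seq: int         # sender-assigned sequence number, increasing
    position: tuple  # (x, y, z)


class LatestState:
    """Keeps only the newest update per avatar; stale or duplicate packets are dropped."""

    def __init__(self):
        self._latest = {}

    def receive(self, update: AvatarUpdate) -> bool:
        """Apply an update. Returns False if a newer (or equal) one was already seen."""
        current = self._latest.get(update.avatar_id)
        if current is not None and update.seq <= current.seq:
            return False  # arrived late or duplicated: a newer position is already known
        self._latest[update.avatar_id] = update
        return True

    def position(self, avatar_id: str):
        update = self._latest.get(avatar_id)
        return update.position if update else None
```

Nothing is ever retransmitted here: a lost packet simply never shows up, and the next one that does arrive replaces whatever was known before.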

Then you have the less-critical stuff, which needs reliability. ("Object X enters world" is a typical use.) I'd use TCP for that, but Second Life has its own not-very-good UDP-based protocol, with a fixed retransmit timer. Reliable delivery, in-order delivery, no head-of-line blocking: pick two. TCP chooses the first two; SL's protocol chooses the first and third. Out-of-order delivery after a retransmit can cause avatars to lose clothing items, because the child item arrived before the parent item.
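
One way a receiver could tolerate the child-before-parent case without going back to full in-order delivery is to park early children until their parent shows up. This is only a sketch under that assumption, not how the Second Life protocol or any existing viewer handles it, and `AttachmentTree` is a hypothetical name.

```python
from collections import defaultdict


class AttachmentTree:
    """Applies items only once their parent exists; early children are parked."""

    def __init__(self):
        self.applied = []                  # item ids in the order they were applied
        self._known = set()
        self._waiting = defaultdict(list)  # parent id -> children that arrived early

    def receive(self, item_id, parent_id=None):
        if parent_id is not None and parent_id not in self._known:
            self._waiting[parent_id].append(item_id)  # park instead of dropping
            return
        self._apply(item_id)

    def _apply(self, item_id):
        if item_id in self._known:
            return  # duplicate from a retransmit
        self._known.add(item_id)
        self.applied.append(item_id)
        # Release any children that were waiting on this item.
        for child in self._waiting.pop(item_id, []):
            self._apply(child)
```

Only the dependent items wait; unrelated traffic keeps flowing, which is the part that strict in-order delivery gives up.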

Then you have asset fetching. In Second Life/Open Simulator this is straight HTTP/1. But there are some unusual tricks. Textures are stored in progressive JPEG 2000. It's possible to open a connection and just read a few hundred bytes to get a low-rez version. Then, the client can stop reading for a while, put the low-rez version on screen, and wait to see if there's a need to keep reading, or just close the connection because a higher-rez version is not needed. The poor server has to tolerate a large number of stalled connections. Worse, the actual asset servers on AWS are front-ended by Akamai, which is optimized for browser-type behavior. Requesting an asset from an Akamai cache results in fetching the entire asset from AWS, even if only part of it is needed. There's a suspicion that large numbers of partial reads and stalled reads from clients sometimes cause Akamai's anti-DDoS detection to trip and throttle the data flow.
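
A rough sketch of the "how much do I actually need to read" decision. It assumes each JPEG 2000 resolution level halves the width and height, and it guesses that each skipped level cuts the data to about a quarter. That byte estimate is a crude assumption made for illustration, not a property of any particular encoder, and both function names are made up.

```python
def levels_to_skip(full_size: int, on_screen_pixels: float, max_levels: int = 5) -> int:
    """How many halvings of a full_size-pixel-wide texture still cover on_screen_pixels."""
    skip = 0
    size = full_size
    while skip < max_levels and size / 2 >= on_screen_pixels:
        size /= 2
        skip += 1
    return skip


def bytes_to_read(full_bytes: int, skip: int, floor: int = 600) -> int:
    """Rough guess at how much of the file is needed; never less than a few hundred bytes."""
    return min(full_bytes, max(floor, full_bytes // (4 ** skip)))
```

For example, a 1024-pixel texture that covers about 100 pixels on screen can skip three levels, so the client would stop reading far short of the end of the file and come back only if the camera gets closer.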

So those are just some of the issues "the HTTP of VR" must handle. Most are known to MMO designers. The big difference in virtual worlds is there's far more dynamic asset loading. How well that's managed has a strong influence on how consistent the world looks. It has to be constantly re-prioritized as the viewpoint moves.
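
The re-prioritization step can be sketched like this, assuming apparent size is approximated as object radius divided by distance to the camera. That heuristic and the function names are assumptions for illustration; a real client would weigh more than this.

```python
import heapq
import math


def priority(asset_pos, asset_radius, camera_pos):
    """Larger means more urgent: approximate apparent size as radius / distance."""
    distance = math.dist(asset_pos, camera_pos)
    return asset_radius / max(distance, 0.01)


def fetch_order(assets, camera_pos):
    """assets: dict of name -> (position, radius). Returns names, most urgent first."""
    heap = [(-priority(pos, radius, camera_pos), name)
            for name, (pos, radius) in assets.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Calling `fetch_order` again after the camera moves gives a different ordering for the same pending assets, which is the whole point: the queue is never final.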

(Demo, from my own work: https://vimeo.com/user28693218 This shows the client frantically trying to load the textures from the network before the camera gets close. Not all the tricks to make that look good are in this demo.)

It's not an overwhelmingly hard problem, but botch it and you will be laughed off Steam.

This is why I think it's a joke to be building metaverse apps in Unity. Unity and dynamic asset loading are not happy bedfellows.

There's not a lot I liked about Unity when I was working with it full-time a few years ago. But the one thing I could acknowledge that it has that was generally missing from open source web development was the asset pipeline. But dynamic, user-uploaded assets won't be able to use the asset pipeline. So one of the biggest drivers for using Unity goes right out the window.

> Unity and dynamic asset loading are not happy bedfellows.

Not Unreal Engine 4, either. UE5 has "asset streaming" and "open worlds", but the content is mostly static and loaded from a local SSD on a PlayStation 5. That's working nicely.

Asset management from the network is the real difference with seamless, modifiable virtual world systems. Otherwise, it's a minute of "...LOADING..." when you move to the next area. You need clients, servers, file formats, and protocols designed for it. It's a moderately hard engineering problem, and, as yet, there are no good off-the-shelf solutions.

There's a "check out, check in" approach. Decentraland uses that. You check out your parcel into a local Unity environment, edit, and check in the whole parcel to make it visible to others.

The SpatialOS people, Improbable, did some of this, but their solution cost so much to operate server-side that all four of the games that used it went broke. So Improbable is trying to pivot to military simulation.

Probably by UE6 this will all be standard. It's one of those things that has to be done to move the metaverse from hype to usefulness.

But why does it have to load dynamically, when you can just package releases instead?

For Second Life at least, it's hard to package stuff. World geometry, player models, lighting and props can all be edited in real time. You can't use most of the tricks that video game devs are used to: no simple way to prebake lighting or reflections, no occlusion culling or binary space partition. The asset base is enormous: hundreds or thousands of terabytes. You can bet that every player character will have a completely unique set of textures and models; very little is shared between characters. This is why long-time SL users tend to stick to one location, because actually exploring is painful: move 100 meters, then stand still for a few minutes as your framerate plummets and everything slowly loads in. Many clubs or hangout spaces are on islands or boxes in the sky high enough that nothing else loads in. This reduces the problem just to rendering avatars, which is hard enough by itself.

It's also very hard to deprecate old systems. The game is 18 years old, but there were real money transactions from day one. If a user bought something ten years ago, they expect to still be able to use it!

You can imagine a clean-sheet design that does away with all this. Make the world static, player avatars pre-baked. Most SL competitors do this (Facebook Spaces, PlayStation Home, etc.). This gives you better FPS and a much more consistent aesthetic, since all the assets are made in-house. But now what? It's the classic metaverse problem: there's just not that much to do. Cutting features from SL makes this worse, not better.

Which is why I'm working on a multi-threaded Second Life / Open Simulator client in Rust. The frame rate problem can be overcome. Another example of mine:

https://vimeo.com/640175119

Runs at 55-60 FPS in a crowded area.

Over-complicated avatar clothing is a separate problem. This is a huge deal in Second Life, because it's the greatest dress-up virtual world ever built. Users expect a lot. The tattoo layer has to show through mesh stockings, for example. I have some ideas on speeding that up but haven't implemented anything. What's needed is an optimization step that takes place when an avatar changes clothes. All the layers of meshes need to be crunched down to a simplified game-type combined mesh. In a game, that would be done during asset building. In a world where you can change clothes, mixing and matching items, it has to be done somewhere near run time. But not on every frame, just at clothing changes. This is already done for textures; all the texture clothing layers of an avatar are baked down to one. Roblox does something like this in their experimental mesh avatar system.
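
The texture half of this, baking layers down once per outfit change instead of once per frame, can be sketched as below. It assumes straight (non-premultiplied) alpha and one tuple of RGBA pixels per layer; `OutfitBaker` is a made-up name and this is not the Second Life bake service. The mesh-combining step described above is not shown.

```python
def over(top, bottom):
    """Composite one RGBA pixel over another (straight alpha, channels in 0.0..1.0)."""
    ta, ba = top[3], bottom[3]
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    rgb = tuple((t * ta + b * ba * (1.0 - ta)) / out_a
                for t, b in zip(top[:3], bottom[:3]))
    return rgb + (out_a,)


class OutfitBaker:
    """Flattens an outfit's layers when the outfit changes, then reuses the result."""

    def __init__(self):
        self._cache = {}
        self.bakes = 0  # how many times real compositing work was done

    def baked(self, layers):
        """layers: tuple of layers, innermost (skin) first; each is a tuple of RGBA pixels."""
        if layers not in self._cache:
            self.bakes += 1
            result = layers[0]
            for layer in layers[1:]:
                result = tuple(over(t, b) for t, b in zip(layer, result))
            self._cache[layers] = result
        return self._cache[layers]
```

A fully transparent stocking pixel over a half-opaque tattoo pixel leaves the tattoo visible in the baked result, and asking for the same outfit again costs a dictionary lookup rather than another compositing pass.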

There's a mindset that this is impossible, shared by the low end of metaverse developers. It's not shared by Roblox or Epic or IMVU, who are busy solving the problem. This is a moderately hard problem, but it's not impossible.

Linden Lab staff had convinced themselves that it was impossible to speed up the viewer. After some people in Linden Lab management saw what I'd done, somehow there suddenly was much more effort going into improving the viewer FPS in their C++ client.

I think that the open part of the metaverse needs to be code and legal before aesthetic, mostly because skinned-mesh animation is the hardest part of game dev, given its pipeline costs and code complexity.

The Roblox solution looks terrible: bloated, bug-prone, and poorly performing. I manage 2000 non-instanced characters (each with an interchangeable weapon) on a 1050 Ti.

I'm going with royalty-free assets. Because of an early-adopter advantage, I happen to have a custom agreement that lets me redistribute them not only in a game but also with my open-source engine.

Editable content creates headaches and slows technical progress, since more time is wasted on supporting more and more data for zero gameplay benefit.

Physics is the gameplay changer for MMO games, and it needs to be implemented in the game and not in some external library to be efficient and specialized enough to scale in an MMO setting.

Because, in a real metaverse, everybody can make changes to their own stuff. That's the difference between a metaverse and an MMO.

I'm actually really curious to see how streaming video might work for things like light maps. You could have a beast of a machine, or a cluster, performing real-time raytracing on light and environment maps that then get streamed to the user. Sort of a hybrid approach between on-device rendering and remote gaming systems like Stadia where all the rendering takes place remotely. I think Ben Nolan was working on something like this for CryptoVoxels, but I stopped following him when he went full NFT crazy.

How much to do remotely is a big issue. A big problem with local rendering is that you need more bandwidth to the client than you'd need for video. A big advantage of cloud gaming is that you're in a data center, close to the asset servers with many gigabits of bandwidth.

The big problem with remote rendering is that it costs too much. "Cloud gaming" startups have appeared and disappeared for years now. If they charge too little, they go broke, and if they charge too much, users leave. NVidia cloud gaming is currently $10/month for 6-hour sessions. So is Stadia, now. That's not too bad, but it may be a loss leader. NVidia already doubled their price once. Startups with similar offerings are charging around $45/month.

You can do some level of dynamic asset loading. The real issue to get around in Unity is dynamic script loading. There's some progress being made with Unity's new visual scripting system. The visual scripts are stored as assets.

There are some problems even with dynamic asset loading in Unity that make a very smooth, very clean experience very difficult to achieve.

1) Asset bundles are the Unity "favored" means of supporting dynamic content. However, they are extremely heavy and have to be authored in the Unity editor. So you could do things like release new levels for an offline game, or new environments and items for an MMO, but user-generated content gets really hard to do. It's possible, by running the Unity editor headless, but that's so fraught with peril that it really shouldn't be considered.

2) Primitive, binary assets like textures and audio tracks are the easiest thing to load over the 'net in Unity, but last I checked, decoding them was still implemented on the UI thread. The download itself happens off-thread, but you'll have too large a performance hit for devices like the Oculus Quest 2 with even a few textures: you will drop frames all over the floor. It's so bad that I had to find, fix, and compile in a full C# implementation of JPEG just to support dynamic texture loading without dropping frames on HoloLens 1, Oculus Go, and Quest 1. I quit using Unity by the time Quest 2 came out, but it's not so much more powerful that it would move the bar far enough.

3) Again, for primitive assets, the raw, decoded data may not be the final format that you want. To use memory efficiently, there are compressed texture formats that are supported directly in GPUs. Surprise, surprise, there is no 100% cross-platform format, so tools like Binomial's Basis can transcode between formats. This is built into Unity's asset pipeline; if you start life with a PNG file for your image, statically loaded in your Unity scene, it will get transcoded into whatever compressed format the graphics APIs on your target operating systems support. Hence part of Unity's need to have target platforms specified.

4) For 3D models, you need to figure out where you want to lie on the spectrum of small, network-transmission friendliness vs. ease of parsing. That model will then need to be converted to Unity GameObjects and Meshes, which again, takes place on the UI thread. I know of no workaround for this, other than blanking the user's view out to black just before the object creation happens so they don't see the dropped frames.
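
The shape of the workaround in point 2, decoding on a worker thread and having the render thread only collect finished results, looks roughly like this. It is a language-neutral illustration written in Python, not Unity or C# code; `decode` is a stand-in that just reverses bytes, and `BackgroundDecoder` is a hypothetical name.

```python
import queue
import threading


def decode(raw: bytes) -> bytes:
    """Stand-in for an expensive image decode (hypothetical; just reverses the bytes)."""
    return raw[::-1]


class BackgroundDecoder:
    def __init__(self):
        self._jobs = queue.Queue()
        self._done = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name: str, raw: bytes) -> None:
        self._jobs.put((name, raw))

    def _run(self) -> None:
        while True:
            name, raw = self._jobs.get()
            self._done.put((name, decode(raw)))  # heavy work happens off the render thread
            self._jobs.task_done()

    def poll(self):
        """Called once per frame on the render thread; never blocks."""
        finished = []
        while True:
            try:
                finished.append(self._done.get_nowait())
            except queue.Empty:
                return finished

    def wait_idle(self) -> None:
        """Only for tests and shutdown; a frame loop would call poll() instead."""
        self._jobs.join()
```

This covers only the decode. The step in point 4, turning the result into engine objects, still has to happen wherever the engine insists it happens.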

If all you're making is a card game on smartphones, nobody is going to notice dynamic asset loading causing dropped frames because your "loading" screen isn't tied to their face. But in VR, it's basically table stakes, and Unity makes half of it very hard and the other half impossible.

It's certainly not easy. I was just saying they've made a bit of progress. Assets still need to be activated on the main thread. You'll never get Awake and OnEnable off the main thread. But things like deserializing the asset can be done asynchronously now. Assets are loaded to the GPU from the main thread, but that can be time-sliced over many frames.
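
Time slicing in general can be sketched as a per-frame budget: do main-thread-only work until the budget is spent, then leave the rest for the next frame. The cost numbers here are hypothetical estimates (say, milliseconds), `UploadQueue` is a made-up name, and none of this is Unity's actual implementation.

```python
from collections import deque


class UploadQueue:
    """Spreads main-thread-only work over frames under a per-frame cost budget."""

    def __init__(self, budget_per_frame: float):
        self.budget = budget_per_frame
        self._pending = deque()

    def add(self, name: str, cost: float) -> None:
        self._pending.append((name, cost))

    def run_frame(self):
        """Process queued items until the budget is spent; always does at least one."""
        spent = 0.0
        done = []
        while self._pending:
            name, cost = self._pending[0]
            if done and spent + cost > self.budget:
                break  # leave the rest for the next frame
            self._pending.popleft()
            spent += cost
            done.append(name)
        return done
```

The "at least one" rule means a single item that is bigger than the whole budget still gets through, at the price of one long frame. Avoiding that requires the item itself to be splittable.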

Of course, I think those improvements are only for asset bundles so if that's a no-go for you then there's not been much progress.