A 3D version of this concept is well executed by Mozilla Hubs, but the lack of echo cancellation makes headphones a necessity when using Hubs, which is a downside.
Echo cancellation works in Firefox, and we are landing a fix for Chrome this week. It turned out to be quite involved to do spatialized audio and echo cancellation together:
A lot of this can be solved with push-to-talk, like what Virtual Airwaves is doing. They simulate radio in that the closer you are to someone, the "stronger" the signal is. (Take a look at their patents.) See https://virtualairwaves.com/ or try it at https://cb.virtualairwaves.com/. There are usually conversations on Channel 1 each evening, UK time.
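For anyone curious what "closer means stronger" could look like in code, here is a rough sketch. It is not Virtual Airwaves' method (their approach isn't described here); it just borrows the "inverse" distance model that the Web Audio API defines for `PannerNode`. The names `signalStrength`, `distanceBetween`, and `Point`, and the default `refDistance` and `rolloffFactor` values, are made up for illustration.

```typescript
// Hypothetical helper: maps the distance between two users to a gain in (0, 1].
// Assumption: the Web Audio "inverse" distance model stands in for whatever
// falloff curve a real radio simulation would use.
function signalStrength(
  distance: number,
  refDistance: number = 1, // distance at which gain is still 1 (assumed default)
  rolloffFactor: number = 1, // how quickly the signal fades (assumed default)
): number {
  // Anything closer than refDistance is treated as full strength.
  const d = Math.max(distance, refDistance);
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}

type Point = { x: number; y: number };

// Straight-line distance between two positions on a 2D map.
function distanceBetween(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

const listener: Point = { x: 0, y: 0 };
const near: Point = { x: 1, y: 0 };
const far: Point = { x: 3, y: 4 };

console.log(signalStrength(distanceBetween(listener, near))); // 1
console.log(signalStrength(distanceBetween(listener, far))); // 0.2
```

In a browser, the resulting value could be assigned to a `GainNode`'s `gain.value` for each remote speaker, though a real radio simulation would presumably also add noise or a hard cutoff.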
https://github.com/mozilla/hubs/pull/2361