Once this becomes widespread, I can actually start using Matrix as a real Discord / Slack alternative (communities just weren't cutting it, unfortunately).
Once the audio chat feature is seamlessly integrated, I'd even consider ditching my old Mumble server for it. Or not. Probably not. Our group just needs audio, with the occasional picture flood until they get it out of their system. Since some of us (me included) won't use Discord due to privacy concerns, I still hope for a good and reliable self-hosted alternative. A more modern user interface sure would be nice.
Great to hear! I'd love to be able to recommend Matrix/Element as a Discord alternative.
However, actually competing with Discord will require a tonne of hard work in Element. The little things like automatic voice activation, echo cancellation and volume normalization are the reasons Discord is as fantastic as it is, so Element will need to excel at those to compete.
The first cut will use Jitsi as the engine, albeit with a more appropriate UI, which gives you AEC (echo cancellation) from WebRTC and some level of AGC (normalisation). But we have plans to go far beyond that, and are very aware it's hard work. However, pre-Matrix, the core team professionally built VoIP stacks, so we have some experience here, plus our own WebRTC implementation should we need it :)
The main reason to consider something different to Jitsi is to use Matrix directly for decentralised E2EE signalling to manage the media streams, and to allow hybrid SFU and MCU models (like Hangouts or Zoom) rather than pure SFU like Jitsi. We do like Jitsi though, and already contribute to it directly and indirectly: Jitsi's E2EE is derived from Matrix, and the Matrix community recently contributed a tonne of a11y fixes to it which have just landed. But we'd still like a fully decentralised, Matrix-native group call solution eventually.
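To make the SFU/MCU distinction concrete, here is a simplified Python sketch that counts the streams one client sends and receives in an n-person call. It ignores simulcast, codecs and video layouts. The "hybrid" branch (forward up to a hypothetical `forward_limit` individual streams and mix everyone else into one extra stream) is an illustrative assumption, not a description of Matrix's planned design.

```python
def per_client_streams(n: int, model: str, forward_limit: int = 4) -> tuple[int, int]:
    """Return (uploads, downloads) for one client in an n-person call.

    Simplified model: one media stream per participant, no simulcast.
    """
    if n < 2:
        raise ValueError("a call needs at least two participants")
    others = n - 1
    if model == "sfu":
        # Selective Forwarding Unit: each client uploads once and the server
        # forwards the other participants' streams without decoding them,
        # so client downloads grow with the size of the call.
        return (1, others)
    if model == "mcu":
        # Multipoint Control Unit: the server decodes every input and sends
        # each client a single mixed stream, trading server CPU for
        # constant client bandwidth.
        return (1, 1)
    if model == "hybrid":
        # Hypothetical hybrid: forward up to forward_limit streams as-is and
        # mix any remaining participants into one additional stream.
        forwarded = min(others, forward_limit)
        mixed = 1 if others > forward_limit else 0
        return (1, forwarded + mixed)
    raise ValueError(f"unknown model: {model}")
```

For a 10-person call, `per_client_streams(10, "sfu")` gives `(1, 9)` while `"mcu"` gives `(1, 1)`: the SFU keeps the server cheap but pushes bandwidth onto the clients, whereas the MCU does the reverse by decoding and mixing on the server.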
If you do AGC, you'll already be more usable than Discord, who apparently refuse to implement this. Whenever we're voice chatting in my group, it's extremely annoying that one person is quiet as a whisper and the other is AT 200% VOLUME AND CLIPPING.
The risk with AGC is that unless it's combined with good voice activation, you can end up amplifying background noise and then deafening people when they start speaking. But yup, we'd definitely want to do this.
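A toy Python sketch of the point above: the gain only adapts while a crude energy-based voice-activity check fires, so silence or quiet background noise never ramps the gain up. The RMS threshold detector and all parameter names (`target_rms`, `vad_threshold`, `max_gain`, `smoothing`) are assumptions for illustration. This is not how Element, Jitsi or WebRTC actually implement AGC.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square level of a frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class GatedAGC:
    """Toy automatic gain control gated by a simple energy-based VAD."""

    def __init__(self, target_rms: float = 0.1, vad_threshold: float = 0.02,
                 max_gain: float = 10.0, smoothing: float = 0.1):
        self.target_rms = target_rms        # level we normalise speech towards
        self.vad_threshold = vad_threshold  # below this RMS we assume "not speech"
        self.max_gain = max_gain            # cap so whispers aren't boosted endlessly
        self.smoothing = smoothing          # fraction of the gap closed per frame
        self.gain = 1.0

    def process(self, frame: list[float]) -> list[float]:
        level = rms(frame)
        if level >= self.vad_threshold:
            # Speech detected: move the gain part of the way towards the
            # value that would bring this frame to the target level.
            desired = min(self.target_rms / level, self.max_gain)
            self.gain += self.smoothing * (desired - self.gain)
        # Otherwise the gain is frozen, so background noise during silence
        # never drives it up to max_gain before someone starts talking.
        # Hard-clip the output to the valid sample range.
        return [max(-1.0, min(1.0, s * self.gain)) for s in frame]
```

An ungated AGC would keep raising its gain during silence to bring the noise floor up to `target_rms`; here the gain stays at 1.0 until speech arrives, then converges smoothly towards the level that normalises the speaker.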