It's not a bad design, it's just the reference implementation is python based, and, well, a reference implementation; so they don't do too much for optimisations and focus instead on readability.
When Matrix as a protocol settles more, we'll start seeing optimised versions of the server, I'm certain of it.
The current reference implementation, Synapse (Python) seems to take up around 2GB of RAM, Dendrite (Golang) rn uses up to 100MB according to Matthew (project lead). Note that Dendrite is still a WIP. Currently I don't think anyone is too sure about how Ruma (Rust) would perform as it is still very early in development.
When Matrix as a protocol settles more, we'll start seeing optimised versions of the server, I'm certain of it.