Erlang processes can communicate with processes on other servers in exactly the same way they communicate with processes on the same server.
If you have one traditional application server and want some form of cross-user interaction (let's say chat), it's trivial: put each message in a queue for that user in a global map. But once you outgrow that server, you either have to rewrite all your code so it understands that users can be on other servers, or bring in an external message queueing system.
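To make the single-server case concrete, here is a minimal Python sketch. The names `inboxes`, `send_chat` and `poll_chat` are hypothetical. The whole mechanism is one in-memory dict, which is exactly why it stops working once users are spread across machines: a second server has its own, separate dict.

```python
import queue
from collections import defaultdict

# Hypothetical single-server chat: one in-memory dict maps each user
# to a queue of pending messages. This only works while every user is
# handled by this one process, because the dict lives in its memory.
inboxes: defaultdict = defaultdict(queue.Queue)


def send_chat(to_user: str, text: str) -> None:
    """Deliver a message by dropping it into the recipient's queue."""
    inboxes[to_user].put(text)


def poll_chat(user: str) -> list:
    """Drain and return everything currently waiting for `user`."""
    pending = []
    q = inboxes[user]
    while True:
        try:
            pending.append(q.get_nowait())
        except queue.Empty:
            return pending
```

For example, after `send_chat("bob", "hi")`, calling `poll_chat("bob")` returns `["hi"]`, and a second call returns an empty list.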
In Erlang, all of this is built in. If you write your code for the OTP framework (the standard library for messaging, process supervision, etc.), all you need to do is connect the two servers and point them at the same shared user->process mapping process. You have to build that process whether you're running one server or 20, because there's no global data otherwise.
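The shared mapping process can be approximated in Python to show its shape. This is a rough analogue, not Erlang code: it assumes a thread reading from a `queue.Queue` stands in for an Erlang process and its mailbox, and the names `registry_loop`, `register`, `lookup` and `send_to_user` are made up for illustration. The registry owns the user->mailbox map and is only reached through messages. What the sketch cannot show is the part that makes Erlang different: there, the process returned by the lookup could live on another connected server, and the send would be written exactly the same way.

```python
import queue
import threading

# Hypothetical registry "process": a thread that owns the user -> mailbox
# map and is only reached through its own mailbox. Here every mailbox is a
# local queue.Queue; in Erlang the target could be on another node.
registry_mailbox: queue.Queue = queue.Queue()


def registry_loop() -> None:
    users = {}  # state private to the registry thread
    while True:
        msg = registry_mailbox.get()
        kind = msg[0]
        if kind == "register":
            _, user, mailbox = msg
            users[user] = mailbox
        elif kind == "lookup":
            _, user, reply_to = msg
            reply_to.put(users.get(user))  # None if the user is unknown
        elif kind == "stop":
            return


def register(user, mailbox) -> None:
    """Ask the registry to map `user` to `mailbox`."""
    registry_mailbox.put(("register", user, mailbox))


def lookup(user):
    """Ask the registry for a user's mailbox and wait for the reply."""
    reply_to = queue.Queue()
    registry_mailbox.put(("lookup", user, reply_to))
    return reply_to.get(timeout=1)


def send_to_user(user, message) -> bool:
    """Look up the recipient, then deliver; False if no such user."""
    mailbox = lookup(user)
    if mailbox is None:
        return False
    mailbox.put(message)
    return True


threading.Thread(target=registry_loop, daemon=True).start()
```

Because the registry handles its mailbox in order, a `register` followed by a `lookup` from the same caller always sees the registration.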
Of course, if there is absolutely no direct interaction between users, scaling anything is trivial: fire up a new server and direct some portion of your traffic at it. Erlang's trick is to make it that easy even when users do interact directly. This works for backend workloads too: if your frontend servers talk to a backend server that you need to scale, and you've written that backend in Erlang with as much logic as possible in per-connection processes, you're probably a significant chunk of the way there.
In many cases you can't scale traditional servers without a lot of additional development work; with Erlang servers you can. So they're not equivalent: Erlang covers a broader range of use cases.