scientific computing and simulation.
I have had need for machines like this - for a project that was very compute heavy and embarrassingly parallel, but also very "chatty", i.e. it updated its data structures a lot during compute.
I can't be too specific about it, but it involved creating a very large tree structure and updating, pruning and traversing the tree a lot.
If the algorithm is updating and reading a large data structure a lot, it's only practical from a speed point of view to hold the whole structure in RAM.
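To give a rough idea of the access pattern (this is not the project's code, just a minimal sketch), here is a pointer-based tree in C with the three operations mentioned: update, prune and traverse. The `Node` layout, the function names and the "prune anything below a threshold" rule are all assumptions made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node layout: children form a singly linked sibling list. */
typedef struct Node {
    double value;               /* illustrative payload, updated in place */
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

/* Allocate a node with no children; abort if memory runs out. */
Node *node_new(double value) {
    Node *n = malloc(sizeof *n);
    if (!n) { perror("malloc"); exit(EXIT_FAILURE); }
    n->value = value;
    n->first_child = NULL;
    n->next_sibling = NULL;
    return n;
}

/* Prepend a new child to parent's child list and return it. */
Node *node_add_child(Node *parent, double value) {
    Node *c = node_new(value);
    c->next_sibling = parent->first_child;
    parent->first_child = c;
    return c;
}

/* Free a node and all of its descendants; returns the number of nodes freed. */
size_t subtree_free(Node *n) {
    if (!n) return 0;
    size_t freed = 1;
    Node *c = n->first_child;
    while (c) {
        Node *next = c->next_sibling;   /* save before c is freed */
        freed += subtree_free(c);
        c = next;
    }
    free(n);
    return freed;
}

/* Prune: drop every subtree (below n) whose root value is under threshold.
   The node n itself is never removed. Returns the number of nodes freed. */
size_t tree_prune(Node *n, double threshold) {
    size_t freed = 0;
    Node **link = &n->first_child;
    while (*link) {
        Node *c = *link;
        if (c->value < threshold) {
            *link = c->next_sibling;    /* unlink, then free the whole subtree */
            freed += subtree_free(c);
        } else {
            freed += tree_prune(c, threshold);
            link = &c->next_sibling;
        }
    }
    return freed;
}

/* Traverse: count the nodes in the subtree rooted at n. */
size_t tree_count(const Node *n) {
    size_t count = 1;
    for (const Node *c = n->first_child; c; c = c->next_sibling)
        count += tree_count(c);
    return count;
}
```

An update is just a write through a pointer (`node->value = x`), and every prune or traversal step is a pointer dereference. That is cheap when the whole tree sits in local RAM, which is the point being made above. The recursion here is for brevity; a very deep tree would want an explicit stack instead.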
Private companies want to do simulations as well, and with this type of solution, you can pretty much run them on demand rather than having to wait in line.
My advisor's company straddles the public / private divide, but we've definitely done some simulations for private clients on NERSC, and I assume we weren't misusing hours allocated for some other purpose.
This type of problem has to be done on a single node. The overhead of network communication would have blown the latency requirement - that's why we needed a huge machine like this.
The code was written in C.
(We also maintain a Hadoop and Cassandra cluster, and I use Spark for distributed computation - but those are different projects)