by jpallen 3939 days ago
At ShareLaTeX (https://www.sharelatex.com), our hosting costs are around $1500/month (probably double that once you add in backups and other supporting services). This is for a similar service (LaTeX is just a subset of what SageMathCloud does, but a resource-heavy subset). However, ShareLaTeX handles orders of magnitude more traffic, as far as I can tell. One of the big factors in a service like this is getting the cost per user low enough for the business model to be viable, given that a student/academic is not going to pay more than about $10/month and most won't pay at all.
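To make the cost-per-user point concrete, here is a rough break-even sketch in Python using the figures above ($1500/month hosting, roughly doubled with overheads, $10/month per paying user). The function names and the "one paying user in every 50" ratio are hypothetical, chosen only for illustration; they are not ShareLaTeX numbers.

```python
import math

def paying_users_to_break_even(monthly_cost, price_per_user):
    """Smallest number of paying users whose fees cover the monthly cost."""
    return math.ceil(monthly_cost / price_per_user)

def total_users_needed(monthly_cost, price_per_user, users_per_paying_user):
    """Total user base needed if only one user in `users_per_paying_user` pays."""
    return paying_users_to_break_even(monthly_cost, price_per_user) * users_per_paying_user

if __name__ == "__main__":
    hosting = 1500                 # figure quoted above
    with_overheads = 2 * hosting   # "probably double" with backups etc.
    price = 10                     # upper bound on what an academic will pay

    print(paying_users_to_break_even(hosting, price))
    print(paying_users_to_break_even(with_overheads, price))
    # Hypothetical conversion ratio: 1 paying user per 50 users.
    print(total_users_needed(with_overheads, price, users_per_paying_user=50))
```

The takeaway is only that the free users dominate the count, so the cost of serving an idle or occasional user matters far more than the cost of serving a paying one.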

One of the big wins for us has been using Docker to isolate projects. Sure, each project is resource-heavy when run/compiled/executed, but if you have lots of users, they're probably not all resource-heavy at the same time. The more lightweight the virtualisation/containers, the more they can share resources. It sounds like each user may be holding on to resources that they aren't using, so it's costing an order of magnitude more than it would if they could share all the resources perfectly?
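The sharing argument can be sketched with a simple model, assuming (and this is an assumption, not a measurement) that users are active independently with some probability. The hypothetical helper below finds how many concurrent "slots" a shared pool needs so that it is overloaded only rarely, compared with reserving one slot per user.

```python
from math import comb

def slots_needed(n_users, p_active, overload_prob=0.001):
    """Smallest k such that P(more than k users active at once) <= overload_prob,
    assuming each user is active independently with probability p_active."""
    cumulative = 0.0
    for k in range(n_users + 1):
        # Binomial probability that exactly k users are active.
        cumulative += comb(n_users, k) * p_active**k * (1 - p_active)**(n_users - k)
        if 1.0 - cumulative <= overload_prob:
            return k
    return n_users

if __name__ == "__main__":
    n, p = 200, 0.1  # hypothetical: 200 users, each busy 10% of the time
    k = slots_needed(n, p)
    print(f"dedicated: {n} slots, shared: {k} slots")
```

Real usage is burstier than an independent model suggests (deadlines, lecture hours), so this is best read as an upper bound on the savings rather than a prediction.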

I'd be happy to chat more about this stuff (almost all of the ShareLaTeX code is open source as well, except for the enterprisey stuff). We've also got a new project called DataJoy for Python and R (https://www.getdatajoy.com), which has similar scaling challenges that we've been working on.


The typical usage pattern we have is somebody interactively using a SageMath worksheet over the course of an hour or two. Sage uses a lot of memory (large matrices, plots, etc.), and the state must be maintained in memory during the course of the calculation. Also, people will often open many worksheets, which spawn numerous processes. We fork Sage processes to keep resource usage down (this maximizes shared memory). Each project is not in its own VM; instead we use cgroups extensively (the same kind of technology Docker uses under the hood) to control resource usage. All the CPU/memory of the free computers is typically maxed out and shared fairly between users (controlled by cgroups). cgroups is awesome technology.
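For readers unfamiliar with the fork trick, here is a minimal Python sketch of the general idea, not SageMathCloud's actual code: a parent loads expensive state once, then forks workers that read it through copy-on-write pages instead of loading their own copy. `PRELOADED` and `run_in_forked_child` are hypothetical names, and the sketch assumes a Unix-like system where `os.fork` is available.

```python
import os

# Stand-in for expensive-to-load library state (hypothetical data).
PRELOADED = list(range(1_000_000))

def run_in_forked_child(func):
    """Fork a child, run func() there, and return its result as text via a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits PRELOADED without reloading it.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "w") as out:
                out.write(str(func()))
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: read the child's answer, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as inp:
        result = inp.read()
    os.waitpid(pid, 0)
    return result

if __name__ == "__main__":
    print(run_in_forked_child(lambda: sum(PRELOADED)))
```

One caveat: in CPython, reference-count updates write to object headers, so pages holding Python objects tend to get copied as the child touches them. The sharing works best for memory the child only reads at the C level, such as mapped shared libraries.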
> Sage uses a lot of memory (large matrices, plots, etc.), and the state must be maintained in memory during the course of the calculation.

1. I thought Sage used a ton of RAM partly because of the huge number of statically linked libraries. I see you said you're using fork to maximize shared memory. Have you tried KSM (Kernel Samepage Merging)?

2. Have you looked at zram? Certain matrices and such may be easily compressible.
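A quick way to gauge whether compressed memory would pay off is to measure how compressible typical in-memory data is. The sketch below uses Python's `zlib` purely as a stand-in (zram uses its own kernel compressors, so the ratios are only indicative), and the two sample buffers are hypothetical: a mostly-zero matrix of doubles versus random bytes.

```python
import os
import struct
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher = more compressible)."""
    return len(data) / len(zlib.compress(data))

N = 256
# Identity matrix stored as little-endian doubles: almost all zero bytes.
identity = b"".join(
    struct.pack("<d", 1.0 if i == j else 0.0)
    for i in range(N)
    for j in range(N)
)
# Random bytes as a stand-in for dense, high-entropy data.
random_bytes = os.urandom(len(identity))

if __name__ == "__main__":
    print(f"identity matrix: {compression_ratio(identity):.1f}x")
    print(f"random bytes:    {compression_ratio(random_bytes):.2f}x")
```

Sparse or structured matrices should compress well, while dense floating-point results often behave more like the random case, so it is worth measuring on real worksheets before enabling anything.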

Thanks -- these are both great ideas; I've opened a ticket: https://github.com/sagemathinc/smc/issues/93