by strayptr 3939 days ago
How expensive would an application like this be? I don't know much about this, but I'd like to learn.

What's an algorithm for figuring out a pretty good guess of any random application's hosting costs? Also, is there a way to figure out how large the expenses could become over time? Is there some way to relate number of users to cost?

There's no upper bound for how much money one could spend, but let's use the midpoint between "extremely frugal" and "money isn't really a concern."


Hosting on GCE costs about $1700/month right now, and at this moment we have $490/month in recurring revenue (we have 56 subscribers to various plans). I've put a lot of work into making SMC more efficient in order to bring the hosting price down, but there are limits. The reasons it costs this much are: (1) there are often about 500 users signed in, every user is using at least one Linux account, and what users do is often very computationally and memory intensive (mathematics, number crunching, etc.). (2) I snapshot and back up all files to Google Cloud Storage and also copy backups offsite. Doing offsite backups mainly costs bandwidth -- I spent about $20 in the last 3 days on downloading offsite backups of user data (to a USB drive on my desk). (3) In addition to compute nodes, there are database and web servers, which are redundant so that two can go down and things still work; this is very important since teachers often give lectures from SageMathCloud or run computer labs, so downtime is very bad. (4) I also snapshot all the disk images regularly, which costs more but reduces the chances of data loss. I care that users don't lose their data in case of a disaster (hackers or lightning striking Google four times), which just makes things cost more.
How important is your tool to your users?

    teachers often give lectures from SageMathCloud
    or run computer labs
OK, sounds important.

    I care that users don't lose their data in case
    of a disaster (hackers or lightning striking
    Google four times), which just makes things cost
    more.
And it sounds like you care. So, how much do you charge for such an important tool?

    we have $490/month in recurring revenue (we have
    56 subscribers to various plans)
$8.75/month. Try tacking on an extra zero to all of your plans. Or, better yet, tack on an extra zero and ALSO let your customers decide whether or not they care about things like backups.
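For reference, here is the arithmetic behind that figure as a rough Python sketch. The dollar amounts and subscriber count are the ones quoted above; the function names, and the assumption that hosting cost stays flat as the number of subscribers changes, are mine and only for illustration.

```python
import math

HOSTING_COST = 1700.0      # $/month on GCE, as quoted above
RECURRING_REVENUE = 490.0  # $/month, as quoted above
SUBSCRIBERS = 56           # paying subscribers, as quoted above


def revenue_per_subscriber(revenue: float, subscribers: int) -> float:
    """Average monthly revenue per paying subscriber."""
    return revenue / subscribers


def break_even_subscribers(hosting_cost: float, price: float) -> int:
    """Subscribers needed to cover hosting alone.

    Assumes hosting cost does not grow with subscribers (a simplification).
    """
    return math.ceil(hosting_cost / price)


avg_price = revenue_per_subscriber(RECURRING_REVENUE, SUBSCRIBERS)
print(f"average price: ${avg_price:.2f}/month")
print(f"break-even at current price: {break_even_subscribers(HOSTING_COST, avg_price)} subscribers")
print(f"break-even at 10x price: {break_even_subscribers(HOSTING_COST, 10 * avg_price)} subscribers")
```

Under that flat-cost assumption, covering hosting alone takes roughly 195 subscribers at the current average price, versus about 20 at ten times the price.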

I don't want to sound like an asshole, but your business is never going to succeed if you keep going down this path. And to be clear: I want to see you succeed.

Here are a few things by patio11 you should go read right now:

http://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/

https://training.kalzumeus.com/newsletters/archive/saas_pric...

http://www.kalzumeus.com/2012/08/13/doubling-saas-revenue/

At ShareLaTeX (https://www.sharelatex.com), our hosting costs are around $1500/month (that can probably be doubled once you add in backups and other supporting services). This is for a similar service (LaTeX is just a subset of what SageMathCloud does, but a resource-heavy subset). However, ShareLaTeX handles orders of magnitude more traffic as far as I can tell. One of the big factors in a service like this is being able to get the cost per user down low enough that it's a viable business model, given that a student/academic is not going to pay more than about $10/month and most won't pay at all.

One of the big wins for us has been using Docker to isolate projects. Sure, each project is resource heavy when run/compiled/executed, but if you have lots of users, they're probably not all resource heavy at the same time. The more lightweight the virtualisation/containers, the more they can share resources. It sounds like maybe each user is getting to hold on to too many resources that they aren't using, and so it's costing an order of magnitude more than if they could share all the resources perfectly?
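To make the sharing argument concrete, here is a rough sketch in Python. It assumes each project independently needs its full allocation with some fixed probability at any given moment, which is a simplification. The numbers (500 projects, 10% active, 2 GB each) are made up for illustration and are not ShareLaTeX or SMC figures.

```python
from math import comb


def slots_needed(n_projects: int, p_active: float, overflow_prob: float) -> int:
    """Smallest k such that P(more than k projects active at once) <= overflow_prob.

    Concurrent activity is modeled as Binomial(n_projects, p_active), i.e.
    projects are assumed to be independent (a simplifying assumption).
    """
    cumulative = 0.0
    for k in range(n_projects + 1):
        cumulative += (
            comb(n_projects, k) * p_active**k * (1 - p_active) ** (n_projects - k)
        )
        if 1.0 - cumulative <= overflow_prob:
            return k
    return n_projects


# Hypothetical numbers, for illustration only.
n, p, gb_per_project = 500, 0.10, 2.0
k = slots_needed(n, p, overflow_prob=0.001)
print(f"dedicated reservations: {n * gb_per_project:.0f} GB")
print(f"shared pool sized for 99.9% of moments: {k * gb_per_project:.0f} GB")
```

With these made-up inputs, a shared pool sized for all but the rarest peaks comes out at a small fraction of what per-project reservations would need. Real workloads are correlated (a class all starting a lab at once, say), so treat this as a lower bound on the headroom you need.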

I'd be happy to chat more about this stuff (almost all of the ShareLaTeX code is open source as well, except for the enterprisy stuff). We've also got a new project called DataJoy for Python and R (https://www.getdatajoy.com) which has similar scaling challenges that we've been working on.

The typical usage pattern we have is somebody interactively using a SageMath worksheet over the course of an hour or two. Sage uses a lot of memory (large matrices, plots, etc.), and the state must be maintained in memory during the course of the calculation. Also, people will often open many worksheets, which spawn numerous processes. We use fork for Sage processes to keep resource usage down (it maximizes shared memory). Each project is not in its own VM; instead we use cgroups extensively (technology similar to what Docker uses under the hood) to control resource usage. All the CPU/memory of the free computers is typically maxed out and shared fairly between users (controlled by cgroups). cgroups is awesome technology.
> Sage uses a lot of memory (large matrices, plots, etc.), and the state must be maintained in memory during the course of the calculation.

1. I thought Sage used a ton of RAM partly because of the huge number of statically linked libraries. I see you said you're using fork to maximize shared memory. Have you tried KSM (Kernel Samepage Merging)?

2. Have you looked at zram? Certain matrices and such may be easily compressible.
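A quick way to get a feel for point 2, sketched in Python. zlib is only a stand-in here, since zram uses kernel-side compressors (such as LZO or LZ4) and works page by page. The identity matrix is a deliberately favorable, made-up example; dense random data would compress far less.

```python
import struct
import zlib


def identity_matrix_bytes(n: int) -> bytes:
    """Raw row-major float64 bytes of an n x n identity matrix."""
    buf = bytearray(8 * n * n)        # starts out as all zero bytes
    one = struct.pack("<d", 1.0)      # little-endian float64 encoding of 1.0
    for i in range(n):
        offset = 8 * (i * n + i)      # byte offset of diagonal entry (i, i)
        buf[offset:offset + 8] = one
    return bytes(buf)


raw = identity_matrix_bytes(500)
compressed = zlib.compress(raw)
ratio = len(raw) / len(compressed)
print(f"{len(raw)} bytes -> {len(compressed)} bytes (about {ratio:.0f}x smaller)")
```

How much this helps in practice depends entirely on what users keep in memory, so it would need measuring on real worksheets.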

Thanks -- these are both great ideas; I've opened a ticket: https://github.com/sagemathinc/smc/issues/93
From what you describe, shouldn't pricing be higher if users are costing you so much?
What do you mean by 56 subscribers, when 500 people are using the app concurrently?

Are the rest unpaid subscriptions? Or do the subscribers have numerous users?

At this moment there are 585 people connected to SMC (a bit higher than usual due to the Hacker News effect), and most are using it for free. We only introduced a fully automated paid plan about 10 days ago, and many of our sign-ups have been in the last week. Paying customers get enhanced support, the ability to upgrade project quotas, and can ask (it's not yet automated) to have projects moved to members-only servers. The members-only servers have an order of magnitude fewer users on them. Until recently there were also obstructions to charging users, due to IP and other issues involving the University of Washington (my employer).
It would be great to get these paid users writing a line for SMC expenses into their NSF grants. For instance, you could charge $400 for a year's worth of supported SMC for a group (PI and her grad students, postdocs), or $200 for an REU group SMC. It may be easier to get money by asking for a rather larger amount up front, that people plan into their grants or get departments to pay for, rather than asking for $9/month, which I'd feel compelled to pay personally because the hassle of getting reimbursed $9/month is more than the 3 lattes it costs me.
We now offer $79/year and $499/year plans, which would fit perfectly the model you describe. We only started offering them a few days ago due to demand.
You REALLY need to be charging more than the ~$9 a month you are charging.

~$20 seems reasonable. Cheap even. You can always discount when you get to a scale that sustains.