by boulos 2173 days ago
Disclosure: I work on Google Cloud (but my advice isn’t to come to us).

Sorry to hear that. I’m sure it’s super stressful, and I hope you pull through. If you can, I’d suggest giving a little more information about your costs / workload to get more help. But, in case you only see yet another guess, mine is below.

If your growth has accelerated yielding massive cost, I assume that means you’re doing inference to serve your models. As suggested by others, there are a few great options if you haven’t already:

- Try spot instances: while you’ll get preempted, you do get a couple minutes to shut down (so for model serving, you just stop accepting requests, finish the ones you’re handling and exit). This is worth a 60-90% reduction in compute cost.

- If you aren’t using the T4 instances, they’re probably the best price/performance for GPU inference. If you’re using a V100 by comparison that’s up to 5-10x more expensive.

- However, your models should be taking advantage of int8 if possible. This alone may let you pack more requests per part. (Another 2x+)

- You could try model pruning. This is perhaps the most delicate option, but look at things like how people compress models for mobile. It has a similar-ish effect of packing more weights into smaller GPUs, or alternatively you can use a much simpler model (fewer weights and fewer connections often also means a lot fewer flops).

- But just as much: why do you need a GPU for your models? (Usually it’s to serve a large-ish / expensive model quickly enough.) If the alternative is going out of business, try CPU inference, again on spot instances (like the c5 series). Vectorized inference isn’t bad at all!
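
The "stop accepting requests, finish the ones you’re handling and exit" step for spot instances could look roughly like the sketch below. It is only an illustration: `DrainingServer` and `install_sigterm_flag` are hypothetical names, and it assumes the preemption notice reaches your process as a SIGTERM (how that is wired up depends on your provider and orchestrator).

```python
import signal
import threading


class DrainingServer:
    """Hypothetical helper: counts in-flight requests, refuses new ones while draining."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_begin(self):
        """Call before handling a request; False means reject it (e.g. HTTP 503)."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Call when a request finishes, successfully or not."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def drain(self, timeout=None):
        """Stop accepting work, then wait for in-flight requests.

        Returns True if everything finished before the timeout.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install_sigterm_flag():
    """Return an Event that is set when SIGTERM arrives (call from the main thread)."""
    stop = threading.Event()
    # Assumption: the preemption notice is delivered to this process as SIGTERM.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())
    return stop
```

In the main thread you would wait on the event returned by `install_sigterm_flag()`, then call `drain()` with a timeout comfortably shorter than the notice period and exit. The signal handler only sets a flag, so it never has to take the lock that request threads are using.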

If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don’t re-train, whatever you’ve gotta do.

Remember, your users / customers won’t somehow be happier when you’re out of business in a month. Making all requests suddenly take 3x as long on a CPU, or sometimes fail, is better than “always fail, we had to shut down the company”. They’ll understand!


I was in the same boat and this is good advice!

I stopped using GPUs. "Vectorized inference isn’t bad at all!" This, so much. I was blinded by GPU speed; TensorFlow builds with AVX optimizations are actually pretty fast.

My discovery:

+ Stopped using expensive GPUs for inference and switched to AVX-optimized TensorFlow builds.

+ Cleaned up the inference pipeline and reduced complexity.

+ Buying compute instances for a year or more provides a discount.

- I never got pruning to work without a significant loss increase.

- Tried GPU spot instances, which are cheaper. Random kills were a problem, and spinning up new instances took too long while loading my code. The discount is large, but I couldn't reliably keep the service up, and users were getting more timeouts. I bailed and just used CPU inference. The GPU was being underutilized anyway, and going CPU-only only increased inference time to around 2-3 seconds. Given the price trade-off, it was a simpler, cheaper, and easier solution.
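
To make the "underutilized GPU vs. slower CPU" trade-off concrete, here is a rough back-of-the-envelope sketch. The function name is hypothetical and every price, latency, and utilization figure is made up for illustration; plug in your own numbers.

```python
def cost_per_1k_requests(hourly_price, seconds_per_request, utilization=1.0, concurrency=1):
    """Rough cost (in the currency of hourly_price) to serve 1,000 requests.

    utilization is the fraction of each hour the machine is actually busy;
    concurrency is how many requests it handles at once.
    """
    if hourly_price < 0 or seconds_per_request <= 0 or concurrency <= 0:
        raise ValueError("price must be >= 0; latency and concurrency must be > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    requests_per_hour = 3600.0 / seconds_per_request * concurrency * utilization
    return hourly_price / requests_per_hour * 1000.0


# Hypothetical numbers, NOT real prices or measurements:
gpu = cost_per_1k_requests(hourly_price=3.00, seconds_per_request=0.5, utilization=0.10)
cpu = cost_per_1k_requests(hourly_price=0.20, seconds_per_request=2.5, utilization=0.60)
print(f"GPU: {gpu:.2f} per 1k requests, CPU: {cpu:.2f} per 1k requests")
```

A fast GPU that sits idle most of the hour can easily cost more per request than a slow CPU that stays busy, which is the effect described above.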

Also, consider physical servers from providers like Hetzner. These can be several times cheaper than EC2.
I use Hetzner quite a lot for personal projects and can recommend them for reliability and predictable costs. I've run reasonably CPU-heavy tasks, like compiling Android images, on the larger Cloud instances.

However, this morning I was playing around with Scaleway bare metal [1] and General Purpose instances [2] -- I am thinking of making a switch for high CPU tasks.

[1] https://www.scaleway.com/en/bare-metal-servers/

[2] https://www.scaleway.com/en/virtual-instances/general-purpos...

Interesting! These look very good indeed. I will have to try them.

The main point is that physical servers are much cheaper than VMs and provide significantly better performance as well (see my benchmarking and comparison: https://jan.rychter.com/enblog/cloud-server-cpu-performance-...).

I was just looking at Hetzner yesterday, looking to host a HA Postgres setup.

Their block storage volumes look interesting, but I couldn't find any information on performance guarantees, or even claims.

Anyone have an idea about performance (IOPS or MB/s)?

I use them but don't have that info off the top of my head. However, you can easily make an account, get a VPS with a volume and benchmark it in a few minutes for a few cents.
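
If you just want a first rough number before reaching for a proper tool like fio, a quick check could look something like this. It is a minimal sketch (the function name is hypothetical) that only measures sequential write throughput with a final fsync; it says nothing about IOPS or read performance.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=64, block_kb=1024):
    """Write total_mb of random data into a temp file in `directory`, return MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure data reached the device, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_kb / 1024) / elapsed
```

Point `directory` at the mounted volume and use a `total_mb` large enough that caching on the storage side does not dominate the result.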
Note that we are talking about two different things here: a VPS is not the same thing as a dedicated server.

I only use their dedicated servers with NVMe SSDs and have never benchmarked the I/O.

Right, but the GP was talking about the network volumes AFAICT.
I worked on an unrelated market study - look at Upcloud and Raptr as well.
Oh and I should have said why they shouldn’t bother attempting to migrate somewhere “cheaper” (whether GCP, Hetzner, or whatever else): it doesn’t sound like they have time. I read the call for help as: we need something we can do in the next week or two to keep us in business. Any “move the infrastructure” plan will take too long and you should still do the “choose the right GPU / CPU, optimize your precision” change no matter what.