Hacker News new | ask | show | jobs
by etaioinshrdlu 1807 days ago
This is nice, and I've wanted this kind of thing repeatedly over the last 5 years! I think you often want to run little bits of CPU-based code in addition to your deep learning graph. So I think a better deployment model might be basically Lambda but with CUDA access... or something like that.

The factors that I think would make this service most valuable are low cost (think, lower than GPU's on AWS or similar, even at scale), high burst capability from cold start (1000QPS is a good target), and of course low cold start delays (< 1s, or .5s).

This led me down a rabbit hole in years past and the technical solution seems to be generally, the ability to swap models in and out of GPU ram very quickly. Possibly using NVIDIA's unified memory subsystem.

1 comments

Thank you!

We don't have any cold start delay! In our custom environment, you can do exactly what you are describing (running both CPU and GPU code). We provide you with access to the GPU and the CUDA libraries installed. It's basically lambda (minus the cold start) with GPU access.

We can scale a lot very quickly depending on how much you need.

That's impressive!

Are you willing to talk a bit about how this all works? I assume you host the hardware yourself somewhere, which in the days of AWS et al must be pretty tough to pull off, especially with these specs. Where do you get the hardware from these days with the crypto craze?

Yes! A more in-depth blog post is coming soon. We do host the hardware ourselves, for complete control over the GPUs. We found a great infrastructure provider that is also experiencing shortages.