There are definitely GPU providers where you can buy cheaper L40S hours than us. I'm not entirely sure what their system architectures are, or whether they're just buying in absolutely spectacular volume, because we are cutting pretty close to the bone with our pricing.
One cost factor we have that other providers might not have (I'd love to know): we have to dedicate individual racked physical hosts to each group of GPUs we deploy, because we don't (/can't, depending on how you think about systems security) allow GPU-enabled workloads to share hardware with non-GPU-enabled workloads, and we don't allow anyone to share kernels.
But like we said in the post: we're still figuring this stuff out. What we know is: at the same price level, we're consistently sold out of A10 inventory.
Hadn't heard of vast.ai before and looked into it. The prices seem really good. Then saw "Our software allows anyone to easily become a host by renting out their hardware."
Also, vast.ai and fly.io are just not apples to apples in general. Sure, go to vast and get yourself a VM or VPS or Docker container or whatever kind of instance they give you. Do your stuff. But that is not even close to the same set of features/infra/platform that fly.io offers, is it? I'm not sure why people keep thinking that GPU pricing on fly should be the same as an instance on some generic GPU farm; with vast you could even be getting a slice of some random gamer dude's actual computer. Am I wrong here?
I don't know what platform vast.ai uses, but what I have noticed is that CPU compute is pretty slow on those instances. Specifically, the tokenization stage was unusually slow for no apparent reason. I had to give that up and use Google Cloud for my research project.
Sometimes vast.ai is running GPUs on Fly.io that people with YC credits have spun up and added to their marketplace. Those would have been fast though.
They run on literally anything someone installs their agent on.