| I spent a couple of months hacking on a DreamBooth product that let users train a model on their own photos and then generate new images w/ presets or their own prompts. The main costs were: GPU time for training, GPU time for inference, storage for the users' models, and egress fees to download the models. I ended up using banana.dev and runpod.io for the serverless GPUs. Both were great, easy to hook into, and highly customizable. I spent a bunch of time trying to optimize download speed, egress fees, GPU spot pricing, GPU location, etc. R2 is cheaper than S3 (free egress!), but the download speeds were MUCH worse than S3, enough that it ended up not even being competitive. It was frequently cheaper to use more expensive GPUs w/ better location and network speeds, since time spent downloading the model is billed GPU time too. That factored more into the pricing than how long the actual inference took on each instance. Likewise, if your most important metric is time from boot to starting inference, then network access might be the limiting factor. |
https://www.banana.dev/blog/sunset
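
To make the trade-off above concrete, here is a rough back-of-the-envelope sketch of the per-job cost. Everything in it is an assumption for illustration: the `Option` class, the `cost_per_job` helper, and all prices, throughputs, and model sizes are made-up numbers, not real quotes from R2, S3, banana.dev, or runpod.io. It also assumes the GPU is billed for the whole time it spends downloading the model plus running inference.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One hypothetical GPU + storage combination (all numbers are made up)."""
    name: str
    gpu_price_per_hour: float  # USD per hour of GPU time
    download_mb_per_s: float   # observed model download throughput
    egress_per_gb: float       # USD per GB charged by the storage provider
    inference_seconds: float   # time spent actually generating images


def cost_per_job(opt: Option, model_gb: float) -> float:
    """Estimate the cost of one cold job: download the model, then run inference."""
    # Assumes 1 GB = 1024 MB and that the GPU is billed while the download runs.
    download_seconds = model_gb * 1024 / opt.download_mb_per_s
    billed_seconds = download_seconds + opt.inference_seconds
    gpu_cost = billed_seconds / 3600 * opt.gpu_price_per_hour
    egress_cost = model_gb * opt.egress_per_gb
    return gpu_cost + egress_cost


# Hypothetical setup A: cheap GPU, free egress, slow downloads.
slow = Option("cheap GPU + free egress", 0.50, 5.0, 0.00, 15.0)
# Hypothetical setup B: GPU at twice the price, paid egress, fast downloads.
fast = Option("pricier GPU + paid egress", 1.00, 100.0, 0.01, 10.0)

MODEL_GB = 2.0  # hypothetical size of one user's fine-tuned model

for opt in (slow, fast):
    print(f"{opt.name}: ${cost_per_job(opt, MODEL_GB):.4f} per job")
```

With these made-up inputs the pricier GPU comes out cheaper per job, because the slow download keeps the cheap GPU billed for several minutes before inference even starts. Plug in your own measured throughput and prices; the ranking can easily flip.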