Sorry about the hug of death - while I spent an embarassing amount of money on rented H100s, I couldn't be bothered to spend $5 for Cloudflare workers... Hope you all enjoy it, it should be back up now
Curious if the H100s were strictly for VRAM capacity? Since this is a batch job and latency doesn't really matter, it seems like you could have run this on 4090s for a fraction of the cost. Unless the model size was the bottleneck, the price to performance ratio on consumer hardware is usually much better for this kind of workload.
Would you mind sharing a ballpark estimate?