by siquick 116 days ago
Rent an H100 on Modal, which scales down to zero when not in use - you can set the timeout period.

Cold boot times are around 5 minutes, but if your usage periods are predictable it can work out OK. It comes to around $2 an hour.

Still far more expensive than a ChatGPT sub.
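
To put rough numbers on that comparison, here is a back-of-envelope sketch. It uses the ~$2/hour and ~5 minute cold boot figures above; the assumption that cold boot time is billed like any other GPU time, the $20/month subscription price, and the function names are mine, not anything confirmed by Modal or this thread.

```python
def monthly_gpu_cost(sessions_per_month, hours_per_session,
                     hourly_rate=2.0, cold_boot_minutes=5.0):
    """Estimate monthly spend for a scale-to-zero GPU.

    Assumes (unverified) that each session pays for one cold boot
    at the same hourly rate as normal usage.
    """
    billed_hours = sessions_per_month * (hours_per_session + cold_boot_minutes / 60.0)
    return billed_hours * hourly_rate


def break_even_hours(subscription_price, hourly_rate=2.0):
    """GPU hours per month that cost the same as a flat subscription."""
    return subscription_price / hourly_rate


if __name__ == "__main__":
    # Hypothetical usage pattern: 20 sessions a month, 2 hours each.
    print(f"GPU cost: ${monthly_gpu_cost(20, 2):.2f}/month")
    # Assumed subscription price of $20/month (not stated in the thread).
    print(f"Break-even: {break_even_hours(20.0):.1f} GPU hours/month")
```

With those assumed inputs, anything beyond about ten GPU hours a month already costs more than the subscription.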


Do you have a reference for the setup you're talking about? I'd like to integrate it into my IDE (Cursor/VS Code) - are there docs on such a setup?

Start here:

https://modal.com/docs/examples/vllm_inference

Or give this a go:

https://modal.com/docs/examples/opencode_server

You get $30 of free credits each month on Modal, which is enough to play around with (I have no affiliation, I just think they run a great service).
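
For a feel of how far the credits stretch, here is a quick sketch of the arithmetic. It assumes the roughly $2/hour H100 rate and 5 minute cold boot mentioned upthread, and that cold boots are billed at the same rate; both the billing assumption and the function name are hypothetical.

```python
import math


def sessions_covered(credits, hours_per_session,
                     hourly_rate=2.0, cold_boot_minutes=5.0):
    """Whole sessions a credit balance pays for.

    Assumes (unverified) that every session starts cold and that the
    cold boot is billed at the normal hourly rate.
    """
    cost_per_session = hourly_rate * (hours_per_session + cold_boot_minutes / 60.0)
    return math.floor(credits / cost_per_session)


if __name__ == "__main__":
    # $30 of monthly credits spent on one-hour sessions.
    print(sessions_covered(30.0, 1.0))
```

Under those assumptions, $30 covers about 13 one-hour sessions a month, or 15 hours if cold boots cost nothing.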