
So modal.com is "turning-the-vm-off-when-unused-as-a-service" :-)

I ran research/open_llama_7b_preview_200bt on there, using their Python example, with an A10G GPU.

It cost 2-3c per run, taking ~20 seconds each time, on fairly small prompts. So about the same as GPT-4?
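For what it's worth, here is the back-of-envelope hourly rate those numbers imply. This is only a sketch: it assumes the whole ~20 seconds is billed and that the 2-3c is the all-in cost of the run. The function name is made up for illustration, and none of this is taken from Modal's price list.

```python
def implied_hourly_rate(cost_per_run_usd: float, seconds_per_run: float) -> float:
    """Hourly rate implied by paying cost_per_run_usd for seconds_per_run of billed time."""
    return cost_per_run_usd / seconds_per_run * 3600


# Numbers from my runs: 2-3 cents per run, roughly 20 seconds each.
# Assumption: the full 20 seconds is what gets billed.
low = implied_hourly_rate(0.02, 20)
high = implied_hourly_rate(0.03, 20)
print(f"implied rate: ${low:.2f} to ${high:.2f} per hour")
```

So any time spent on cold start or loading the model, if it is billed, counts toward that per-run cost just as much as the actual generation does.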

Now, this is a non-expert just playing around. It can probably be optimized by trying different GPUs and tuning the code somehow.

I don't think you are using these models to save money, but you might be using them for tunability, privacy, mobility [1], secrecy or fun/research.

[1] In other words, you want to build a robot that can work disconnected from the internet.