by quickthrower2
1134 days ago
So modal.com is "turning-the-VM-off-when-unused-as-a-service" :-) I ran research/open_llama_7b_preview_200bt on there, using their Python example, with an A10G GPU. It cost 2-3c per run and took ~20 seconds each time, on fairly small prompts. So about the same as GPT-4? Now, this is a non-expert just playing; it can probably be optimized by trying different GPUs and tuning the code somehow.

I don't think you use these models to save money, but you might use them for tunability, privacy, mobility [1], secrecy or fun/research.

[1] In other words, you want to build a robot that can work disconnected from the internet.
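
For what it's worth, here is a back-of-the-envelope sketch of what those numbers imply per hour. It assumes runs happen back to back and that the 2-3c figure is the entire billed cost of a ~20 second run (so any startup overhead is baked in). The function name is just mine for illustration, not anything from Modal.

```python
def implied_hourly_cost_usd(cents_per_run: float, seconds_per_run: float) -> float:
    """Dollars per hour if runs were executed back to back at the observed price."""
    runs_per_hour = 3600 / seconds_per_run
    return cents_per_run * runs_per_hour / 100  # convert cents to dollars


# Observed figures from the comment: 2-3 cents per run, ~20 seconds per run.
low = implied_hourly_cost_usd(2, 20)
high = implied_hourly_cost_usd(3, 20)
print(f"${low:.2f} to ${high:.2f} per hour")
```

That is only the rate implied by my rough measurements, not a quoted price, so treat it as a sanity check rather than a benchmark.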