Hacker News
by instance 1130 days ago
I tested it on a serious use case and the quality was subpar. For real use cases I had to either host the most powerful model available (e.g. LLaMA-65B or so) on a cloud machine, which again costs too much (you'll be paying around 500-1000 USD per month), or just go straight for GPT-3.5 on OpenAI. The latter makes the most economic sense.
2 comments

What real use case did you use it for?
For instance, I used it in conjunction with llama-index for knowledge management. I created an index over the whole Confluence/Jira of a mid-sized company and got good results with GPT, but for LLaMA of this size that use case was too much.
I'd argue 1k per month is nothing for a mid-sized company, but I can understand where you are coming from.
Did you try instructor-xl? It ranks highest on huggingface.
Making demos to raise investment probably
What about turning the cloud VM off except when you're actually using it?
So modal.com is "turning-the-vm-off-when-unused-as-a-service" :-)

I ran research/open_llama_7b_preview_200bt on there, using their Python example, with an A10G GPU.

It cost 2-3 cents per run, taking ~20 seconds each time, on fairly small prompts. So about the same as GPT-4?
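For a rough sense of scale, here is a back-of-the-envelope sketch that uses only the figures quoted in this thread: 500-1000 USD per month for an always-on machine, and 2-3 cents per ~20-second run. The function and constant names are made up for illustration, and the per-run cost presumably includes startup overhead, so treat the output as an estimate rather than a real price list.

```python
# Back-of-the-envelope comparison using only figures quoted in this thread.
# All names here are illustrative; nothing below calls a real pricing API.
ALWAYS_ON_USD_PER_MONTH = (500.0, 1000.0)  # dedicated cloud machine, low/high
PER_RUN_USD = (0.02, 0.03)                 # pay-per-run cost, low/high
SECONDS_PER_RUN = 20.0                     # approximate duration of one run


def break_even_runs(monthly_usd: float, per_run_usd: float) -> float:
    """Runs per month at which pay-per-run spend equals the flat monthly fee."""
    return monthly_usd / per_run_usd


def implied_hourly_rate(per_run_usd: float, seconds_per_run: float) -> float:
    """Effective USD per hour of run time implied by the per-run cost."""
    return per_run_usd / seconds_per_run * 3600.0


if __name__ == "__main__":
    # Most pessimistic case for pay-per-run: cheap machine, expensive runs.
    low = break_even_runs(ALWAYS_ON_USD_PER_MONTH[0], PER_RUN_USD[1])
    # Most optimistic case for pay-per-run: expensive machine, cheap runs.
    high = break_even_runs(ALWAYS_ON_USD_PER_MONTH[1], PER_RUN_USD[0])
    print(f"Break-even: {low:,.0f} to {high:,.0f} runs per month")

    rate_low = implied_hourly_rate(PER_RUN_USD[0], SECONDS_PER_RUN)
    rate_high = implied_hourly_rate(PER_RUN_USD[1], SECONDS_PER_RUN)
    print(f"Implied rate: ${rate_low:.2f} to ${rate_high:.2f} per hour of run time")
```

With these numbers, pay-per-run stays cheaper until somewhere between roughly 16,700 and 50,000 runs per month, and the implied rate works out to $3.60 to $5.40 per hour of run time.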

Now, this is a non-expert just playing around; it can probably be optimized by trying different GPUs and tuning the code somehow.

I don't think you are using these models to save money, but you might be using them for tunability, privacy, mobility [1], secrecy or fun/research.

[1] In other words, you want to build a robot that can work disconnected from the internet.

A "serious use case" means it needs to be available around the clock.