Hacker News new | ask | show | jobs
by dcl 634 days ago
Llama and its variants are popular for language tasks, https://huggingface.co/meta-llama/Meta-Llama-3.1-8B

However, as far as I can tell, it's never actually clear what the hardware requirements are to get these to run without fussing around. Am I wrong about this?

3 comments

In my experience the hardware requirements are whatever the file size is + a bit more. Cpu works, gpu is a lot faster but needs VRAM.

Was playing with them some more yesterday. Found that the 4bit ("q4") is much worse then q8 or fp16. Llama3.1 8B is ok, internlm2 7B is more precise. And they all hallucinate a lot.

Also found this page, that has some rankings: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_...

In my opinion they are not really useful. Good for translations, to summaries some texts, and.. to ask in case you forgot some things about something. But they lie, so for anything serious you have to do your own research. And absolutely no good for precise or obscure topics.

If someone wants to play there's GPT4All, Msty, LM Studio. You can give them some of your documents to process and use as "knowledge stacks". Msty has web search, GPT4All will get it in some time.

Got more opinions, but this is long enough already.

I agree on the translation part. Llama 3.1 8B even at 4bit does a great job translating JP to EN as far as I can tell, and is often better than dedicated translation models like Argos in my experience.
I had a underwhelming experience with Llama translation, incompatable to Claude or GPT3.5+ which are very good. Kind of like Google translate but worse. I was using them through Perplexity.
Training is rather resource intensive either in time, RAM or VRAM. So it takes rather top end hardware. For the moment, nVidia's stuff works best if cost is no object.

For running them, you want a GPU. The limitation is that the model fits in VRAM or the performance will be slow.

But if you don't care about speed, there's more options.

Yeah llama3.1 is really impressive even in the small 8B size. Just don't rely on knowledge but make it interact with Google instead (really easy to do with OpenWebUI)

I personally use an uncensored version which is another huge benefit of a local model. Mainly because I have many kinky hobbies that piss off cloud models.

The moment Google gets infiltrated by rogue AI content it will cease to be as useful and you get to train it with more knowledge.

It's slowly getting there.

It's been infiltrated by rogue SEO content for at least a decade.
Maybe, but given how good Gemma is for a 2b model I think Google has hedged their bets nicely.