by NinjaTrance 114 days ago

To run Llama 3.1 8B locally, you would need a GPU with a minimum of 16 GB of VRAM, such as an NVIDIA RTX 3090.

Taalas promises 10x higher throughput while being 10x cheaper and using 10x less electricity.

Looks like a good value proposition.

2 comments

ac29 114 days ago

> To run Llama 3.1 8B locally, you would need a GPU with a minimum of 16 GB of VRAM, such as an NVIDIA RTX 3090

Unquantized, yes. But this Taalas chip uses a heavily quantized version (the article calls it "3/6 bit quant", probably similar to Q4_K_M). You don't even need a GPU to run that with reasonable performance; a CPU is fine.
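A rough back-of-the-envelope sketch of why quantization changes the picture. It counts memory for the weights only (no KV cache or activations), assumes a round 8e9 parameters, and treats roughly 4.8 bits per weight as an assumed stand-in for a Q4_K_M-like quant; the function name is just illustrative.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9  # assumption: round parameter count for an "8B" model

# Bits per weight for each case; the 4.8 figure is an assumed average
# for a Q4_K_M-like mixed quantization, not a number from the article.
cases = [("16-bit", 16.0), ("8-bit", 8.0), ("~4.8-bit quant", 4.8)]

for label, bits in cases:
    print(f"{label}: about {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under those assumptions the 16-bit weights alone land at the 16 GB quoted above, while the quantized version fits comfortably in ordinary system RAM.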

lm28469 114 days ago

What do you do with 8B models? They can't even reliably create a .txt file or do any kind of tool calling.

joquarky 114 days ago

Exploration, summarization, classification, translation
