| HN Mirror

	by Vecr 1074 days ago
	You can run that model (Wizard-30) on a computer with 64 gigabytes of RAM (or less; I don't know how tight you can cut it). You obviously want fast RAM and a good CPU, but you don't need a GPU.

3 comments

snowycat 1073 days ago

I am running 30B LLaMA models (4-bit quantized using llama.cpp) on 32 GB of RAM and no GPU. I get around 2 tokens/second.
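
For anyone wondering why this fits in that much RAM, here is a rough back-of-the-envelope sketch in Python. The figures are assumptions for illustration only: exactly 30e9 parameters and a flat 4 bits per weight. Real quantized files also carry per-block scale factors, and the context cache needs memory on top, so treat the result as a lower bound. `quantized_model_gib` is a made-up helper name, not something from llama.cpp.

```python
def quantized_model_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed, illustrative figures: 30e9 parameters, ignoring quantization
# scale factors and the memory used for the context cache.
for bits in (16, 4):
    print(f"{bits}-bit weights: {quantized_model_gib(30e9, bits):.1f} GiB")
```

Under those assumptions the 4-bit weights come to roughly a quarter of the 16-bit size, which is why the quantized model is the one that fits alongside the OS.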

lossolo 1073 days ago

You can also travel on a bike from NY to LA.

nobody9999 1073 days ago

>You can also travel on a bike from NY to LA.

You can. In fact, my brother did so a bunch of years ago. He found it to be a wonderful experience that made his life better.

He's also flown on a commercial airplane from NY to LA (as have I, as well as millions of others) and while it got him to Los Angeles, it didn't provide the levels of sensory input, personal interactions and experience that riding his bicycle did.

That's not to say everyone should ride bicycles across the US every time they need/want to make such a trip, but doing so at least once can be a more positive experience than sitting next to some strangers for five hours.

The satisfaction of doing so, or the experiences in interacting with people and the landscape during such a trip, aren't quantifiable, but reducing the value of doing so (if I'm missing your point here, my apologies) to the time required to make such a trip is reductive in the extreme IMHO.

Edit: Clarified my prose.

Llamamoe 1072 days ago

Love this comment so much. Your brother sounds cool :)

logicchains 1073 days ago

Just imagine if Boeing lobbied the government to pass a law banning you from biking from NY to LA.

ed_mercer 1074 days ago

AFAIK you can get away with a swapfile, no need for large amounts of RAM.

dacryn 1074 days ago

Won't that nearly kill your SSD if you do it for extended periods of time?

ZiiS 1074 days ago

Most of the RAM is for storing the model. Once it is loaded it is read-only, so it will not harm an SSD.

Incipient 1074 days ago

It only reads from memory, not from swap directly. If it needs to read something from swap, it'll write something out from memory to swap, then read the swapped page into memory. Reading 1 GB from swap will essentially write 1 GB to the SSD too (rough numbers).

Correct me if I misunderstand swap?

mihaic 1074 days ago

If the underlying data hasn't changed, the page isn't written to disk. CPUs keep track of writes by marking a written memory page as "dirty", so a clean page can simply be dropped.
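
A minimal sketch of the read-only case in Python, using the standard `mmap` module. The temporary file is a hypothetical stand-in for a model file and `probe_read_only_mapping` is a made-up helper name; this is not llama.cpp's actual loader. It only shows that a file can be mapped read-only, so its pages are never dirtied through the mapping and the file on disk stays a valid copy the OS can re-read from instead of writing anything out.

```python
import mmap
import os
import tempfile


def probe_read_only_mapping(path: str) -> tuple[int, int, bool]:
    """Map a file read-only, read its first and last byte, then attempt a write."""
    with open(path, "rb") as f:
        # ACCESS_READ gives a read-only, file-backed mapping: the pages are
        # never dirtied through it.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as weights:
            first, last = weights[0], weights[-1]
            try:
                weights[0] = 1  # a write through a read-only mapping is rejected
                write_allowed = True
            except TypeError:
                write_allowed = False
    return first, last, write_allowed


# Hypothetical stand-in for a model file: 1 MiB of patterned bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4096)
    model_path = tmp.name

first, last, write_allowed = probe_read_only_mapping(model_path)
os.remove(model_path)
print(first, last, write_allowed)
```

Whether a given setup behaves this way depends on how the model is loaded: weights copied into ordinary anonymous memory and then pushed to a swapfile do have to be written out at least once.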

simcop2387 1074 days ago

That's basically right. I'm not sure if Linux or Windows will keep track of the pages it read out of swap to know if they're still there and valid. There's a better way to do this that I think at least ggml supports: it stores a copy of the model unpacked and ready on disk, as a cache for doing the work, rather than relying on the OS virtual memory to handle it. This should be faster than the OS VMM (though probably not by much). Since it knows which pieces it needs to leave on disk and where they are, it should also be much safer as far as writes go, because it knows enough not to write the same data multiple times like that.
