by tyfon 1165 days ago
I just bought another 64 GB of RAM for my computer, which will arrive in the mail after Easter. That will allow me to run the full 65B FP16 model, though it will probably be much slower than the 4-bit quantized version, since it has to do more math.
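
For a rough sense of the sizes involved, here is a back-of-the-envelope sketch of the weight footprint at each precision. It assumes a nominal 65e9 parameters (the exact count will differ a bit) and counts the weights only, ignoring quantization scale factors, the KV cache, and activations, so treat the numbers as lower bounds rather than measurements.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "65B" taken as exactly 65e9 parameters.
N_PARAMS = 65e9

fp16_gb = weight_footprint_gb(N_PARAMS, 16)  # 2 bytes per weight
q4_gb = weight_footprint_gb(N_PARAMS, 4)     # half a byte per weight

print(f"FP16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{q4_gb:.1f} GB")
print(f"FP16 is ~{fp16_gb / q4_gb:.0f}x the data to read per token")
```

Under those assumptions the FP16 weights come to about 130 GB versus about 32.5 GB at 4 bits, so every generated token has to touch roughly four times as much data.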

My biggest hope in the end is that we will get a library that can use the unified memory model of the AMD platform and run these models on a combination of system RAM and GPU. I think Intel also has something similar on their platforms.

I'm not sure how well llama.cpp runs with 8 cores and such large weights, though. I am already pushing the limit of usable speed on my 5950X.

Perhaps we'll even get dedicated AI boards in the end, much like GPUs today.