by baq 946 days ago
Mind blown. Sounds almost too good to be true, except the human brain runs on about 20 W and this puts us in the same ballpark. This was hard sci-fi a year ago!

Can an approach like this be integrated into stuff like llama.cpp so I could have a 200B model hashed down to 7B to run on civilian hardware or even a CPU?
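
For a sense of scale, here is a rough back-of-envelope sketch of the weights-only memory footprint. The function name and the 16-bit / 4-bit widths are assumptions picked for illustration, not anything taken from the paper or from llama.cpp, and it treats "7B" as the number of parameters actually stored. It ignores activations, KV cache, and runtime overhead.

```python
def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Hypothetical comparison: parameter counts from the comment above,
# bit widths assumed (16-bit floats vs. a 4-bit quantization).
for name, n_params in [("200B", 200e9), ("7B", 7e9)]:
    for bits in (16, 4):
        gb = weight_footprint_gb(n_params, bits)
        print(f"{name} @ {bits}-bit: {gb:.1f} GB")
```

Under those assumptions the 200B weights need hundreds of GB at 16-bit, while 7B at 4-bit fits in a few GB, which is the gap between server hardware and a laptop.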

1 comment

I'd expect that on a regular CPU, the RAM access latency will destroy any performance improvements. This work is much better suited for FPGAs or ASICs.