by jiayq84 987 days ago
In theory you can launch it on 8 × 80 GB A100s, which gives 640 GB of memory. Falcon 180B in fp16 needs about 360 GB for the weights alone (180B parameters × 2 bytes each), so there would be enough memory. It's definitely going to be very expensive, though.
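For anyone who wants to check the arithmetic, here is a rough sketch. It only counts model weights, uses decimal GB, and ignores activations, KV cache, and framework overhead, so real usage would be higher. The helper name `weights_gb` is made up for illustration.

```python
def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in decimal GB.

    (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes-per-GB
    simplifies to params_billions * bytes_per_param.
    Weights only: activations and KV cache are not counted.
    """
    return params_billions * bytes_per_param


falcon_fp16_gb = weights_gb(180, 2)          # fp16 = 2 bytes per parameter
cluster_gb = 8 * 80                          # eight 80 GB A100s
headroom_gb = cluster_gb - falcon_fp16_gb    # left for everything else

print(f"fp16 weights: {falcon_fp16_gb} GB")
print(f"cluster:      {cluster_gb} GB")
print(f"headroom:     {headroom_gb} GB")
```

The headroom is what would have to cover activations, KV cache, and any per-GPU overhead.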
1 comment

Llama.cpp can run quantized Falcon on a top-end Mac Studio, which is only five grand: https://twitter.com/ggerganov/status/1699791226780975439
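A back-of-the-envelope sketch of why quantization is what makes this fit. The 192 GB budget is an assumption about the top Mac Studio configuration, not a figure from the linked tweet, and not all of that memory is necessarily usable for the model. Real quantized formats also carry some per-block overhead, so treat the sizes as lower bounds. Function names are hypothetical.

```python
def quantized_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB at a given average bit width."""
    return params_billions * bits_per_weight / 8


def max_bits_per_weight(params_billions: float, budget_gb: float) -> float:
    """Largest average bits per weight whose weights alone fit in budget_gb."""
    return budget_gb * 8 / params_billions


ASSUMED_BUDGET_GB = 192   # assumption: top-end Mac Studio unified memory
FALCON_PARAMS_B = 180     # Falcon 180B

for bits in (16, 8, 4):
    size = quantized_weights_gb(FALCON_PARAMS_B, bits)
    verdict = "fits" if size <= ASSUMED_BUDGET_GB else "does not fit"
    print(f"{bits:>2}-bit: {size:6.1f} GB ({verdict})")

print(f"max average bits/weight: {max_bits_per_weight(FALCON_PARAMS_B, ASSUMED_BUDGET_GB):.2f}")
```

Under these assumptions fp16 is out of reach on one machine, while 4-bit weights leave a lot of room for context.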

If I'm paying a third party a hundred bucks a month, I'd at least want them to be able to match the capacities of consumer hardware.