HN Mirror



	by whartung 1192 days ago
	I honestly am not that familiar with this space. How realistic is it that someone could self-host a ChatGPT instance? Assuming the model was available, how big are the models and what kind of hardware is necessary to run the instance?

3 comments

tottenval 1192 days ago

OpenAI hasn't published any information about the size or hardware requirements for running ChatGPT. Reading between the lines, the default ChatGPT Turbo model seems to be significantly smaller than GPT-3 (it's a distilled model), but probably still heavier than the Alpaca and Llama 7B models people are running (very slowly) on their single-GPU computers this week. You'd probably need multiple A100s to get performance comparable to the ChatGPT API.


noduerme 1192 days ago

Does the llama code that dropped leverage the GPU at all? On an M1 it appears to just run on as many CPU cores as you want to throw at it. The 65B heats up 8 cores real nicely, and it's slow, but I imagine it would be a lot faster on the GPU.


Tostino 1192 days ago

I've seen people saying that limiting it to 4 cores out of the 8 total can actually lead to improved performance. Have you seen that?


noduerme 1192 days ago

With 8 cores it starts and runs a bit faster for me if plugged in, until the fan kicks on and the CPU starts throttling. Once that happens it's probably better to stick with 4.


brianjking 1192 days ago

All of the llama implementations for Apple are CPU only afaik.


wongarsu 1192 days ago

If you run it with 4-bit quantization completely on the CPU (similar to llama.cpp), ChatGPT should run in about 90 GB of RAM, assuming it is roughly GPT-3 sized (175B parameters at half a byte each). That's easy to get your hands on for a desktop, but it's out of reach for notebooks.

Also expect performance of a couple of seconds per token in that setup; for now, you need something involving GPUs.
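
For anyone who wants to redo the RAM estimate, here is a rough back-of-the-envelope sketch. It counts the weights only (no KV cache, activations, or runtime overhead), and the 175B figure is an assumption borrowed from GPT-3's published size, since ChatGPT's actual parameter count is not public. The function name is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts: the LLaMA sizes are the published ones; the 175B entry
# is an ASSUMPTION (GPT-3's size), not a known figure for ChatGPT.
models = [
    ("LLaMA 7B", 7e9),
    ("LLaMA 65B", 65e9),
    ("GPT-3-sized (assumed)", 175e9),
]

for name, n_params in models:
    fp16 = weight_memory_gb(n_params, 16)
    q4 = weight_memory_gb(n_params, 4)
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

At 4 bits per weight a 175B-parameter model comes out to 87.5 GB for the weights, which is where a figure of "about 90 GB" would come from. Real usage will be somewhat higher once the context cache and runtime overhead are included.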


wincy 1192 days ago

I think you’d need 2x A100 GPUs, which is $4.18 an hour on Runpod. If I was super bored I’d probably be willing to drop $50 for 10 hours to mess around with it.

https://www.runpod.io/gpu-instance/pricing
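
A quick sanity check on the rental math, as a sketch. The $4.18/hour rate is the one quoted above and may well have changed; the helper names are hypothetical.

```python
def rental_cost(hourly_rate_usd: float, hours: float) -> float:
    """Total cost of renting an instance at a flat hourly rate."""
    return hourly_rate_usd * hours


def hours_within_budget(hourly_rate_usd: float, budget_usd: float) -> float:
    """How many hours a given budget buys at a flat hourly rate."""
    return budget_usd / hourly_rate_usd


RATE = 4.18  # USD per hour for 2x A100, as quoted in the comment above

print(f"10 hours costs ${rental_cost(RATE, 10):.2f}")
print(f"$50 buys about {hours_within_budget(RATE, 50):.1f} hours")
```

So $50 comfortably covers 10 hours at that rate, with a little left over.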
