by whartung 1192 days ago
I honestly am not that familiar with this space. How realistic is it that someone could self-host a ChatGPT instance?

Assuming the model was available, how big are the models and what kind of hardware is necessary to run the instance?

3 comments

OpenAI hasn't published any information about the size or hardware requirements for running ChatGPT. Reading between the lines, the default ChatGPT Turbo model seems to be significantly smaller than GPT-3 (it's a distilled model), but probably still heavier than the Alpaca and Llama 7B models people are running (very slowly) on their single-GPU computers this week. You'd probably need multiple A100s to get performance comparable to the ChatGPT API.
Does the llama code that dropped leverage the GPU at all? On an M1 it appears to just run on as many CPU cores as you want to throw at it. The 65B heats up 8 cores real nicely, and it's slow, but I imagine it would be a lot faster on the GPU.
I've seen people saying that limiting it to 4 cores out of the 8 total can actually lead to improved performance. Have you seen that?
8 cores starts and runs a bit faster for me when plugged in, until the fan kicks on and the CPU starts throttling. Once that happens it's probably better to stick with 4.
All of the llama implementations for Apple are CPU only afaik.
If you run it with 4-bit quantization completely on the CPU (similar to llama.cpp), ChatGPT should run in about 90 GB of RAM. That's easy to get your hands on for a desktop, but out of reach for notebooks.

Also expect performance of a couple of seconds per token in that setup. For now, you need something involving GPUs for usable speed.
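
For anyone wanting to sanity-check the ~90 GB figure, here's a rough back-of-the-envelope sketch. It assumes a GPT-3-sized model of 175B parameters, which is my assumption and not a published number for ChatGPT, and it counts only the weights (no activations, KV cache, or quantization overhead). The function name is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


# ASSUMPTION: 175e9 parameters (GPT-3's published size).
# ChatGPT's actual parameter count has not been published.
ASSUMED_PARAMS = 175e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(ASSUMED_PARAMS, bits)
    print(f"{bits}-bit weights: {gb:.1f} GB")

# Same arithmetic for the 65B LLaMA model mentioned upthread.
print(f"65B at 4-bit: {weight_memory_gb(65e9, 4):.1f} GB")
```

Under that assumption 4-bit comes out to 87.5 GB for the weights alone, so "about 90 GB" is in the right range. Real usage would be somewhat higher.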

I think you’d need 2x A100 GPUs, which is $4.18 an hour on Runpod. If I was super bored I’d probably be willing to drop $50 for 10 hours to mess around with it.

https://www.runpod.io/gpu-instance/pricing
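
Quick sketch of the rental math, using the $4.18/hour figure quoted above for 2x A100. Prices change, so treat the rate as a snapshot. The function names are just for illustration.

```python
HOURLY_RATE_USD = 4.18  # rate quoted in the comment above for 2x A100; may be out of date


def rental_cost(hours: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Total cost in USD for renting the instance for the given number of hours."""
    return round(hours * hourly_rate, 2)


def hours_for_budget(budget_usd: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """How many hours a fixed budget buys at the given hourly rate."""
    return budget_usd / hourly_rate


print(f"10 hours: ${rental_cost(10):.2f}")
print(f"$50 buys about {hours_for_budget(50):.1f} hours")
```

So 10 hours is $41.80 at that rate, and $50 covers roughly 12 hours, leaving a little slack for storage or setup time.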