Hacker News


	by turkeygizzard 1268 days ago
	I'm pretty sure the GPT model is huge and does not fit on any conventional GPU. Even if they open-sourced the weights, I don't think most people would be running it at home. Also, regarding the text limits: AFAIK, there's just an inherent limit in the architecture. Transformers are trained on finite-length sequences (I think their latest uses 4096 tokens). I have been trying to understand how ChatGPT seems to manage context and understanding beyond this window length.

Sharlin 1268 days ago

I don't think ChatGPT does. I have had long discussions with it, with some rules agreed upon in the beginning, and at some point it clearly begins to forget the exact rules and has to be reminded of them.

(Specifically, AI Dungeon-type games where ChatGPT is the DM and the human is the protagonist, or vice versa. The most common failure mode seems to be that it forgets whether it's playing the DM or the protagonist. To be fair, it performs admirably despite the limitations.)

rzzzt 1268 days ago

In a previous thread (which I can't find right now), the recommendation was either to ask it to summarize what happened earlier, or to write such a summary yourself from time to time.
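A minimal sketch of the "summarize what happened earlier" trick, assuming the conversation is a list of text turns and that some summarizer is available. The function names are hypothetical, and `naive_summary` is only a placeholder for asking the model (or writing the summary yourself):

```python
from typing import Callable, List


def compact_history(
    history: List[str],
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace all but the last `keep_recent` turns with a single summary turn."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    split = len(history) - keep_recent
    if split <= 0:
        # Nothing old enough to summarize yet.
        return list(history)
    old, recent = history[:split], history[split:]
    return ["Summary so far: " + summarize(old)] + recent


def naive_summary(turns: List[str]) -> str:
    # Placeholder summarizer: drops the speaker prefix and joins the rest.
    # In practice you would ask the model for a summary or write one yourself.
    return " / ".join(t.split(":", 1)[-1].strip() for t in turns)


history = [
    "User: Rule: you are the DM, I am the hero.",
    "AI: Understood.",
    "User: I open the door.",
    "AI: A dragon waits inside.",
]
print(compact_history(history, keep_recent=2, summarize=naive_summary))
```

The point of the design is that the agreed-upon rules survive in compressed form instead of silently scrolling out of the window.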

puffybuf 1268 days ago

I read that it just re-reads the whole discussion so far every time you submit a message. So it must eventually hit a limit on what it can remember, since the number of tokens it can read per submission is capped.
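A minimal sketch of the "re-read everything, drop what doesn't fit" idea, assuming a fixed token budget and using whitespace-separated words as a stand-in for real tokens (GPT models actually use a BPE tokenizer, so real counts differ). The function and example turns are hypothetical, not OpenAI's implementation:

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in: one "token" per whitespace-separated word.
    return len(text.split())


def build_prompt(history: List[str], new_message: str, max_tokens: int) -> List[str]:
    """Return the most recent turns that fit in the budget, oldest first.

    Stateless: every call receives the whole history and rebuilds the
    prompt from scratch; nothing is remembered between calls.
    """
    turns = history + [new_message]
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest turn; stop once the budget is exhausted.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    return kept


history = [
    "User: Rule: you are the DM, I am the hero.",
    "AI: Understood, I will narrate.",
    "User: I open the door.",
    "AI: A dragon waits inside.",
]
print(build_prompt(history, "User: I draw my sword.", max_tokens=15))
```

With this budget the rule set from the first turn is the first thing to fall out of the window, which matches the "forgets whether it's the DM" failure described above.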

Sharlin 1268 days ago

Yes, I know. It’s a pure function with no mutable state.

throwaway2016a 1268 days ago

Is ChatGPT its own model? I thought ChatGPT was just GPT-3 with an easier-to-use interface.

Sharlin 1268 days ago

It's based on GPT-3 but has been further trained to produce sequences that look like coherent dialogue, using a reward model that was trained on human rankings of candidate replies (reinforcement learning from human feedback). The resulting model is also quite a bit smaller than the full GPT-3. It's much harder to get GPT-3 to hold a reasonable dialogue than ChatGPT.
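For context, OpenAI has described ChatGPT as trained with reinforcement learning from human feedback: humans rank candidate replies, a reward model is fit to those rankings, and the dialogue model is then optimized against it. A minimal sketch of the pairwise ranking loss used for such reward models in OpenAI's InstructGPT paper, assuming each reply already has a scalar score; the function name is hypothetical and this is not OpenAI's code:

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way.

    Small when the reward model scores the human-preferred reply above the
    rejected one; large when it gets the order wrong.
    """
    diff = r_chosen - r_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    # For negative diff, rewrite as -diff + log(1 + e^diff) to avoid overflow.
    return -diff + math.log1p(math.exp(diff))


print(pairwise_reward_loss(2.0, 0.0))  # ranking agrees with the human: small loss
print(pairwise_reward_loss(0.0, 2.0))  # ranking disagrees: larger loss
```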

madiator 1268 days ago

Yeah, it wouldn't fit. GPT-3 has 175B parameters, so even at 8 bits per weight you need 175 × 10^9 ÷ 2^30 ≈ 163 GiB of memory just for the weights.
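The same arithmetic for a few common weight formats, as a quick sketch (weights only; activations and the attention cache need memory on top of this):

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(params: int, bytes_per_weight: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return params * bytes_per_weight / GIB


GPT3_PARAMS = 175_000_000_000  # parameter count quoted above

for fmt, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{fmt}: {weight_memory_gib(GPT3_PARAMS, nbytes):.0f} GiB")
```

The int8 line reproduces the 163 GiB figure; fp16 doubles it to about 326 GiB and fp32 to about 652 GiB.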

joshka 1268 days ago

https://www.reddit.com/r/ChatGPT/comments/zhzjpq/comment/izo...

>It's around 500 GB and requires around 300+ GB of VRAM from my understanding, and runs on one of the largest supercomputers in the world. Stable Diffusion has around 6 billion parameters; GPT-3/ChatGPT has 175 billion.

cal85 1268 days ago

Wouldn’t that be possible with about 4 powerful GPUs? Or does it not work like that?

taocoyote 1268 days ago

Possibly, but that would be tens of thousands of dollars' worth of GPUs.

kurtoid 1268 days ago

Silly question: how does OpenAI host/serve it?

magixx 1268 days ago

I think on professional hardware you can get 80 GB of memory per GPU, and they can likely do memory pooling.
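A rough sketch of how many 80 GB cards the weights alone would need, assuming the model is split evenly across GPUs (model parallelism), treating 80 GB as 80 GiB, and ignoring activations, cache and framework overhead. The helper name is hypothetical:

```python
import math

GIB = 2**30


def gpus_needed(params: int, bytes_per_weight: float, gpu_mem_gib: float) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    weights_gib = params * bytes_per_weight / GIB
    return math.ceil(weights_gib / gpu_mem_gib)


GPT3_PARAMS = 175_000_000_000
print(gpus_needed(GPT3_PARAMS, 1, 80))  # int8 weights on 80 GiB cards -> 3
print(gpus_needed(GPT3_PARAMS, 2, 80))  # fp16 weights on 80 GiB cards -> 5
```

So "about 4 powerful GPUs" is in the right ballpark, but only with high-memory datacenter cards and before counting anything besides the weights.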
