| HN Mirror


	by qumpis 1204 days ago
	I wonder why it's slower at inference time, then (for members using the web UI). Or rather, if it's similar in size to GPT-3, how is GPT-3 optimized in a way that GPT-4 isn't or can't be? I'd have expected similar speeds by now, but that hasn't happened yet.

MacsHeadroom 1204 days ago

GPT-4 is the same speed as legacy GPT-3 ChatGPT for me. It's only occasionally slower, which I expect is due to load and not it being larger.

qumpis 1204 days ago

Interesting. I remember that when the ChatGPT speedup happened, the API prices dropped by around 10x, so I'd imagine there were some tricks for making the models run faster.

If they still haven't implemented these, I'd be pleasantly surprised to see the model run at speeds similar to ChatGPT now. It'd be a great achievement if they really packed that much performance into a similar architecture (say, by just training longer).

MacsHeadroom 1204 days ago

The speed-up of the free, default "ChatGPT" happened because they switched it from the full-size GPT-3.5 to "GPT-3.5-Turbo", which is likely a fine-tune of the 10x smaller GPT-3 Curie.

If you have ChatGPT Plus, you can choose "Legacy" from the drop-down to get the smarter (and slower) 175B-parameter version of GPT-3.5. That version runs at the same speed as GPT-4 when load is low (early morning EST), which lends credence to the theory that GPT-4 is the same size as the overparametrized GPT-3.

qumpis 1204 days ago

Oh, this explains a lot. Thank you for the information!
