Hacker News
by Imnimo 1120 days ago
>Does the n-gram model really need all those parameters to mimic GPT-4? Yes, it does.

I don't understand what this argument is supposed to demonstrate. Obviously you can compress the 8000-gram model that GPT-4 represents: GPT-4's weights are proof!

2 comments

That's right, but if you did that compression, it wouldn't be an n-gram model anymore. What I'm trying to get across is that you could model GPT-4 as an equivalent 8000-gram model in an abstract sense, but that's not a good mental picture of how it actually functions. Internally, GPT-4 is no more an 8000-gram model than Stockfish is a giant lookup table of chess positions. GPT-4 is learning RASP programs, not statistical text correlations.
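
To make the "equivalent in an abstract sense" point concrete, here is a rough Python sketch. Everything in it is a toy assumption for illustration: `tiny_model` is a made-up rule standing in for a compact model, and the vocabulary size of 50,000 in the usage note is a hypothetical figure, not something stated in the thread. It shows that any predictor with a fixed context window can be unrolled into an n-gram lookup table that behaves identically, and how fast that table grows.

```python
from itertools import product
import math

VOCAB = ["a", "b", "c"]  # toy vocabulary (assumption, for illustration only)
N = 4                    # toy order: 3 tokens of context, 1 predicted token


def tiny_model(context):
    """Hypothetical stand-in for a compact model: a rule, not a table.

    Finds the most frequent token in the context (ties go to the token
    that comes first in VOCAB) and predicts the token after it in VOCAB.
    """
    counts = [context.count(tok) for tok in VOCAB]
    most_common = counts.index(max(counts))
    return VOCAB[(most_common + 1) % len(VOCAB)]


def to_ngram_table(model, vocab, n):
    """Unroll a fixed-window predictor into an explicit n-gram table."""
    return {ctx: model(ctx) for ctx in product(vocab, repeat=n - 1)}


def table_entries(vocab_size, n):
    """Number of distinct contexts an n-gram table has to store."""
    return vocab_size ** (n - 1)


def digits_in_table_size(vocab_size, n):
    """Decimal digits in table_entries, without building the huge integer."""
    return math.floor((n - 1) * math.log10(vocab_size)) + 1


table = to_ngram_table(tiny_model, VOCAB, N)
print(len(table))                   # one entry per possible context
print(table[("a", "a", "b")])       # same answer as tiny_model(("a", "a", "b"))
```

The table and the rule give identical predictions, but only the rule is small. Calling `digits_in_table_size(50_000, 8000)` with those hypothetical figures gives the number of digits in the context count alone, which is why the table view is a poor picture of what the weights are doing.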
Does ChatGPT really represent an 8000-gram model? It seems the claim was that it just predicts the next word!