Hacker News new | ask | show | jobs
by faramarz 798 days ago
it's not about a single point encapsulating a novel, but how sequences of such embeddings can represent complex ideas when processed by the model's layers.

each prediction is based on a weighted context of all previous tokens, not just the immediately preceding one.

1 comments

That weighted context is the 12228 dimensional vector, no?

I suppose that when you each element in the vector weighs 16 bits then the space is immense and capable to have a novel in a point.

GPT-4 is configurable up to 96 layers, each running their own embeddings. I think it was a business choice to afford the compute while they scale.
But if I understand correctly, GPT-4 reduces that to a 1536-dimensional vector. Roughly 1/8th. It's counterintuitive to me.