by amelius
323 days ago
No, an LLM really uses many more bits per token. First, the embedding typically has thousands of dimensions. Then, the value along each dimension is stored as a floating-point number, typically 16 bits wide (fewer with more aggressive quantization).
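To put rough numbers on that, here is a back-of-the-envelope sketch. The 4096-dimension figure is only an illustrative assumption standing in for "thousands of dimensions", not a claim about any particular model, and `embedding_bits_per_token` is a made-up helper name.

```python
def embedding_bits_per_token(dims: int, bits_per_value: int = 16) -> int:
    """Bits needed to store one token's embedding vector."""
    return dims * bits_per_value


# Assumed embedding width, for illustration only.
ASSUMED_DIMS = 4096

# 16-bit floats, then two more aggressive quantization levels.
for bits in (16, 8, 4):
    total = embedding_bits_per_token(ASSUMED_DIMS, bits)
    print(f"{bits}-bit values: {total} bits ({total // 8} bytes) per token")
```

Even at 4 bits per value, that is still thousands of bits for a single token under these assumptions.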
|
My point was that you compared how the LLM represents a token internally with how “English” transmits a word. That’s a category error.