by freecodyx
1102 days ago
The main thing about LLMs, in my opinion, is the tokenization/embedding part: words are already clustered and converted into numbers (vectors), and that's already a big deal. We are using learned weights, and the attention part feels like a brute-force approach to learning how those vectors are likely to be used together (if you add positional encoding as additional information). Statistics on a large amount of data just seems to work after all.
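
To make the "brute force" point concrete, here is a rough toy sketch in plain Python, not how any real model is implemented. The vocabulary, the random `embedding_table` (standing in for learned weights), and the helper names are all made up for illustration, and the learned query/key/value projections that real attention uses are left out to keep it short. It only shows the shape of the idea: tokens become vectors, a sinusoidal positional encoding is added, and every vector is compared against every other one.

```python
import math
import random

DIM = 4  # toy embedding size (assumption; real models use hundreds or thousands)

# Hypothetical three-word vocabulary: token -> integer id.
vocab = {"the": 0, "cat": 1, "sat": 2}

# Random vectors stand in for learned embedding weights.
random.seed(0)
embedding_table = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in vocab]


def positional_encoding(pos, dim):
    """Sinusoidal encoding: sin on even indices, cos on odd indices."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def encode(words):
    """Look up each word's vector and add its position information."""
    return [
        [e + p for e, p in zip(embedding_table[vocab[w]], positional_encoding(pos, DIM))]
        for pos, w in enumerate(words)
    ]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Every vector is scored against every other vector (n * n comparisons),
    which is the "brute force" part.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all input vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, vectors)) for i in range(dim)])
    return outputs, weights


vectors = encode(["the", "cat", "sat", "the"])
outputs, weights = attention(vectors)
```

Note that the two occurrences of "the" end up with different vectors only because of the positional encoding; without it, attention would have no way to tell them apart.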