by going_ham
1092 days ago
1. The trained model has 7B parameters or weights for each neuron.

2. It can handle up to 8k tokens at a time. A token is usually some representation of a word or a piece of a word. If your tokens are characters, then "h", "e", "y" are 3 tokens for "hey". Most tokenizers use byte pair encoding, where for example "handle" might be split into two tokens, "hand" and "le". This is a very crude example: enough to give the gist, but not accurate. You can look into byte pair encoding for more details.

3. The 1.5T token figure is the size of the training data. Simply put, it was trained on a very large data corpus.

I hope this simplifies it. You can research further if you are interested!
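To make the tokenization point in (2) a bit more concrete, here is a toy sketch of character tokens versus byte pair encoding. It is not the tokenizer of any real model: the function names (`char_tokens`, `train_bpe`, `encode`) and the tiny corpus are made up for illustration, and real BPE implementations work on bytes with much larger vocabularies.

```python
from collections import Counter


def char_tokens(text):
    """Character-level tokenization: one token per character."""
    return list(text)


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = [merge_pair(w, pair) for w in words]
    return merges


def encode(word, merges):
    """Tokenize a word by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    corpus = ["hand", "hand", "handle", "land"]  # made-up training words
    merges = train_bpe(corpus, num_merges=3)
    print(char_tokens("hey"))
    print(encode("handle", merges))
```

The idea is that frequent character pairs get merged into longer tokens, so common words end up as fewer tokens than they have characters.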
|
This one doesn't even make any sense. Of course it doesn't have 7B parameters _per_ neuron.