by mabbo
1144 days ago
Weights are basically number/float variables. In neural networks, vectors of values are multiplied (or math'd in some way) by weights to get new vectors of values. A 500 billion weight model has 500 billion variables, all carefully chosen via training. A model is some architecture of how data will flow through these weight matrices, along with the values of each weight. Tokens are sort of "words" in a sentence, but the ML may be translating the word itself into a more abstract concept in 'word space', e.g., a bunch of floating point values. At least some of what I just said is probably wrong, but now someone will correct me and we'll both be more right!
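To make the "vectors multiplied by weights" part concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the `matvec` helper, the 2x3 weight matrix, and the 3-number vector for "cat" are assumptions, not values from any real model, and real models apply many more operations than one multiplication.

```python
# Toy "layer": multiply an input vector by a weight matrix.
# All numbers are invented for illustration; a trained model would
# have learned its weights from data.

def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

# A 2x3 weight matrix: 6 weights in total.
weights = [
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
]

# A word mapped to a point in "word space" (hypothetical 3-dimensional vector).
embedding = {"cat": [2.0, 4.0, 1.0]}

output = matvec(weights, embedding["cat"])
print(output)  # [1.0, 3.5]
```

A "500 billion weight model" is the same idea with vastly larger matrices and many of them chained together.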
> A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.
Data doesn't really flow through weight matrices, though perhaps this is true if you squint at very simple models. Deep learning architectures are generally more complicated than multiplying values by weights and pushing the results to the next layer, though which architecture to use depends heavily on context.
> Tokens are sort of "words" in a sentence
Tokens are funny. What a token is depends on the context of the model you're using, but generally a token is a portion of a word. (Why? Efficiency is one reason; handling unknown words is another.)
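A rough sketch of the "portion of a word" idea, in Python. This is a toy greedy longest-match splitter over a tiny hand-written vocabulary; the `VOCAB` set and the `tokenize` function are hypothetical, and real tokenizers learn their vocabulary from data and use different algorithms.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, made-up vocabulary.
# Illustration only; not how any particular production tokenizer works.

VOCAB = {"token", "iz", "ation", "un", "known", "s"}

def tokenize(word, vocab=VOCAB):
    """Split a word into the longest vocabulary pieces, left to right.

    Characters not covered by the vocabulary become single-character tokens,
    which is one simple way to cope with unknown words.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches here: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces

print(tokenize("tokenization"))  # ['token', 'iz', 'ation']
print(tokenize("unknowns"))      # ['un', 'known', 's']
```

The efficiency point shows up even here: six vocabulary entries cover both words, and a word the vocabulary has never seen still gets split into something rather than rejected.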