by WhitneyLand
925 days ago
In case it’s confusing for anyone to see “weight” used as both a verb and a noun so close together, there are indeed two different things going on:

1. Model weights, aka the parameters. These are what get adjusted during training to do the learning part. They always exist.
2. Attention weights. These are part of the transformer architecture, and they “weight” the context of the input. They are ephemeral: computed for a given input, used, and discarded, so they don’t always exist.

Both are typically 32-bit floats, in case you’re curious, but they are still different concepts.
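For anyone who finds code easier than prose, here is a rough sketch of the distinction. It is a toy single-head self-attention in plain Python, not how any real library does it. The names `W_q`, `W_k`, `W_v`, `self_attention`, and the tiny sizes are all made up for illustration, and the parameters are random because nothing is trained here.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Matrix of Gaussian noise, standing in for learned or embedded values."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model = 4  # hypothetical embedding size, chosen only to keep the example small

# 1. Model weights (parameters): created once and kept around.
#    Training would adjust these; here they are just random.
W_q = random_matrix(d_model, d_model, rng)
W_k = random_matrix(d_model, d_model, rng)
W_v = random_matrix(d_model, d_model, rng)


def self_attention(x):
    """Return (output, attention_weights) for a sequence of token vectors x."""
    q, k, v = matmul(x, W_q), matmul(x, W_k), matmul(x, W_v)
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    scale = math.sqrt(d_model)
    # 2. Attention weights: recomputed from the input on every call,
    #    one row per token, each row summing to 1.
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn


x_short = random_matrix(3, d_model, rng)  # a 3-token input
x_long = random_matrix(5, d_model, rng)   # a 5-token input

_, attn_short = self_attention(x_short)
_, attn_long = self_attention(x_long)

print(len(attn_short), len(attn_short[0]))  # 3 3
print(len(attn_long), len(attn_long[0]))    # 5 5
```

The parameter matrices keep the same shape no matter what you feed in, while the attention weights are a fresh tokens-by-tokens matrix for each input and are gone once the call returns.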
Oh well... it seems like it's more confusing than I thought https://www.merriam-webster.com/wordplay/when-to-use-weigh-a...