|
|
|
|
|
by valine
930 days ago
|
|
The coding process. I've been experimenting with fine tuning methods where I freeze various layers or use different loss functions for attention vs feed forward. It's random little things like the names of layers that trip me up. For example the attribute that holds the name of the activation function in mistral is called hidden_act where in llama it's called activation_function. |
|