by valine 930 days ago
This is a really neat idea. Would love to have a similar view for llama-like models. I’ve been working with Mistral 7B lately and it’s annoying how many small changes there are between it and llama. Having a view like this would be a good time saver.

Thanks for your comment. Will definitely try to work on that too! Quick question: why do the differences between Mistral and llama matter to you? Do you mean a view like this would save time when reading the paper, or during the actual coding/building process?
The coding process. I've been experimenting with fine-tuning methods where I freeze various layers or use different loss functions for attention vs. feed-forward. It's random little things like the names of layers that trip me up. For example, the attribute that holds the name of the activation function in mistral is called hidden_act, whereas in llama it's called activation_function.
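
To make that concrete, here is a rough sketch of the kind of shim I end up writing. It only assumes the two attribute names mentioned above; the helper name get_activation_name and the stand-in config objects are made up for illustration, and which real config carries which name may differ depending on the library and version you use.

```python
from types import SimpleNamespace

# Candidate attribute names taken from the comment above. Which model
# actually uses which name is an assumption; check your own configs.
ACTIVATION_ATTRS = ("hidden_act", "activation_function")


def get_activation_name(config, candidates=ACTIVATION_ATTRS):
    """Return the value of the first candidate attribute set on config."""
    for name in candidates:
        value = getattr(config, name, None)
        if value is not None:
            return value
    raise AttributeError(
        f"none of {candidates} found on {type(config).__name__}"
    )


# Hypothetical stand-ins for real model configs, for illustration only.
config_a = SimpleNamespace(hidden_act="silu")
config_b = SimpleNamespace(activation_function="silu")

print(get_activation_name(config_a))
print(get_activation_name(config_b))
```

Going through one lookup like this means the rest of the fine-tuning code never touches the model-specific name directly, and a config with neither attribute fails loudly instead of silently.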