by zingelshuher 811 days ago
Question: have you actually seen an improvement after adding the noise? I mean in practice. I'm asking because intuition sometimes doesn't hold up.

Quite honestly, not in my experiments. I wanted to do some Bayesian hyperparameter optimization over discretized options like noise/no-noise and n_expert/top_k, but I haven't been able to find the time, or free time on one of our GPU clusters. I plan on using perplexity as the metric, since the model is not yet instruction fine-tuned.
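
For anyone who wants to try this comparison, here is a rough sketch of the discretized search space and the perplexity metric. It is only an illustration under stated assumptions: the option values (4/8 experts, top_k of 1/2/4) and the names `search_space` and `perplexity` are made up for the example, and it enumerates a plain grid rather than doing Bayesian optimization. The training and evaluation loop is left out entirely.

```python
import itertools
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood).

    Expects natural-log probabilities of the held-out tokens.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def search_space(noise_options=(False, True),
                 n_expert_options=(4, 8),      # hypothetical values
                 top_k_options=(1, 2, 4)):     # hypothetical values
    """Enumerate the discretized configs, skipping any with top_k > n_expert."""
    return [
        {"noise": noise, "n_expert": n_expert, "top_k": top_k}
        for noise, n_expert, top_k in itertools.product(
            noise_options, n_expert_options, top_k_options)
        if top_k <= n_expert
    ]


if __name__ == "__main__":
    for config in search_space():
        # Train and evaluate with `config` here, then compare
        # perplexity(...) on the same held-out set for noise vs. no-noise.
        print(config)
```

Each config would be trained under the same budget and scored on the same held-out text, so the noise/no-noise pairs with matching n_expert and top_k can be compared directly. A Bayesian optimizer could replace the grid by treating the same three options as categorical parameters.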