Hacker News
by avisoori1x 812 days ago
Quite honestly, not in my experiments. I wanted to do some Bayesian hyperparameter optimization over discretized options like noise/no-noise and n_expert/top_k, but I haven't been able to find the time, or a free slot on one of our GPU clusters. I plan on using perplexity as the metric, since the model is not yet instruction fine-tuned.
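
To make that concrete, here is a rough sketch of the kind of setup this describes: a small discretized search space and perplexity as the score. The specific values for n_expert and top_k, and the helper names `perplexity` and `candidate_configs`, are placeholders for illustration, not what the experiments used. It also just enumerates the grid; a Bayesian optimizer would propose configurations from the same space instead of trying all of them.

```python
import itertools
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical discretized search space; the values are illustrative only.
SEARCH_SPACE = {
    "noise": [False, True],
    "n_expert": [4, 8],
    "top_k": [1, 2],
}


def candidate_configs(space):
    """Yield every combination, skipping ones where top_k exceeds n_expert."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        if config["top_k"] <= config["n_expert"]:
            yield config


configs = list(candidate_configs(SEARCH_SPACE))

# Sanity check: a model that gives every token probability 1/4
# has perplexity 4 (up to floating-point error).
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

Each config would be trained and then scored by feeding the held-out per-token log-probabilities to `perplexity`, with lower being better.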