Hacker News
by avisoori1x 823 days ago
This is a good point. I have yet to try it, as I let this project sit for a couple of months and am only now getting back to it. I went with this because it's simpler, but I'm not sure simpler is necessarily better in this case.

Ah OK, I was wondering if there was some theory here that I wasn't aware of, but if it's just experimentation, no problem ;) Good to know in any case!

I have a hard time finding resources that describe the properties of the various options for discrete choices and clustering, apart from a few papers and blog posts describing the idea.

Question: have you seen an improvement after adding the noise? In practice, I mean. Asking because intuition sometimes doesn't hold up.
Quite honestly, not in my experiments. I wanted to do some Bayesian hyperparameter optimization over discretized options like noise/no-noise and n_expert/top_k, but I haven't been able to find the time, or free capacity on one of our GPU clusters. I plan on using perplexity as the metric, since the model is not yet instruction fine-tuned.
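
For anyone following along, here is a rough sketch of the two things being compared above: top-k routing with and without noise on the router logits, and perplexity as the evaluation metric. It is not the project's actual code. The names `top_k_gate`, `noise_std` and `perplexity` are made up for illustration, and the fixed Gaussian noise scale is a simplifying assumption (noisy top-k gating is often described with a learned, per-expert noise scale instead).

```python
import math
import random


def top_k_gate(logits, top_k, noise_std=0.0, rng=None):
    """Gate weights for one token over n_expert router logits.

    Keeps the top_k largest logits and softmaxes over them; every other
    expert gets weight 0. noise_std is a hypothetical fixed noise scale:
    0.0 corresponds to the "no-noise" option.
    """
    if rng is None:
        rng = random.Random()
    if noise_std > 0.0:
        # Assumption: plain Gaussian noise with a fixed standard deviation.
        logits = [x + rng.gauss(0.0, noise_std) for x in logits]
    # Indices of the top_k largest (possibly noised) logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the kept experts.
    peak = max(logits[i] for i in keep)
    exps = {i: math.exp(logits[i] - peak) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    router_logits = [1.0, 3.0, 2.0, 0.0]  # n_expert = 4, toy values
    print(top_k_gate(router_logits, top_k=2))
    print(top_k_gate(router_logits, top_k=2, noise_std=1.0, rng=random.Random(0)))
    print(perplexity([math.log(0.25)] * 4))
```

With noise on, the set of selected experts can change from call to call, which is the effect the noise/no-noise comparison would be measuring. Lower perplexity on held-out text would be the signal that a given noise/n_expert/top_k setting helps.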