Hacker News new | ask | show | jobs
by avi_vallarapu 773 days ago
Theoretically this sounds great. I would worry about scalability issues with the Bayesian learning models practical implementation when dealing with the vast parameter space and data requirements of state of the-art models like GPT-3 and beyond.

Would love to see practical implementations on large-scale datasets and in varied contexts. I Liked the use of Dirichlet distributions to approximate any prior over multinomial distributions.