|
|
|
|
|
by avi_vallarapu
773 days ago
|
|
Theoretically this sounds great. I would worry about scalability issues with the Bayesian learning models practical implementation when dealing with the vast parameter space and data requirements of state of the-art models like GPT-3 and beyond. Would love to see practical implementations on large-scale datasets and in varied contexts. I Liked the use of Dirichlet distributions to approximate any prior over multinomial distributions. |
|