Hacker News
by teruakohatu 1536 days ago
They are, but performance is decreased. In many cases transformers are encoding vast amounts of training data within their insane number of parameters.