by macleginn, 212 days ago
There has been some experimentation with the use of ReLU^2 in language models in recent years, e.g., here:
https://proceedings.neurips.cc/paper_files/paper/2021/file/2...
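For anyone unfamiliar with the notation: assuming ReLU^2 here means the squared ReLU, i.e. max(0, x)^2, a minimal plain-Python sketch looks like the following. The function names are my own for illustration and are not taken from the linked paper.

```python
def relu_squared(x: float) -> float:
    """Squared ReLU: max(0, x) ** 2 (assumed meaning of ReLU^2)."""
    r = max(0.0, x)  # plain ReLU
    return r * r     # square the rectified value


def relu_squared_grad(x: float) -> float:
    """Derivative of squared ReLU with respect to x: 2 * max(0, x)."""
    return 2.0 * max(0.0, x)


if __name__ == "__main__":
    for v in (-1.5, 0.0, 0.5, 3.0):
        print(v, relu_squared(v), relu_squared_grad(v))
```

Under that definition, the derivative is continuous at 0, unlike plain ReLU's, which jumps from 0 to 1 there.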