by espadrine 623 days ago
ReLU shares that concern. But since the weights are shared across the context/minibatch, perhaps it would not be an issue in practice, just as it usually isn't for ReLU.
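A rough sketch of the argument, assuming the concern is a zero gradient on inputs where the unit is inactive (that is one reading of it, not something stated above). The single-unit setup, the `relu_unit_grad` helper, and the sample values are all hypothetical, chosen only to show that a shared weight still gets a nonzero update as long as some samples in the batch activate the unit:

```python
# Hypothetical toy setup: one ReLU unit y = max(0, w*x + b),
# with loss = sum of y over the minibatch.
def relu_unit_grad(w, b, xs):
    """Return per-sample and batch-summed gradients of the loss w.r.t. w."""
    per_sample = []
    for x in xs:
        pre = w * x + b
        # ReLU derivative is 1 where the pre-activation is positive, else 0,
        # so inactive samples contribute nothing to the weight gradient.
        per_sample.append(x if pre > 0 else 0.0)
    # The weight is shared across the batch, so the contributions add up.
    return per_sample, sum(per_sample)

xs = [-2.0, -1.0, 0.5, 3.0]  # assumed minibatch of scalar inputs
per_sample, total = relu_unit_grad(w=1.0, b=0.0, xs=xs)
print(per_sample)  # two samples give zero gradient, two do not
print(total)       # the summed gradient is still nonzero
```

The unit only stops learning if it is inactive on every sample it sees, which is a much stronger condition than being inactive on any one of them.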