Hacker News
Annotated Implementation of DeepNet: Scaling Transformers to 1k Layers (nn.labml.ai)
3 points by vpj 1521 days ago