Hacker News
user: smaddrellmander
created: 2026-04-21
karma: 75

idlemachines.co.uk

submissions:

The annotated PyTorch training loop
39 points | 9 comments
MAE vs. MSE: more than just the mean vs. median debate
1 point | 0 comments
DiffusionGemma: Discrete diffusion in a large language model
3 points | 1 comment
Heaven knows I'm perplexed now
2 points | 0 comments
Reading MAI's efficiency gain. How to pick architectures like serious people
9 points | 0 comments
MAI-Thinking-1: Building a Hill-Climbing Machine [pdf]
3 points | 0 comments
Are contrastive losses just cross entropy all along?
2 points | 0 comments
Every token, everywhere, all at once
2 points | 0 comments
0 points | 0 comments
The cut in the Mixture of Experts compute graph
1 point | 0 comments
DeepSeek V4 from the Inside
2 points | 0 comments
Softmax, can you derive the Jacobian? And should you care?
134 points | 47 comments
Gemma 4 is not your standard transformer
2 points | 0 comments