Hacker News
user: smaddrellmander
created: 2026-04-21
karma: 75
idlemachines.co.uk
submissions:
The annotated PyTorch training loop
39 points | 9 comments
MAE vs. MSE: more than just the mean vs. median debate
1 point | 0 comments
DiffusionGemma: Discrete diffusion in a large language model
3 points | 1 comment
Heaven knows I'm perplexed now
2 points | 0 comments
Reading MAI's efficiency gain. How to pick architectures like serious people
9 points | 0 comments
MAI-Thinking-1: Building a Hill-Climbing Machine [pdf]
3 points | 0 comments
Are contrastive losses just cross entropy all along?
2 points | 0 comments
Every token, everywhere, all at once
2 points | 0 comments
0 points | 0 comments
The cut in the Mixture of Experts compute graph
1 point | 0 comments
DeepSeek V4 from the Inside
2 points | 0 comments
Softmax, can you derive the Jacobian? And should you care?
134 points | 47 comments
Gemma 4 is not your standard transformer
2 points | 0 comments