User: starzmustdie | HN Mirror

user: starzmustdie
created: 2022-03-18
karma: 7

submissions:

Show HN: #1 On This Day

18 points | 1 comment

A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE)

1 points | 0 comments

Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning

1 points | 0 comments

Show HN: Word Game Bench – evaluating language models on word puzzles

1 points | 0 comments

Show HN: Answers to Chip Huyen's ML Interview Questions

3 points | 0 comments