by bradhilton 470 days ago

No meaningful changes to the hyperparameters; we just set the tasks per iteration to 16 and trained on the same first 16 training tasks each iteration.
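For anyone who wants to picture the setup, here is a minimal sketch of "the same first 16 tasks each iteration". It is not the actual training code: `tasks_for_iteration` and the `train_tasks` list of placeholder IDs are hypothetical names made up for illustration, and the 21 iterations simply match the point where performance peaked.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

TASKS_PER_ITERATION = 16  # value stated in the comment


def tasks_for_iteration(train_tasks: Sequence[T], iteration: int) -> List[T]:
    """Return the same first 16 training tasks on every iteration.

    `iteration` is deliberately ignored: the batch never advances
    through the dataset, so every iteration sees identical tasks.
    """
    return list(train_tasks[:TASKS_PER_ITERATION])


# Hypothetical stand-in data: 200 placeholder task IDs.
train_tasks = [f"task-{i}" for i in range(200)]

# Build the batch for each of 21 iterations; all of them are identical.
batches = [tasks_for_iteration(train_tasks, it) for it in range(21)]
```

The point of the sketch is only that the slice is fixed, so the model repeatedly trains on one small set of tasks rather than fresh ones.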

We only tested this with the 14B model. You can see the run here:

https://wandb.ai/bradhilton/rl-experiments/runs/062

Performance peaked after 21 iterations at 45% accuracy rather than the final 59%, but that is still a significant increase from very few samples.

1 comment

pama 470 days ago

Thanks.
