by pama 467 days ago
Can you elaborate on this point:

“We discovered that meaningful performance improvements, as high as 10–15%, can be achieved with as few as 16 training examples.”

In particular, did you need to change the hyperparameters much, and did this limited recipe show different improvements for the larger vs smaller models? Also, how did you select these 16 examples?

1 comment

No meaningful changes to the hyperparameters. We just set the tasks per iteration to 16 and trained on the same first 16 training tasks each iteration.
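
As a rough illustration of that setup (not the actual training code), here is a minimal sketch of the task selection. The function name `select_tasks` and the `reuse_first` flag are hypothetical, and the sliding-window branch is only an assumed contrast with a more typical run, not something described above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_tasks(
    train_tasks: Sequence[T],
    iteration: int,
    tasks_per_iteration: int = 16,
    reuse_first: bool = True,
) -> List[T]:
    """Pick the tasks to train on for one iteration.

    reuse_first=True: return the same first `tasks_per_iteration` tasks
    every iteration (the limited-data setup described above).
    reuse_first=False: slide a window through the training set, wrapping
    around at the end (an assumed contrast, not taken from the thread).
    """
    if reuse_first:
        # The iteration index is ignored: every iteration sees the same tasks.
        return list(train_tasks[:tasks_per_iteration])
    n = len(train_tasks)
    start = (iteration * tasks_per_iteration) % n
    return [train_tasks[(start + i) % n] for i in range(tasks_per_iteration)]


if __name__ == "__main__":
    tasks = [f"task-{i}" for i in range(100)]  # placeholder task IDs
    # With reuse_first=True, iterations 0 and 20 get identical batches.
    print(select_tasks(tasks, 0) == select_tasks(tasks, 20))
```

The point of the sketch is only that the batch does not depend on the iteration index, so the model sees the same 16 tasks repeatedly.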

We only tested this with the 14B model. You can see the run here:

https://wandb.ai/bradhilton/rl-experiments/runs/062

Performance peaked at 45% accuracy after 21 iterations, versus the final 59%, but that is still a significant increase from very few samples.

Thanks.