by pama
467 days ago
Can you elaborate on this point: “We discovered that meaningful performance improvements, as high as 10–15%, can be achieved with as few as 16 training examples.” In particular, did you need to change the hyperparameters much, and did this limited recipe show different improvements for the larger vs. smaller models? Also, how did you select these 16 examples?

We only tested this with the 14B model. You can see the run here:
https://wandb.ai/bradhilton/rl-experiments/runs/062
Performance peaked at 45% accuracy after 21 iterations, versus the final 59%, but that's still a significant increase from very few samples.