by bradhilton
470 days ago
No meaningful changes to the hyperparameters; we just changed the tasks per iteration to 16 and trained on the same first 16 training tasks each iteration. We only tested this with the 14B model. You can see the run here: https://wandb.ai/bradhilton/rl-experiments/runs/062. Performance peaked after 21 iterations at 45% accuracy, short of the final 59%, but that is still a significant increase from very few samples.
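For anyone who wants to try the same setup, here is a rough sketch of what "same first 16 tasks each iteration" could look like. This is not the actual training code: `iteration_batches` and the placeholder task list are hypothetical names made up for illustration, and only the value 16 comes from the description above.

```python
TASKS_PER_ITERATION = 16  # value stated in the comment above


def iteration_batches(train_tasks, num_iterations,
                      tasks_per_iteration=TASKS_PER_ITERATION):
    """Yield the same leading slice of the training set on every iteration.

    Hypothetical helper: a normal run would draw fresh tasks each
    iteration, while this one deliberately reuses the first N tasks.
    """
    fixed_subset = list(train_tasks[:tasks_per_iteration])
    for _ in range(num_iterations):
        yield fixed_subset


if __name__ == "__main__":
    tasks = [f"task-{i}" for i in range(100)]  # placeholder task ids
    for step, batch in enumerate(iteration_batches(tasks, num_iterations=3)):
        # Every iteration sees the identical 16 tasks.
        print(step, len(batch), batch[0], batch[-1])
```

The only difference from a standard loop is that the slice is taken once, outside the loop, so the model never sees more than 16 distinct tasks.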