by nkaz123
166 days ago
I accepted the bugginess in the browser game as unavoidable, and I probably had too much faith in the LLM implementations, but I did a bit more troubleshooting than I mentioned. The progressive improvement over episodes (and the intuitive result that PPO > the others) gave me some confidence. I've since used a similar setup on 2048, with more results showing improvement over episodes: https://wandb.ai/noahpunintended/2048-raspberry-pi?nw=nwuser...