by nkaz123 166 days ago
I accepted the bugginess in the browser game as unavoidable and probably had too much faith in the LLM implementations, but I did a bit more troubleshooting than I mentioned. The progressive improvement over episodes (and the intuitive result that PPO outperformed the others) gave me some confidence, and I've since used a similar setup on 2048, with more results showing improvement over episodes: https://wandb.ai/noahpunintended/2048-raspberry-pi?nw=nwuser...