by ainch
4 days ago
I'd spent 6 hours on a gnarly RL problem (mathematically resolving the divergence of off-policy TD-Lambda for any value of lambda or behaviour policy). As a punt I gave it to o3 (remember, LLMs were 'bad at maths'). After 15 minutes it returned with the answer that had taken me hours.