by smhx 1955 days ago
I've been helping on the project; it's led by Chris Cummins and Hugh Leather.

Just a heads-up for folks: we haven't fully cleaned up and gotten ready for public attention yet. We are about 90% there.

Once we are golden, we're going to write a note with a way to submit the results of your own agents, compare with baselines (random, actor-critic), etc.

And as others noticed, in its current form we're focusing on code size and phase ordering, but we will be expanding over time to other optimization problems like runtime.

1 comments

This is really cool - I understand how the reinforcement loop works for improving performance, but how does it verify that the optimizations applied don't change the semantics/correctness of the code?
Regular old tests, I imagine
This. For now we rely on differential testing against a gold-standard implementation (e.g. the unoptimized program). For the action space we expose, any semantics-breaking change induced by our tool is a compiler bug.
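
For anyone unfamiliar with the term, here is a rough sketch of what differential testing looks like in general. This is not the project's actual harness: the functions `reference`, `optimized_ok`, and `optimized_buggy` are hypothetical stand-ins for an unoptimized build and two optimized builds of the same program, and plain Python callables are used in place of compiled binaries. The idea is just to feed the same inputs to both versions and flag any disagreement.

```python
import random


def reference(xs):
    """Gold standard: a plain, 'unoptimized' sum of squares (hypothetical)."""
    total = 0
    for x in xs:
        total += x * x
    return total


def optimized_ok(xs):
    """Stand-in for a correctly optimized build (hypothetical)."""
    return sum(x * x for x in xs)


def optimized_buggy(xs):
    """Stand-in for a miscompiled build: silently drops the last element."""
    return sum(x * x for x in xs[:-1])


def differential_test(candidate, gold, trials=200, seed=0):
    """Return the first input where candidate and gold disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Random integer lists of length 0..8 serve as test inputs.
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 8))]
        if candidate(xs) != gold(xs):
            return xs
    return None


print(differential_test(optimized_ok, reference))     # no disagreement found
print(differential_test(optimized_buggy, reference))  # a failing input
```

Note that this only shows the absence of disagreement on the inputs tried; it is evidence of correctness, not a proof.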