by atallahw
123 days ago
We got access to OpenAI's RFT API with GPT-5 and tried to see how good we could get it at one-shot Triton kernel generation. Some key decisions/observations:
1. tool use instead of multi-turn RL
2. skip SFT altogether
3. dataset curation was more important than dataset scale
4. reward hack detection must be robust
5. models are getting a lot better at this
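
To make the reward hacking point concrete, here is a rough sketch of the kind of static check a grader could run before scoring a generated kernel. This is not the pipeline from the post, which does not describe its detector. The deny-list, the function names (`find_reward_hacks`, `shaped_reward`), and the speedup cap are all made up for illustration. A real grader would presumably also need runtime checks, since a static pass like this is easy to evade.

```python
import ast

# Hypothetical deny-list: PyTorch ops a "kernel" could call to skip the real work.
FORBIDDEN_TORCH_CALLS = {"matmul", "mm", "bmm", "softmax", "conv2d", "layer_norm"}


def _is_triton_jit(decorator: ast.expr) -> bool:
    """True for `@triton.jit` and `@triton.jit(...)`."""
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    return (
        isinstance(target, ast.Attribute)
        and target.attr == "jit"
        and isinstance(target.value, ast.Name)
        and target.value.id == "triton"
    )


def find_reward_hacks(source: str) -> list[str]:
    """Return the reasons a submission looks like a reward hack (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return ["source does not parse"]

    reasons = []
    has_jit_kernel = False
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            if any(_is_triton_jit(d) for d in node.decorator_list):
                has_jit_kernel = True
        elif isinstance(node, ast.Call):
            func = node.func
            # Only matches direct calls such as torch.matmul(a, b),
            # not aliases or torch.nn.functional.* (a known gap in this sketch).
            if (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "torch"
                and func.attr in FORBIDDEN_TORCH_CALLS
            ):
                reasons.append(f"calls torch.{func.attr} instead of a kernel")
        elif isinstance(node, ast.Try):
            reasons.append("try/except block could hide a fallback path")
    if not has_jit_kernel:
        reasons.append("no @triton.jit kernel defined")
    return reasons


def shaped_reward(source: str, correctness: float, speedup: float) -> float:
    """Zero reward for flagged submissions; otherwise correctness-gated speedup."""
    if find_reward_hacks(source):
        return 0.0
    # The 2.0 cap is an arbitrary illustrative choice, not a value from the post.
    return correctness * min(speedup, 2.0)
```

The idea is that a flagged submission gets zero reward no matter how fast or "correct" it looks, so the policy has no incentive to wrap a PyTorch call and present it as a kernel.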