Hacker News
Bitwise Consistent On-Policy Reinforcement Learning with vLLM and TorchTitan (blog.vllm.ai)
1 point by brrrrrm 226 days ago