Hacker News
Bitwise Consistent On-Policy Reinforcement Learning with VLLM and TorchTitan (blog.vllm.ai)
1 point by brrrrrm 226 days ago