Hacker News
user: kumama
created: 2017-02-19
karma: 5

submissions:

I post-trained a model to reliably roll a die
2 points | 0 comments
Open-Weight Models Don't Need to Win
5 points | 8 comments
Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads
4 points | 0 comments
Pokegents: Making multi-agent coding feel like a team
8 points | 1 comment
GRPO explained: group relative policy optimization for LLM finetuning
1 point | 0 comments
Do RL on a model with your vector db
1 point | 0 comments
What is reinforcement learning finetuning
3 points | 0 comments
RAG to riches: synthetic data for training RAG agents
2 points | 0 comments
rag not lag: rl for fast agentic retrieval
3 points | 0 comments
Show HN: Benchmax, a new open-source RL environment framework for LLM finetuning
1 point | 0 comments
Beating o3/o4-mini with Codebase-specific Reinforcement Learning
3 points | 0 comments
We might be overestimating coding agent performance on SWE-Bench
1 point | 1 comment
How to Improve Code Completion LLMs with Repo-Specific Finetuning
3 points | 1 comment
Show HN: Free AI Code Completion for Xcode with model choice/codebase context
2 points | 0 comments