User: kumama | HN Mirror

user: kumama
created: 2017-02-19
karma: 5

submissions:

I post-trained a model to reliably roll a die

2 points | 0 comments

Open-Weight Models Don't Need to Win

5 points | 8 comments

Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads

4 points | 0 comments

Pokegents: Making multi-agent coding feel like a team

8 points | 1 comment

GRPO explained: group relative policy optimization for LLM finetuning

1 point | 0 comments

Do RL on a model with your vector db

1 point | 0 comments

What is reinforcement learning finetuning

3 points | 0 comments

RAG to riches: synthetic data for training RAG agents

2 points | 0 comments

RAG not lag: RL for fast agentic retrieval

3 points | 0 comments

Show HN: Benchmax, a new open-source RL environment framework for LLM finetuning

1 point | 0 comments

Beating o3/o4-mini with Codebase-specific Reinforcement Learning

3 points | 0 comments

We might be overestimating coding agent performance on SWE-Bench

1 point | 1 comment

How to Improve Code Completion LLMs with Repo-Specific Finetuning

3 points | 1 comment

Show HN: Free AI Code Completion for Xcode with model choice/codebase context

2 points | 0 comments