	by hannesfur 442 days ago
	Yes, you could do that, but you would have built a different platform than Augento. Maybe we should make the distinction clearer, though. The blog article you are referring to uses a different fine-tuning method, one that many other big platforms like Together AI (and even OpenAI themselves) already support: Supervised Fine-Tuning (SFT). We do Reinforcement Learning using GRPO instead. SFT has the big caveat that it requires good prompt-completion datasets, which are rare or hard to curate for many use cases. With GRPO, you (the programmer) don't even need to know what the correct answer is, as long as you can decide whether a given answer is good. At its heart, that's essentially the P vs. NP intuition: checking an answer is often much easier than producing one.
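	To make the "you only need to judge the answer" point concrete, here is a minimal sketch. It is not Augento's API; the task, `TARGET`, `reward`, and `group_relative_advantages` are all hypothetical names made up for illustration. It shows a reward function that verifies a completion without storing a reference answer, plus the group-relative normalization that GRPO is named after (using the population standard deviation here is an assumption; implementations differ).

```python
from statistics import mean, pstdev

# Hypothetical task: "find two integers > 1 whose product is 91".
TARGET = 91

def reward(completion: str) -> float:
    """Score a completion by checking it; no reference answer is stored."""
    try:
        a, b = (int(tok) for tok in completion.split("*"))
    except ValueError:
        return 0.0  # unparseable or wrongly shaped output gets no reward
    return 1.0 if a > 1 and b > 1 and a * b == TARGET else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one sampled group of completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # Every completion scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# One group of sampled completions for the same prompt (made-up examples).
completions = ["7 * 13", "9 * 10", "1 * 91", "not sure"]
rewards = [reward(c) for c in completions]
advantages = group_relative_advantages(rewards)

for c, r, adv in zip(completions, rewards, advantages):
    print(f"{c!r}: reward={r}, advantage={adv:+.3f}")
```

	Only the completion that passes the check gets a positive advantage; the rest are pushed down relative to the group. With SFT you would instead need a dataset that already contains the right completion for every prompt.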