|
|
|
|
|
by hannesfur
442 days ago
|
|
Yes, you could do that.
However, you would have created a different platform than Augento. Maybe we should make the distinction clearer though. The blog article you are referring to uses another method to fine-tune models that many other big platforms like Together AI (and even OpenAI themselves) are already supporting: Supervised Fine Tuning (SFT). We are doing Reinforcement Learning using GRPO instead.
SFT has the big caveat that it requires good prompt-completion datasets to work, which are rare/hard to curate for many use cases. For GRPO, you (the programmer) don’t even need to know what the correct answer is as long as you can decide if it’s a good answer (P?NP) at its heart, essentially. |
|