by resiros
442 days ago

Congrats on the launch! The idea sounds very interesting on paper. The tricky part, though, is the reward function.

Fine-tuning as a service works because the friction with fine-tuning is operational (getting the GPUs, preparing the training, ...), so the vendor can take care of that and give you an API. The work becomes straightforward and doesn't require much preparation: give us some examples and we'll give you a model that works well on them and hopefully generalizes.

RL as a service is much trickier in my opinion, because the friction is not only operational. Getting RL to work (at least going by my probably outdated, 10-year-old knowledge) is much harder, and the real friction is in designing the right reward function.

I've skimmed your docs, and they don't say much about reward functions beyond the obvious. To get this to work, I think you need to improve your docs and examples a lot, and maybe focus on some recurring use cases (e.g., a customer support agent) with clear reward functions. Perhaps provide some building-block reward functions and some UI/tools to help create them. Basically, find a way to remove the real friction in using RL for my agent: the reward function.

In any case, congrats again on the launch. We're building an LLMOps platform (see my profile); there might be potential for a collaboration or integration, so write to me if you think that's interesting.
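
To make the "building-block reward functions" suggestion concrete, here is a minimal Python sketch of what composable blocks for a customer support agent might look like. It assumes a reward function simply maps a (prompt, response) pair to a score in [0, 1]; every name here (`contains_any`, `max_length`, `weighted_sum`, `support_reward`) is hypothetical and not part of any vendor's API.

```python
from typing import Callable

# Assumption: a reward function maps (prompt, response) to a score in [0, 1].
RewardFn = Callable[[str, str], float]


def contains_any(keywords: list[str]) -> RewardFn:
    """Hypothetical block: 1.0 if the response mentions any keyword (case-insensitive), else 0.0."""
    def reward(prompt: str, response: str) -> float:
        text = response.lower()
        return 1.0 if any(k.lower() in text for k in keywords) else 0.0
    return reward


def max_length(limit: int) -> RewardFn:
    """Hypothetical block: full score up to `limit` words, decaying linearly to 0 at 2 * limit."""
    def reward(prompt: str, response: str) -> float:
        n = len(response.split())
        if n <= limit:
            return 1.0
        return max(0.0, 1.0 - (n - limit) / limit)
    return reward


def weighted_sum(parts: list[tuple[float, RewardFn]]) -> RewardFn:
    """Combine blocks into one scalar reward, normalised by the total weight."""
    total = sum(w for w, _ in parts)
    def reward(prompt: str, response: str) -> float:
        return sum(w * fn(prompt, response) for w, fn in parts) / total
    return reward


# Example composition for a (hypothetical) support agent: mention the refund
# or ask for the order number, and stay reasonably short.
support_reward = weighted_sum([
    (2.0, contains_any(["refund", "order number"])),
    (1.0, max_length(50)),
])

score = support_reward(
    "Where is my refund?",
    "Please share your order number so I can check the refund status.",
)
```

The point of this shape is that each block is small and testable on its own, and the hard part (choosing which blocks and weights actually reflect a good support conversation) stays explicit rather than hidden in one monolithic scoring function.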