by mock-possum
108 days ago
We feed a handful of preset questions through the new AI, collect the results, and ask another AI to score the answers against example 'good' answers we've written. Then we have a guy sit down and use the scored results as a starting point to rank that AI's performance against all the previous ones. Seems like it works pretty well: our prompts and params get tweaked towards better and better results, and we get a sense of what's worth paying more for.
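For anyone wanting to try something similar, here is a minimal sketch of that kind of loop in Python. Everything in it is an assumption made for illustration: `EVAL_SET`, `judge`, `evaluate`, `rank`, and the two toy candidates are hypothetical names, and the judge is plain string similarity from `difflib` standing in for the second AI, not a real model call.

```python
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical preset questions paired with hand-written reference answers.
EVAL_SET = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]


def judge(answer: str, reference: str) -> float:
    """Stand-in for the scoring model: string similarity in [0, 1]."""
    a = answer.strip().lower()
    b = reference.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def evaluate(candidate) -> float:
    """Run every preset question through a candidate and average the scores."""
    return mean(judge(candidate(q), ref) for q, ref in EVAL_SET)


def rank(candidates: dict) -> list:
    """Return (name, score) pairs, best first, as a starting point for review."""
    scores = {name: evaluate(fn) for name, fn in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy candidates standing in for real model + prompt + params combinations.
def old_model(question: str) -> str:
    return "I don't know"


def new_model(question: str) -> str:
    canned = {
        "What is 2 + 2?": "4",
        "What is the capital of France?": "Paris",
    }
    return canned.get(question, "")


ranking = rank({"old": old_model, "new": new_model})
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice `judge` would call the scoring model with the question, the answer, and the reference, and the ranking would only be the input to the human pass rather than the final word.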