by freediver 1119 days ago
Less scientific, but arguably more practical benchmarks here:

https://github.com/kagisearch/pyllms#model-benchmarks

1 comment

For anyone reading this, these are the actual prompts being used to assess the models.

https://github.com/kagisearch/pyllms/blob/ca9ad4d4bfdd9d58fe...