by dr_kiszonka 534 days ago
I wish you posted more evaluation details on your page as text. What exactly was your accuracy vs. Sonnet? (Right now, we can only tell that Sonnet's was ≤ 1/4.3.) Why the Discourse repo? Providing more detailed information would help folks trust your claims more.

I agree, we need to post more data. Since we are very early (less than a month in), we have only shared initial results so far. The Discourse repo was simply a good option because it is a large public repo that could benefit from fine-tuning. We plan to add more benchmarks to the website as we progress.