Hacker News
user: shahules
created: 2021-10-06
karma: 131

submissions:

Cloning Bench: Evaluating AI Agents on Visual Website Cloning
2 points | 1 comment
PA Bench: Evaluating web agents on real-world personal assistant workflows
38 points | 9 comments
PA Bench: Evaluating Frontier Models on Multi-Tab PA Tasks
7 points | 1 comment
Show HN: Ragas – Open-source library for evaluating RAG pipelines
121 points | 26 comments
Show HN: Ragas – Open-source library for evals and testing RAG systems
15 points | 9 comments