Hacker News
user: shahules
created: 2021-10-06
karma: 131
submissions:
Cloning Bench: Evaluating AI Agents on Visual Website Cloning
2 points | 1 comment
PA bench: Evaluating web agents on real world personal assistant workflows
38 points | 9 comments
PA Bench: Evaluating Frontier Models on Multi-Tab PA Tasks
7 points | 1 comment
Show HN: Ragas – Open-source library for evaluating RAG pipelines
121 points | 26 comments
Show HN: Ragas – Open-source library for evals and testing RAG systems
15 points | 9 comments