
by roampal 177 days ago

Ran a 4-way comparison test across 200 query-memory pairs:

- Baseline RAG (embedding similarity only): 10%

- RAG + reranker: 20%

- Outcomes only (no reranker): 60%

- RAG + outcome scoring (mature memories with 20+ uses): 60%

"Accuracy" = correct memory ranked #1 for the query. The outcome scoring uses the Wilson score lower bound: memories that consistently get positive feedback from the "user" get boosted, and ones that fail get demoted.
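For anyone unfamiliar with the Wilson score lower bound, here is a minimal sketch of the standard formula in Python. This is not the code from the repo: the function name, the 95% confidence level (z = 1.96), and the example feedback counts are all assumptions for illustration, and how the score is combined with embedding similarity is not shown here.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a success rate.

    positive: number of positive feedback events (hypothetical counter)
    total:    number of times the memory was used and rated
    z:        normal quantile; 1.96 is roughly 95% confidence (an assumption)
    """
    if total == 0:
        # No feedback yet, so there is no evidence the memory is useful.
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


# Hypothetical memories as (name, positive feedback, total uses).
memories = [
    ("new_but_perfect", 2, 2),
    ("mature_and_good", 18, 20),
    ("mature_and_bad", 2, 20),
]

# Rank by the lower bound, highest first.
ranked = sorted(memories, key=lambda m: wilson_lower_bound(m[1], m[2]), reverse=True)
ranked_names = [name for name, _, _ in ranked]
```

The point of using the lower bound rather than the raw success rate is that a memory with 2/2 positive outcomes ranks below one with 18/20, since two uses are not much evidence. That fits the "mature memories with 20+ uses" condition in the results above.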

Test methodology: https://github.com/roampal-ai/roampal/blob/main/dev/benchmar...