by roampal
177 days ago
Ran a 4-way comparison test across 200 query-memory pairs:

- Baseline RAG (embedding similarity only): 10%
- RAG + reranker: 20%
- Outcomes only (no reranker): 60%
- RAG + outcome scoring (mature memories with 20+ uses): 60%

"Accuracy" = the correct memory is ranked #1 for the query.

The outcome scoring uses the Wilson score lower bound: memories that consistently get positive feedback from the "user" are boosted, and ones that fail are demoted.

Test methodology: https://github.com/roampal-ai/roampal/blob/main/dev/benchmar...
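For anyone unfamiliar with the Wilson lower bound, here is a minimal Python sketch of ranking by it. It assumes each memory tracks a count of positive outcomes, a count of total uses, and a precomputed embedding similarity. The `Memory` class, `rank_memories`, and z = 1.96 (roughly 95% confidence) are illustrative assumptions, not roampal's actual code. Similarity is used only as a tiebreaker here; how the benchmark really combines the two signals isn't stated.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    # Hypothetical record; field names are illustrative, not roampal's schema.
    text: str
    similarity: float  # embedding similarity to the query, assumed precomputed
    positives: int     # uses that received positive feedback
    uses: int          # total times the memory was used


def wilson_lower_bound(positives: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the positive-feedback rate.

    z = 1.96 corresponds to a ~95% confidence interval (an assumed default).
    """
    if n == 0:
        return 0.0  # no evidence yet: treat the memory as unproven
    p_hat = positives / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)


def rank_memories(candidates: list[Memory]) -> list[Memory]:
    """Order retrieved candidates by outcome score, breaking ties by similarity."""
    return sorted(
        candidates,
        key=lambda m: (wilson_lower_bound(m.positives, m.uses), m.similarity),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = [
        Memory("new, very similar", similarity=0.92, positives=1, uses=1),
        Memory("proven over 20 uses", similarity=0.71, positives=18, uses=20),
        Memory("frequently wrong", similarity=0.88, positives=3, uses=20),
    ]
    for m in rank_memories(candidates):
        print(f"{wilson_lower_bound(m.positives, m.uses):.3f}  {m.text}")
```

With these made-up numbers, the memory with 18/20 positive outcomes scores about 0.70 and ranks first, while a memory with a single positive use scores only about 0.21 despite higher similarity. The lower bound penalizes small sample sizes, so a memory has to earn its boost through repeated positive feedback.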