We evaluated PaperQA2 (https://github.com/Future-House/paper-qa) on the science portion of the RAG-Arena benchmark (https://arxiv.org/abs/2407.13998). This is the first time we've compared PaperQA2 against other systems based on Cohere or Contextual.ai. PaperQA2 scores 12.4% higher than Contextual.ai on the same dataset (1,404 questions and 1.7M documents).