
Unfortunately, LLM cost is a fundamental issue right now. I think it is hard to get around because comparing answer quality usually involves understanding both the question and the answer, which is a task that's really well suited to LLMs.

One thing we have considered is replacing some forms of evaluation with the embeddings of the question, context, and answer instead of using an LLM for the analysis. The idea is that you could compare the embeddings to get a rough idea of performance based on similarity, which should in theory reduce costs. The only other alternative is to use less advanced models, which are cheaper.
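
To make the embedding idea concrete, here is a minimal sketch of what such a comparison might look like. It assumes the question, context, and answer have already been embedded by some model of your choice; the function names (`cosine_similarity`, `rough_scores`) and the toy 3-dimensional vectors are made up for illustration, and cosine similarity is just one plausible choice of similarity measure, not something we have settled on.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def rough_scores(
    question: Sequence[float],
    context: Sequence[float],
    answer: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper: pairwise similarities as cheap quality proxies."""
    return {
        "answer_vs_question": cosine_similarity(answer, question),
        "answer_vs_context": cosine_similarity(answer, context),
        "context_vs_question": cosine_similarity(context, question),
    }


# Toy vectors standing in for real embeddings (assumption: in practice
# these would come from an embedding model and have many more dimensions).
question = [1.0, 0.0, 0.0]
context = [1.0, 1.0, 0.0]
answer = [2.0, 0.0, 0.0]

scores = rough_scores(question, context, answer)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Scores like these only tell you whether the texts are about the same thing, not whether the answer is actually correct, so they would be a rough filter rather than a replacement for every LLM-based check.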