|
|
|
|
|
by alsima
684 days ago
|
|
Looking at Table 3: "Our [sampling and voting] method outperforms other methods used standalone in most cases and always enhances other methods across various tasks and LLMs", which benchmark did the majority vote algorithm perform worse in? |
|
Fundamentally, drawing multiple independent samples from an LLM is not "AI Agents". The only rows on those tables that are AI Agents are Debate and Reflection. They provide marginal improvements on only a few task/model combinations, and do so at a hefty increase in computational cost. In many tasks, they are significantly behind simpler and cheaper alternatives.