by maister 1144 days ago
This paper highlights a crucial aspect of evaluating AI language models: the significance of prompt construction (e.g. adding "think step by step").

When a model is given little context beyond the question itself, it may simply answer with its best guess. It is like abruptly waking someone in the middle of the night and demanding an immediate answer to a question.

In contrast, when humans are asked to answer questions in a test setting, they are aware of the larger context and the importance of providing accurate answers.
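
To make the difference concrete, here is a minimal sketch of the two kinds of prompt being compared. The `build_prompt` helper, the Q:/A: layout, and the example context string are my own assumptions for illustration, not anything taken from the paper, and no real model API is called.

```python
def build_prompt(question, context=None, step_by_step=False):
    """Assemble a plain-text prompt (hypothetical helper, for illustration only)."""
    parts = []
    if context:
        # Optional framing that tells the model what setting it is in.
        parts.append(context.strip())
    parts.append(f"Q: {question.strip()}")
    # The step-by-step cue is placed at the start of the answer slot.
    parts.append("A: Let's think step by step." if step_by_step else "A:")
    return "\n".join(parts)


# "Woken up in the middle of the night": the question and nothing else.
bare = build_prompt("What is 17 * 24?")

# "Test setting": the model is told accuracy matters and is nudged to reason.
framed = build_prompt(
    "What is 17 * 24?",
    context="This is a graded arithmetic test. Accuracy matters.",
    step_by_step=True,
)

print(bare)
print("---")
print(framed)
```

An evaluation that only ever sends the bare version is measuring the model's best guess, not what it can do when it knows the stakes.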