Generate as few tokens as possible, GPT4 is running a few times to generate a single answer and latency quickly becomes the biggest UX issue.
We abandoned most of the common thinking around chain of thought reasoning, finding it didn’t help accuracy much whilst increasing response times significantly.
Exactly, you can see the prompt in this file [0]. I'm not sure how LangChain arrived at their default agent prompt, but you'll almost certainly want to write your own for performance reasons if you put something into production.
This is great that you got gpt-4 to explore the codebase using an agent approach. I tried this previously with gpt-3.5-turbo and have been meaning to revisit it since I got gpt-4 access.
I shared some notes on HN awhile back on a variety of experiments I did with gpt-3.5-turbo.
We abandoned most of the common thinking around chain of thought reasoning, finding it didn’t help accuracy much whilst increasing response times significantly.
Full write up to follow in next week or so.