Interesting, thanks for sharing! I noticed that when the LLMs tried to explain their prediction, they would sometimes erroneously claim that the relation was linear when it was not. This happened when I removed the following part of the prompt:
`The task is to provide your best estimate for "Output". Please provide that and only that, without any additional text.`
Examples of this are available in Appendix J.
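For anyone who wants to try the comparison themselves, here is a rough sketch of toggling that instruction on and off. Only the instruction sentence is taken from the prompt quoted above. The `Feature 0:` / `Output:` layout, the position of the instruction at the top, and the `build_prompt` helper are assumptions made for illustration, not necessarily the exact format used in the experiments.

```python
# The instruction sentence is quoted from the prompt above.
INSTRUCTION = (
    'The task is to provide your best estimate for "Output". '
    "Please provide that and only that, without any additional text."
)


def build_prompt(examples, query, include_instruction=True):
    """Format (x, y) pairs as in-context examples, ending with an open query.

    Hypothetical helper: the "Feature 0" / "Output" layout is an assumed format.
    """
    lines = []
    if include_instruction:
        # Assumed placement: the instruction goes before the examples.
        lines.append(INSTRUCTION)
        lines.append("")
    for x, y in examples:
        lines.append(f"Feature 0: {x}")
        lines.append(f"Output: {y}")
        lines.append("")
    # The final entry leaves "Output:" empty for the model to complete.
    lines.append(f"Feature 0: {query}")
    lines.append("Output:")
    return "\n".join(lines)


if __name__ == "__main__":
    data = [(1.0, 2.5), (2.0, 4.1), (3.0, 9.7)]  # made-up toy values
    print(build_prompt(data, 4.0, include_instruction=True))
    print("-----")
    print(build_prompt(data, 4.0, include_instruction=False))
```

Sending both variants with the same examples makes it easy to see whether the model starts adding an explanation (and possibly a wrong claim of linearity) once the instruction is dropped.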
Side note: all the experiments were run with the API. There are some differences between the Chat and the API; for example, the Chat can generate and execute code. I shared Chats since they are easy to look at and to try.
If you have access to an API key, I made some Google Colabs:
What we are currently seeing in the comments are people trying random things and then saying “it doesn’t work”.