by justanotherjoe 796 days ago
Hmmm... What do people think in cases like this? Is the author just straight up lying?

Well, the datasets used in the paper are all available (Appendix B) - so recreating the experiment seems possible.

What we are currently seeing in the comments are people trying random things and then saying “it doesn’t work”.

Chat examples with GPT-4: https://github.com/robertvacareanu/llm4regression/tree/main/... (the experiments used the API though)

For example, for Friedman #1, GPT-4 predicts 12.89 while the true value is 11.69 (https://chat.openai.com/share/177571ad-3845-46a1-952f-963647...)

For Original #1, GPT-4 predicts 83.63 while the true value is 80.39 (https://chat.openai.com/share/808da995-99e6-444a-94da-fc7cd5...)
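
To put those two predictions in perspective, here is a small sketch that computes the absolute and relative error from the numbers quoted above. The helper name `errors` is made up for illustration, and two data points say nothing about average performance; this is only the arithmetic for these two chats.

```python
# (predicted, true) pairs quoted in the comment above
examples = {
    "Friedman #1": (12.89, 11.69),
    "Original #1": (83.63, 80.39),
}


def errors(predicted, true):
    """Return (absolute error, relative error as a fraction of the true value)."""
    abs_err = abs(predicted - true)
    return abs_err, abs_err / abs(true)


for name, (pred, true) in examples.items():
    abs_err, rel_err = errors(pred, true)
    print(f"{name}: abs error = {abs_err:.2f}, relative error = {rel_err:.1%}")

# Friedman #1: abs error = 1.20, relative error = 10.3%
# Original #1: abs error = 3.24, relative error = 4.0%
```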

Interesting - I tried completely cheating and got worse results!

https://chat.openai.com/share/6217cd86-2b0f-41b2-a36a-2558dd...

Interesting, thanks for sharing! I noticed that when the LLMs tried to explain their prediction, they would sometimes erroneously claim that the relationship is linear when it is not. This happened when I removed the following part of the prompt:

`The task is to provide your best estimate for "Output". Please provide that and only that, without any additional text.`

Examples of this are available in Appendix J.
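
For anyone who wants to try this outside the Chat UI, here is a rough sketch of how an in-context regression prompt with that instruction could be assembled. Only the instruction sentence is taken from the comment above. The `Feature i: ... / Output: ...` layout, the helper names, the number of examples, and the noise-free sampling are all assumptions made for illustration, not the paper's exact setup. `friedman1` follows the standard Friedman #1 definition (the same one scikit-learn's `make_friedman1` uses).

```python
import math
import random

# Instruction sentence quoted in the comment above.
INSTRUCTION = (
    'The task is to provide your best estimate for "Output". '
    "Please provide that and only that, without any additional text."
)


def friedman1(x, noise=0.0):
    """Standard Friedman #1 target for a 5-feature input in [0, 1]^5."""
    return (10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2 + 10 * x[3] + 5 * x[4] + noise)


def format_example(x, y=None):
    """Render one example; leave "Output:" blank for the query (assumed layout)."""
    lines = [f"Feature {i}: {v:.2f}" for i, v in enumerate(x)]
    lines.append("Output:" if y is None else f"Output: {y:.2f}")
    return "\n".join(lines)


def build_prompt(train, query_x):
    """Instruction, then the labelled examples, then the unlabelled query."""
    parts = [INSTRUCTION]
    parts += [format_example(x, y) for x, y in train]
    parts.append(format_example(query_x))
    return "\n\n".join(parts)


rng = random.Random(0)  # fixed seed so the prompt is reproducible


def sample():
    return [rng.random() for _ in range(5)]


train = [(x, friedman1(x)) for x in (sample() for _ in range(3))]
print(build_prompt(train, sample()))
```

Dropping `INSTRUCTION` from `parts` gives the variant where the model is free to add an explanation, which is the case described above.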

Side note: all the experiments were run with the API. There are some differences between the Chat and the API; for example, the Chat can generate and execute code. I shared Chats since they are easy to look at and to try.

If you have access to an API key, I made some Google Colab notebooks:

- GPT-4 Example: https://colab.research.google.com/drive/1Bk9uBCBvzuX00Rex-t1...

- GPT-4 Small Eval: https://colab.research.google.com/drive/1_-uHvW2oLtcCXz0c-G_...

- Claude 3 Opus Example: https://colab.research.google.com/drive/105jUAGanp7ZLG-Q9Hei...

- Claude 3 Opus Small Eval: https://colab.research.google.com/drive/1-IH68TUuqf_CZyptSSr...