| HN Mirror



	by justanotherjoe 796 days ago
	Hmmm... What do people think in cases like this? Is the author just straight up lying?


iamflimflam1 796 days ago

Well, the datasets used in the paper are all available (Appendix B), so recreating the experiment seems possible.

What we are currently seeing in the comments is people trying random things and then saying “it doesn’t work”.


rvacareanu 796 days ago

Chat examples with GPT-4: https://github.com/robertvacareanu/llm4regression/tree/main/... (the experiments used the API though)

For example, for Friedman #1, GPT-4 predicts 12.89 while the true value is 11.69 (https://chat.openai.com/share/177571ad-3845-46a1-952f-963647...)

For Original #1, GPT-4 predicts 83.63 while the true value is 80.39 (https://chat.openai.com/share/808da995-99e6-444a-94da-fc7cd5...)
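For anyone trying to recreate the setup: Friedman #1 is a standard synthetic regression benchmark (the same function behind scikit-learn's `make_friedman1`), with features drawn uniformly from [0, 1] and target y = 10·sin(π·x1·x2) + 20·(x3 - 0.5)² + 10·x4 + 5·x5 + noise. Below is a minimal stdlib-only sketch of it. The feature count, noise level and seed are assumptions, not the paper's settings (those are described in Appendix B), and `friedman1`, `sample_friedman1` and `abs_error` are hypothetical helper names. It also works out the absolute error of the two predictions quoted above. "Original #1" is the paper's own dataset and is not reproduced here.

```python
import math
import random


def friedman1(x):
    """Noise-free Friedman #1 target; only the first five features matter."""
    return (10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2
            + 10 * x[3]
            + 5 * x[4])


def sample_friedman1(n, n_features=5, noise=0.0, seed=0):
    """Draw n (features, target) pairs with features uniform on [0, 1).

    Assumed defaults (5 features, no noise); the paper's settings may differ.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(n_features)]
        y = friedman1(x) + rng.gauss(0.0, noise)  # Gaussian noise, std = noise
        data.append((x, y))
    return data


def abs_error(pred, true):
    """Absolute error of a single prediction."""
    return abs(pred - true)


# Absolute errors for the two predictions quoted in the comment above.
print(round(abs_error(12.89, 11.69), 2))  # 1.2  (Friedman #1)
print(round(abs_error(83.63, 80.39), 2))  # 3.24 (Original #1)
```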


iamflimflam1 796 days ago

Interesting - I tried completely cheating and got worse results!

https://chat.openai.com/share/6217cd86-2b0f-41b2-a36a-2558dd...


rvacareanu 794 days ago

Interesting, thanks for sharing! I noticed that when the LLMs tried to explain their predictions, they would sometimes wrongly state that the relation was linear when it was not. This happened when I removed the following part of the prompt:

`The task is to provide your best estimate for "Output". Please provide that and only that, without any additional text.`

Examples of this are available in Appendix J.
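To make the role of that instruction concrete, here is a minimal sketch of how a few-shot regression prompt might be assembled, plus a strict parser that only accepts a bare number (one way a harness could notice replies that ignored the instruction and explained themselves instead). The "Feature i: value / Output: value" layout, placing the instruction at the top, and the names `INSTRUCTION`, `build_prompt` and `parse_output` are all assumptions for illustration; the template actually used in the experiments may differ.

```python
# Instruction sentence quoted from the comment above; where it sits in the
# prompt (here: at the top) is an assumption.
INSTRUCTION = ('The task is to provide your best estimate for "Output". '
               'Please provide that and only that, without any additional text.')


def build_prompt(examples, query):
    """Format (features, output) pairs as few-shot examples, then the query.

    Hypothetical layout: one "Feature i: value" line per feature, followed
    by "Output: value"; the query ends with a bare "Output:" to complete.
    """
    lines = []
    for features, output in examples:
        for i, value in enumerate(features):
            lines.append(f"Feature {i}: {value:.2f}")
        lines.append(f"Output: {output:.2f}")
        lines.append("")  # blank line between examples
    for i, value in enumerate(query):
        lines.append(f"Feature {i}: {value:.2f}")
    lines.append("Output:")
    return INSTRUCTION + "\n\n" + "\n".join(lines)


def parse_output(reply):
    """Return the reply as a float if it is only a number, else None."""
    try:
        return float(reply.strip())
    except ValueError:
        return None


prompt = build_prompt([([0.1, 0.2], 1.5), ([0.3, 0.4], 2.5)], [0.5, 0.6])
print(prompt)
```

With this parser, a reply like `12.89` is accepted, while something like `The relation is linear, so about 12.` returns `None` and can be counted as a failure or retried.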

Side note, but all the experiments were run with the API. There are some differences between the Chat and the API; for example, the Chat can generate and execute code. I shared Chats since they are easy to look at and to try.

If you have access to an API key, I made some Google Colab notebooks:

Colab links:

- GPT-4 Example: https://colab.research.google.com/drive/1Bk9uBCBvzuX00Rex-t1...

- GPT-4 Small Eval: https://colab.research.google.com/drive/1_-uHvW2oLtcCXz0c-G_...

- Claude 3 Opus Example: https://colab.research.google.com/drive/105jUAGanp7ZLG-Q9Hei...

- Claude 3 Opus Small Eval: https://colab.research.google.com/drive/1-IH68TUuqf_CZyptSSr...
