| This is one of my favorite test cases: Prompt:
Here is a riddle. It is a common riddle but with some changes that make it more difficult. You are an alien that will live for at least 10000 years. You have no sense of temperature. you can not feel hot or cold. you have eyes and can see. you are in a house. downstairs are 3 light switches that control 3 light bulbs that are upstairs. Each light bulb will last for exactly 3 years. you can tell by looking at a bulb if it has burned out after 3 years. devise a strategy to determine which switch controls which light.
ChatGPT 4
The standard solution to this problem involves using the heat generated by the bulbs, but since you as an alien can't feel temperature, we need to utilize the fact that these bulbs have a specific lifespan of 3 years. Here's a strategy: Turn on the first switch and wait for 3 years. After 3 years, turn it off. Right after turning off the first switch, turn on the second switch. Leave the third switch off throughout this process. Now, go upstairs to inspect the bulbs: The bulb that is off and has burned out is controlled by the first switch. This bulb was turned on for 3 years, so it has reached the end of its lifespan and burned out. The bulb that is on is controlled by the second switch. You turned this on right after turning off the first switch, so it's still functioning. The bulb that is off and hasn't burned out is controlled by the third switch. This switch was never turned on, so the bulb is still in a brand-new condition. This strategy ensures that you can confidently identify which switch controls which bulb. |
Using GPT4, I asked whether there was a way to do it in less than 3 years, but it couldn't figure one out, even after I told it that you can look at the bulbs and use the switches as much as you want. Instead, it suggested turning on a switch for 10 minutes and then using your "excellent alien vision" to determine which 3-year-lifespan bulb has 10 minutes of wear on it.
Makes me think GPT4 doesn't really have better reasoning; it just looks like better reasoning because it's been fed way more data.
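For what it's worth, here is a rough sketch of the quick answer I was fishing for. It assumes, as I told the model, that you can walk upstairs and look as often as you like (the prompt never limits the number of trips). The `identify` and `look` names and the `wiring` encoding are made up for this illustration: flip one switch at a time, go see which bulb is lit, and get the last one by elimination. Nothing needs to burn out, so it takes minutes rather than years.

```python
from itertools import permutations


def identify(wiring):
    """Return {switch: bulb} using only 'flip a switch, go look' steps.

    wiring[s] is the bulb controlled by switch s. It is hidden from the
    strategy, which only sees the result of look().
    """
    n = len(wiring)

    def look(switch_states):
        # Simulates walking upstairs: report which bulbs are currently lit.
        lit = [False] * n
        for s, on in enumerate(switch_states):
            if on:
                lit[wiring[s]] = True
        return lit

    found = {}
    for s in range(n - 1):
        # Turn on switch s only, leave the others off, then go look.
        states = [i == s for i in range(n)]
        found[s] = look(states).index(True)

    # The last switch must control whichever bulb is left over.
    found[n - 1] = (set(range(n)) - set(found.values())).pop()
    return found


# Check the strategy against every possible wiring of 3 switches.
for wiring in permutations(range(3)):
    assert identify(wiring) == dict(enumerate(wiring))

print(identify((2, 0, 1)))  # → {0: 2, 1: 0, 2: 1}
```

That is two trips upstairs for three bulbs, and no use of temperature or bulb lifespan at all.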