They actually did test GPT-5: https://www.science.org/doi/10.1126/science.aec8352 (see the figure under Conclusion). Its rate of endorsing the user's actions, 52%, was the same as GPT-4o's. So, based on their setup, it seems the newer model didn't reduce affirmation.