This answer really isn’t good enough. The providers can’t aim to replace search and claim PhD-level intelligence that will do all the jobs, then hide behind “it makes mistakes” in the small print.
I think it's the fluency. Other tools fail visibly. A bad search result looks like a bad search result. A hallucinated quote reads exactly like a real one. There's no signal in the output itself that something is wrong. You have to go back to the source to check, and the whole point of using the tool was to not have to do that.