Right. That's commonly called the Turing test. This just pushes back the problem of defining intelligence to one of creating a proper Turing test. How do we do that?
The Turing test is actually a pretty decent and straightforward test.
I don't understand why this is an issue, though. Testing intelligence was never the hard part of AI. There are so many tasks that computers currently suck at that we would be happy to see them solved, regardless of what label you gave the solution. And I don't think many people could see a computer having conversations or solving difficult problems and deny that it is intelligent, even if there is no formal test that settles the question with 100% certainty.