As mentioned in the original article, being able to reuse part of a network trained on one task, on another different task that shares a subset of concepts, would indicate something like the understanding of a concept has emerged.
The only reason I am convinced it is NOT doing a good job, is how utterly difficult to apply NN to dialog generation/management domain of business, often time it behaves much worse than rule-based systems.
You'd be able to tell if every brain structure was replicated by a NN analogue (and we understood them sufficiently well). Otherwise you can only use behavioral replication (i.e. Turing tests) to infer it.