The disk can neither extract the correlations nor apply the knowledge; it transparently stores the data verbatim. The model doesn't store the training set; it extracts the complex correlations from it and can make actual predictions based on the knowledge it extracted.
But yeah, the "knowledge" and "understanding" are hard to define formally, so this discussion can be endless. Common well-defined terms are required.
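To make the disk vs. model distinction a bit more concrete, here's a rough toy sketch. Everything in it is made up for illustration: the data (points on y = 2x + 1), the names `disk`, `fit_line` and `predict`, and the choice of a plain least-squares line as a stand-in for "a model". It's nowhere near a real neural network, just the smallest thing that shows verbatim storage next to something fitted to the same data.

```python
# Toy data following y = 2x + 1 (invented for this example).
train = [(0, 1), (1, 3), (2, 5), (3, 7)]

# "Disk": keeps the pairs verbatim and can only hand back what was stored.
disk = dict(train)

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to the given points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# "Model": two fitted parameters instead of the stored pairs.
a, b = fit_line(train)

def predict(x):
    """Apply the fitted parameters to an input, seen during fitting or not."""
    return a * x + b

print(disk.get(10))  # None: x=10 was never stored
print(predict(10))   # 21.0: the fitted line extrapolates to an unseen input
```

Whether fitting `a` and `b` counts as "extracting knowledge" is exactly the definitional question, but the behavioral difference on an unseen input is there either way.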
The model does not extract knowledge. An external algorithm trains the model's parameters, and then the model is fed a string that is also evaluated externally based on the model's configuration.
Semantics. You could say the same about the disk - the data doesn't get magically teleported from the magnetic plates to the RAM, it needs a lot of underlying hardware to read and transfer it.
A model is not just a set of weights; it's inseparable from the underlying architecture and from the way the weights are trained and applied in practice.