Yeah... So it looks like it's at least still an open question. I guess until we definitively know how "knowledge" is collectively represented among the weights, it's hard to say either way. The other part of the question is how to evaluate the existence of "knowledge" in an LLM. TFA suggests a way, but I'm still not 100% convinced that's THE way...
Nexusflow probably too, as it also does function calling and would need either to have that baked in or explicit fine-tuning for RAG use, which I don't recall seeing.
I haven't looked recently, but there is also a cool category of models that provide GIS inferencing via LLM.