The linked tweet has a diagram that makes it clear pretty quickly that this isn't just about using Wikidata as a training set. The paper linked from the tweet also gives a good summary on its first page.
Nope. Training data for the big LLMs is a corpus of text, not structured data. As far as I understand, structured data would involve much more dimensionality in terms of parameterization.