No, diversity isn't creativity. For example, we could search Google for "great art" and if it produced a sample of one art work from every decade of the last 500 years that would likely be highly diverse in style and content. If it returned a list of the best work from western Europe in the 18th century, it would be rather consistent. Both lists would have the same amount of creativity though - 0.
"one art work from every decade of the last 500 years that would likely be highly diverse in style and content"
It still might not be especially diverse if all 50 examples were from western European art. 500 years only takes us back to 1524, which is not especially long ago, and most of that span falls within the same early modern period starting with the fall of Constantinople, the end of the Crusades, and the start of the Renaissance. I wouldn't be surprised if 80% or more of the works ended up being some depiction of aspects of Christianity painted by a white male.
It must be, ultimately, because all art is individual or group expression, and each person can only belong to so many groups. But the individual expression still allows for a giant amount of expressiveness, and the group expression is wider than race or sex.
I only skimmed the paper, but this was my concern as well: if I understand correctly, the author is measuring "creativity" in terms of syntactic and semantic diversity, which I guess could be a starting point. But if my model were just white noise, would that make it infinitely creative? Did I miss anything?
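To make the white-noise worry concrete, here is a minimal sketch. It assumes a distinct-n style ratio (unique n-grams divided by total n-grams) as a stand-in for "syntactic diversity"; I haven't checked that this is what the paper actually uses. The function name `distinct_n`, the toy sentence, and the made-up vocabulary are all just for illustration.

```python
import random


def distinct_n(tokens, n=2):
    """Fraction of n-grams that are unique.

    A crude lexical-diversity proxy, assumed here for illustration;
    it is not necessarily the metric used in the paper.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# A repetitive but meaningful text.
prose = ("the cat sat on the mat and the cat sat on the rug " * 20).split()

# "White noise": tokens drawn uniformly at random from a large,
# made-up vocabulary, same length as the prose sample.
rng = random.Random(0)
vocab = [f"w{i}" for i in range(50_000)]
noise = [rng.choice(vocab) for _ in range(len(prose))]

prose_score = distinct_n(prose)
noise_score = distinct_n(noise)
print(f"prose: {prose_score:.3f}  noise: {noise_score:.3f}")
```

The random tokens score far higher than the real sentence on this kind of metric, which is the problem: a diversity number on its own can't tell noise from novelty, so it would need to be paired with some measure of quality or coherence.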
Also, I have tried the first LLaMA base model and while it was fun to interact with, I'm not sure how useful an "uncensored" (as some people like to call it) LLM is for practical work. I think you could obtain better results using 4chan as a Mechanical Turk service, honestly.