I'm pretty sure that, most of the time, people read only the first few words and the last ones and, for example, don't care whether the words in the middle are in the correct order. I'm not sure about the size of that dictionary, but this would seem to greatly diminish the entropy.
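To put rough numbers on that, here is a minimal sketch. The dictionary size of 2048 and the 12-word fingerprint are assumptions for illustration only (the real dictionary size isn't known here), and the function names are made up.

```python
import math

def entropy_bits(dictionary_size, words):
    """Bits conveyed by `words` independently chosen words, if every one is checked in order."""
    return words * math.log2(dictionary_size)

def order_loss_bits(unordered_words):
    """Rough number of bits lost when the order of that many distinct words goes unchecked."""
    return math.log2(math.factorial(unordered_words))

DICTIONARY_SIZE = 2048  # assumption, not the real dictionary
TOTAL_WORDS = 12        # assumption, not the real fingerprint length

full_bits = entropy_bits(DICTIONARY_SIZE, TOTAL_WORDS)   # every word checked
skimmed_bits = entropy_bits(DICTIONARY_SIZE, 4)          # e.g. first three words plus the last
middle_order_loss = order_loss_bits(TOTAL_WORDS - 4)     # middle words read, order ignored

print(f"full: {full_bits:.1f} bits, skimmed: {skimmed_bits:.1f} bits, "
      f"order ignored in the middle: about {middle_order_loss:.1f} bits lost")
```

Under these assumptions the full list carries 132 bits, while a reader who really checks only four words verifies 44 of them. Reading the middle words but ignoring their order is a much smaller loss, so skipping words is the bigger problem.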
Whether it is vision or words, the point is to enlist some of our primal, automatic brain machinery. Both the random word lists and randomart are a good start, but far from perfect.
The word lists ignore, and even foil, our grammatical machinery. And I, at least, have never been able to remember what the randomart for my own SSH key fingerprint looked like. Adding colour might be a good start.
It has to be a visualization in which changing a few pixels makes it look significantly different. Otherwise an attacker can still mount a 'low distance' brute-force attack, i.e. search for a key whose picture is merely close enough to pass a casual glance.
What you need is a picture whose visually salient information tots up to about 160 bits.
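As a back-of-the-envelope sketch of what 160 bits means for a picture: if the image is a grid of cells and a viewer can reliably tell apart some number of states per cell (colours, glyphs), the cell count follows directly. The function name is hypothetical, and the big assumption is that every cell is independently noticed and remembered, which is exactly the part that would need experiments.

```python
import math

def cells_needed(target_bits, states_per_cell):
    """Cells required if each cell independently shows one of `states_per_cell` states."""
    return math.ceil(target_bits / math.log2(states_per_cell))

TARGET_BITS = 160

for states in (2, 8, 16):
    print(f"{states} states per cell -> {cells_needed(TARGET_BITS, states)} cells")
```

With two states per cell that is 160 cells; with eight distinguishable colours it drops to 54, and with sixteen to 40. Colour reduces the cell count, but only if viewers actually register each cell's colour.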
That's tough, but since the human visual system is so powerful, it's not hopeless. But we would need real psychologists to help design the art generators, backing the results with experiments.
It depends on how valuable the identity is.
I check some characters; for additional security, I check some more in the middle until I am satisfied. The downside is security creep, but verified identities generally grow more secure the older they are (does this grow faster?).
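A small sketch of what such a spot check actually verifies. Both fingerprints below are made-up hex strings, and `spot_check` and `bits_covered` are hypothetical helpers, not part of any tool.

```python
from typing import Sequence

def spot_check(expected: str, shown: str, positions: Sequence[int]) -> bool:
    """True if the two fingerprints agree at every checked position."""
    if len(expected) != len(shown):
        return False
    return all(expected[i] == shown[i] for i in positions)

def bits_covered(positions: Sequence[int], bits_per_char: int = 4) -> int:
    """Bits verified by the check; a hex character carries 4 bits."""
    return len(set(positions)) * bits_per_char

expected = "3f9a2c71d04be685"  # made-up fingerprint
forged = "3f9a2c70d14be685"    # made-up, differs at positions 7 and 9

ends_only = [0, 1, 2, 3, 12, 13, 14, 15]  # first four and last four characters
with_middle = ends_only + [7, 8]          # plus two characters from the middle

print(spot_check(expected, forged, ends_only), bits_covered(ends_only))
print(spot_check(expected, forged, with_middle), bits_covered(with_middle))
```

Here the forged string passes the ends-only check, which covers 32 bits, and fails once two middle characters are added. An attacker who knows which positions people tend to look at only has to match those.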
The upside to showing a larger hash is that humans are very good at roughly comparing two things. The difference in casing is probably enough to trigger a conscious check. A visual hash is still better.
We still need to define a hash format. Typical hex/base64 would work, but imagine someone tries to be smart and invents a dictionary-word encoding with Unicode characters, and then someone else brute-forces another key that is actually different but still matches under a search that uses smart Unicode collation algorithms.
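To illustrate the kind of false match meant here, the sketch below uses NFKC normalization plus case folding as a stand-in for a "smart" search. That stand-in is an assumption; real collation algorithms differ in detail, but the effect is similar. The words are made up.

```python
import unicodedata

def loose_key(text: str) -> str:
    """Key used by a hypothetical 'smart' search: NFKC-normalize, then case-fold."""
    return unicodedata.normalize("NFKC", text).casefold()

plain = "fish"
ligature = "\ufb01sh"    # starts with U+FB01 LATIN SMALL LIGATURE FI
fullwidth = "\uff26ISH"  # starts with U+FF26 FULLWIDTH LATIN CAPITAL LETTER F

for other in (ligature, fullwidth):
    print(repr(other), "identical:", other == plain,
          "matches loosely:", loose_key(other) == loose_key(plain))
```

The strings differ as code point sequences but compare equal once normalized, so an encoding built on such characters would have to specify an exact, byte-level comparison.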