I also found it tough to read. The image descriptions are great for people who need them, but on mobile they take up a lot of space, and present to the scanning eye as part of the post body proper.
The image descriptions did appear generated toward the beginning, but toward the middle they seem to be the prompts that were used to generate the images (a bit quine-esque), and toward the end they are clearly part of the article.
Author of the article and the images here. The actual prompts stacked about 10 networks in ways that make things look "better", and are more of the form "ligne claire, flat colors, 1girl, green hair, green eyes, long hair, <setting>, <major parts of image>", with a photo I took on either my iPhone or my DSLR used as an img2img/ControlNet reference for things like pose and composition. The descriptions were all written by me in a very descriptive voice, like what you'd find in AI-generated descriptions (and in most cases, the text actually comes from the stage directions in the script in the first place). My writing style looks like ChatGPT because they scraped my website to make up the training set for the AI.