Well, for one, it's more than 0.1 em dashes per page. The SHARE IT Act has 10 on each page[0]. I don't know exactly how many the 2017 tax cut bill had, but it's more than 1,000 across 185 pages[1] (more than 5 per page), and obviously that was before LLMs like ChatGPT. So I don't really know why this is the measure of AI or not, especially because bills have always had a lot of em dashes to begin with. If you're not analyzing the text of the bill, then it's just not going to be accurate.
I'm the author and have updated this post. After looking into this, I found that the larger bills contain entire pages of nothing but headings, and those headings contain em dashes. I removed the headings from the analysis so that the em dashes per page come only from the legislative text itself. For the baseline, over 50% of bills found on congress.gov are 1-2 pages; after reading a few, I decided there was a reasonable rationale for removing them from the baseline. Even after all these adjustments, we're still looking at a 30% increase over a decent baseline of similarly sized bills. It's evident when reading the text below the headings (as a human!).
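A minimal sketch of the heading-excluded count described above, not the post's actual code. It assumes each page is already available as plain text, and it uses a hypothetical heuristic (lines opening with labels such as "SEC." or "TITLE") to decide what counts as a heading. Dividing by all pages, including heading-only ones, is also an assumption.

```python
import re

EM_DASH = "\u2014"

# Hypothetical heuristic: a line that opens with a structural label is a heading.
HEADING_RE = re.compile(r"^\s*(SEC\.|TITLE\b|Subtitle\b|PART\b|CHAPTER\b|Subchapter\b)")


def is_heading(line: str) -> bool:
    """Return True if the line looks like a bill heading under the heuristic above."""
    return bool(HEADING_RE.match(line))


def em_dashes_per_page(pages: list[str]) -> float:
    """Average em dashes per page, counting only non-heading lines."""
    if not pages:
        return 0.0
    total = sum(
        line.count(EM_DASH)
        for page in pages
        for line in page.splitlines()
        if not is_heading(line)
    )
    # Assumption: every page counts in the denominator, even heading-only pages.
    return total / len(pages)


# Tiny made-up example: two "pages" of bill-like text.
sample_pages = [
    "TITLE I\u2014TAXES\nSEC. 101. SHORT TITLE\u2014\nThe Secretary shall\u2014\n(1) do x;",
    "Subtitle A\u2014Stuff\nno dashes here",
]
rate = em_dashes_per_page(sample_pages)
```

Running the same function over a baseline set of similarly sized bills gives the figure to compare against. The result depends heavily on how well the heading heuristic matches the real bill formatting.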
SHARE IT is from 2024, but the 2017 tax cut bill is interesting (lots of em dashes there that deviate from the average). You're correct that additional text analysis is needed in this case. The publicly available bills I'd found from earlier in 2024 do not have em dashes outside of the table of contents, which is built into the average. I'm curious how and why they are used so much in this bill from 2017, and I'm now wondering how they got into any potential templates (or not). It also adds the confound of how much of this is AI versus templates (or requirements, or something else). Thx!