Remember that diff is an algorithm to generate the smallest set of operations to produce version B from version A, not an accurate reconstruction of what happened.
Diff algorithms are also often tuned not to try as hard to find the smallest set of changes for larger documents, due to speed concerns.
Git's built-in diff algorithm is particularly bad for prose. Since it's aimed at line-oriented code, it does line-based diffs, which are horrible for ASCII text that gets reflowed: a small change can shift the line breaks, so every line in the paragraph from the edit onward shows up as changed.
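To make the reflow problem concrete, here is a minimal sketch in Python. It does not call git; it uses the standard library's difflib and textwrap as stand-ins, and the sample sentence and the helper name changed_count are made up for illustration. It compares the same one-word edit at line granularity and at word granularity.

```python
import difflib
import textwrap

def changed_count(old_items, new_items):
    """Count items in new_items that fall outside every matching block."""
    matcher = difflib.SequenceMatcher(a=old_items, b=new_items, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(new_items) - matched

# Hypothetical sample paragraph; one word ("very") is inserted near the start.
old_text = ("The quick brown fox jumps over the lazy dog and then runs far "
            "away into the deep dark forest before night falls")
new_text = old_text.replace("The quick", "The very quick")

# Reflow both versions to the same width, as a text editor would.
old_lines = textwrap.wrap(old_text, width=30)
new_lines = textwrap.wrap(new_text, width=30)

line_changes = changed_count(old_lines, new_lines)
word_changes = changed_count(old_text.split(), new_text.split())

print(f"changed lines: {line_changes} of {len(new_lines)}")
print(f"changed words: {word_changes} of {len(new_text.split())}")
```

With this sample the single inserted word shifts every line break, so all four wrapped lines differ even though only one word changed.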
What's happened is that "Section 2" was moved later in the sentence, abbreviated as "Sec. 2", "of" was deleted, and "act" was capitalized:
Section 2 of act July 30, 1947, ch. 392, 61 Stat. 674, provided...
Act July 30, 1947, ch. 392, Sec. 2, 61 Stat. 674, provided...
The rest of the paragraph is unchanged, but git shows a 6-line diff with the entire paragraph replaced. GitHub attempts to do some word-based highlighting (see the timestamp lines), but it falls down on most of these paragraphs. Wikipedia's diffing tends to work better for this kind of thing; I'm not sure what they use. The upshot is that the number of lines changed may be a 5-10x overestimate.
It may be doing that conversion, but the conversion works. For example, committing the following text (with line breaks), then joining it all into one line, shows no differences when using 'git diff --word-diff'.
Test the first. This will check if reflowing
text actually produces git word-diff weirdness,
or if it's actually decent.
The line does get reproduced on the terminal (a line diff was seen), but no text is shown in green or red to indicate an actual change.
Just checked with the old and new versions of Title_09.txt and you're right, --word-diff does the right thing. It echoes all the changed lines to the terminal, but it only marks up (and colors, in color mode) the changed words:
[-Section 2 of act-]{+Act+} July 30, 1947, ch. 392, {+Sec. 2,+} 61 Stat. 674, provided that
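For anyone who wants to play with this outside git, here is a rough sketch that imitates the plain --word-diff markup using Python's difflib. It is not git's algorithm, just a word-level SequenceMatcher over whitespace-split tokens, and the function name word_diff is made up, so its output can differ from git's on other inputs.

```python
import difflib

def word_diff(old, new):
    """Mark removed words as [-...-] and added words as {+...+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
        else:
            # "replace" has both sides, "delete" only removed, "insert" only added.
            removed = " ".join(old_words[i1:i2])
            added = " ".join(new_words[j1:j2])
            piece = (f"[-{removed}-]" if removed else "") + \
                    (f"{{+{added}+}}" if added else "")
            parts.append(piece)
    return " ".join(parts)

old = "Section 2 of act July 30, 1947, ch. 392, 61 Stat. 674, provided"
new = "Act July 30, 1947, ch. 392, Sec. 2, 61 Stat. 674, provided"
print(word_diff(old, new))
```

Because the tokens are split on any whitespace, reflowing a paragraph without changing its words produces no markers at all in this sketch.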
Wonder if there's a way to enable that behavior on GitHub? And/or to generate repository activity statistics based on changed words rather than changed lines?
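On the statistics question, one possible approach (a sketch, not something GitHub offers as far as I know) is to count words from the output of 'git diff --word-diff=porcelain', which puts each added, removed, or unchanged run of words on its own line prefixed with '+', '-', or a space. The function name count_word_changes is made up, and the sample text below is hand-written in that layout rather than captured from a real run.

```python
def count_word_changes(porcelain_text):
    """Return (added_words, removed_words) from word-diff porcelain output."""
    added = removed = 0
    in_hunk = False
    for line in porcelain_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # file headers follow; ignore lines until the next hunk
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += len(line[1:].split())
        elif in_hunk and line.startswith("-"):
            removed += len(line[1:].split())
    return added, removed

# Hand-written sample in the porcelain layout (an assumption, not real git output).
sample = "\n".join([
    "diff --git a/Title_09.txt b/Title_09.txt",
    "--- a/Title_09.txt",
    "+++ b/Title_09.txt",
    "@@ -1 +1 @@",
    "-Section 2 of act",
    "+Act",
    "  July 30, 1947, ch. 392,",
    "+Sec. 2,",
    "  61 Stat. 674, provided that",
    "~",
])
added, removed = count_word_changes(sample)
print(f"words added: {added}, words removed: {removed}")
```

Tracking the hunk state is there so the '---' and '+++' file header lines are not miscounted as removed or added words.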
Sure, but this particular test tells you nothing.
You need one where the "line diff" has identified the wrong set of changes (i.e., it has decided two sets of text look close enough that one is really a change into the other, even though that's not what historically happened).
I did some work and research on diffs when I tried to visualise the progression of Slovak law. My best attempt was a diff method that would understand the inner structure of the law. I ended up with a simple draft, but I am sure somebody more competent could look into that.
https://github.com/divegeek/uscode/graphs/code-frequency