by DannyBee
4701 days ago
Remember that diff is an algorithm to generate the smallest set of operations to produce version B from version A, not an accurate reconstruction of what happened.
Diff algorithms are also often tuned not to try as hard to find the smallest set of changes for larger documents, due to speed concerns.
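To make that concrete, here is a rough sketch using Python's standard difflib. The section names are made up, and difflib's matcher is a heuristic rather than a guaranteed-minimal diff, so treat it as an illustration only. One item is moved, but the edit script has no notion of a move: it reports the item as removed in one place and added in another.

```python
import difflib

def edit_ops(a, b):
    """Return the non-equal opcodes that turn sequence a into sequence b."""
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]

# Hypothetical document: one item was moved, nothing was rewritten.
old = ["intro", "section 2", "body", "footer"]
new = ["intro", "body", "section 2", "footer"]

ops = edit_ops(old, new)

# Collect what the edit script claims was removed and what it claims was added.
removed = [item for _, gone, _ in ops for item in gone]
added = [item for _, _, came in ops for item in came]

for tag, gone, came in ops:
    print(tag, gone, came)
```

The same item shows up in both `removed` and `added`, which is the diff's way of describing a move it cannot express directly.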
|
Example: https://github.com/divegeek/uscode/commit/1fb2d83137dad1c6ca...
What's happened is that "Section 2" was moved later in the sentence, abbreviated as "Sec. 2", "of" was deleted, and "act" was capitalized.
The rest of the paragraph is unchanged, but git shows a 6-line diff with the entire paragraph replaced. GitHub attempts to do some word-based highlighting (see the timestamp lines), but it falls down on most of these paragraphs. Wikipedia's diffing tends to work better for this kind of thing; I'm not sure what they use. The upshot is that the number of lines changed may be a 5-10x overestimate.
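A quick way to see the size of the overestimate is to diff the same paragraph once by lines and once by words. This is only a sketch: the paragraph text below is invented to resemble the kind of edit described (it is not the text from the commit), `changed_fraction` is a helper name I made up, and difflib stands in for whatever git or GitHub actually use.

```python
import difflib

def changed_fraction(old_tokens, new_tokens):
    """Fraction of tokens touched by non-equal opcodes (rough measure)."""
    sm = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    )
    return changed / max(len(old_tokens), len(new_tokens))

# Hypothetical paragraph stored as a single long line, as in the repo discussed.
old = ("As provided in section 2 of this act, the Secretary shall publish "
       "the schedule annually and report it to Congress.")
new = ("As provided in this Act, Sec. 2, the Secretary shall publish "
       "the schedule annually and report it to Congress.")

# Line-based view: the paragraph is one line, so any edit replaces all of it.
line_frac = changed_fraction(old.splitlines(), new.splitlines())

# Word-based view: only the handful of reworded tokens count as changed.
word_frac = changed_fraction(old.split(), new.split())

print(f"line-level: {line_frac:.0%} changed, word-level: {word_frac:.0%} changed")
```

The line-level view marks the whole paragraph as changed, while the word-level view flags only the reworded part, so counting changed lines overstates how much text was actually edited.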