	by pnathan 4748 days ago
	I'm really tempted to collect the XML files and put them on github, with periodic checkpoints to update it with the latest. Watching the evolution of law over time is a fascinating thing and using SW engineering tools to help would be really fun.

11 comments

sc68cal 4748 days ago

This is currently done via scraping:

https://github.com/divegeek/uscode

The diffs are huge.

shawkinaw 4748 days ago

Check out the Obamacare blip:

https://github.com/divegeek/uscode/graphs/code-frequency

DannyBee 4748 days ago

Remember that diff is an algorithm to generate the smallest set of operations to produce version B from version A, not an accurate reconstruction of what happened. Diff algorithms are also often tuned not to try as hard to find the smallest set of changes for larger documents, due to speed concerns.
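
A small illustration of that point, as a sketch only: it uses Python's `difflib` rather than git's diff (and `difflib` does not promise a minimal edit script), and the `before`/`after` lists are made up. The author's intent here is "move one item to the end", but the diff only reports a deletion and an insertion.

```python
import difflib

# Hypothetical before/after: the "author" moved "beta" to the end.
before = ["alpha", "beta", "gamma", "delta"]
after = ["alpha", "gamma", "delta", "beta"]

matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)

# Keep only the non-matching operations: (tag, old items, new items).
ops = [
    (tag, before[i1:i2], after[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
]

for tag, old, new in ops:
    print(tag, old, new)
```

Nothing in the output says "moved"; that has to be inferred by whoever reads the diff.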

mjn 4748 days ago

Git's built-in diff algorithm is particularly bad for text. Since it's aimed at line-oriented code, it does line-based diffs, which is horrible for ASCII text that is reflowed, because every line in a paragraph will show up as changed for a small change.

Example: https://github.com/divegeek/uscode/commit/1fb2d83137dad1c6ca...

What's happened is that "Section 2" was moved later in the sentence, abbreviated as "Sec. 2", "of" was deleted, and "act" was capitalized:

    Section 2 of act July 30, 1947, ch. 392, 61 Stat. 674, provided...

    Act July 30, 1947, ch. 392, Sec. 2, 61 Stat. 674, provided...

The rest of the paragraph is unchanged, but git shows a 6-line diff with the entire paragraph replaced. GitHub attempts to do some word-based highlighting (see the timestamp lines), but it falls down on most of these paragraphs. Wikipedia's diffing tends to work better for this kind of thing; I'm not sure what they use. The upshot is that the number of lines changed may be a 5-10x overestimate.
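
To make the line-versus-word point concrete, here is a rough sketch that diffs the two sentences above word by word. It uses Python's `difflib`, not whatever git, GitHub, or Wikipedia actually run, and the helper name `word_ops` is made up for the example.

```python
import difflib

# The two sentences quoted above.
old = "Section 2 of act July 30, 1947, ch. 392, 61 Stat. 674, provided..."
new = "Act July 30, 1947, ch. 392, Sec. 2, 61 Stat. 674, provided..."

def word_ops(a, b):
    """Return the non-matching word-level operations between two strings."""
    a_words, b_words = a.split(), b.split()
    sm = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    return [
        (tag, " ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]

ops = word_ops(old, new)
for tag, removed, added in ops:
    print(f"{tag}: {removed!r} -> {added!r}")
```

A line-based comparison can only say that the whole line changed, while the word-level view narrows it to the opening words and the inserted "Sec. 2,".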

gmartres 4748 days ago

> Since it's aimed at line-oriented code, it does line-based diffs

You can do word diffs with git:

    git diff --word-diff=color

DannyBee 4748 days ago

It's still recreating a word based diff from a line based diff.

See diff.c line 793 for how it works.

azernik 4748 days ago

It may be doing that conversion, but the conversion works. For example, committing the following text (with line breaks), then joining it all into one line, shows no differences when using 'git diff --word-diff'.

  Test the first. This will check if reflowing
  text actually produces git word-diff weirdness,
  or if it's actually decent.

The line does get reproduced on the terminal (a line diff was seen), but no text is shown in green or red to indicate an actual change.
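
The same experiment can be approximated outside git. This is only a sketch of the idea (split on whitespace, then compare), not a description of git's internals: the reflowed text differs as a list of lines but is identical as a list of words.

```python
# Text from the comment above, first with hard line breaks...
wrapped = (
    "Test the first. This will check if reflowing\n"
    "text actually produces git word-diff weirdness,\n"
    "or if it's actually decent."
)
# ...then joined into a single line.
joined = " ".join(wrapped.splitlines())

# Line view: three lines versus one line, so they differ.
same_lines = wrapped.splitlines() == joined.splitlines()
# Word view: whitespace (including newlines) only separates tokens.
same_words = wrapped.split() == joined.split()

print("lines identical:", same_lines)
print("words identical:", same_words)
```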

stock_toaster 4748 days ago

If the problem is sporadic line differences, git diff supports algorithms other than the default; patience or histogram may work better.

As far as words within lines, you do have a point.

bliker 4748 days ago

I did some work & research about diffs when I tried to visualise the progression of Slovak law. My best attempt was a diff method that would understand the inner structure of the law. I ended up with a simple draft but I am sure somebody more competent could look into that.

DannyBee 4748 days ago

At least in the US, a lot of the laws that get passed are in the form of diffs.

That is, the law that they enact says "This law is to do blah blah blah.

Subsection 1373(a) of the US code is replaced with the following text 'blah blah blah'"

The wording used is pretty standard. So you can actually parse it in most cases to see what the actual changes are.
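
As a rough sketch of what such parsing could look like: the pattern below is hypothetical and modeled only on the paraphrased sentence above, not on real statutory drafting conventions, and the names `Amendment` and `parse_amendment` are invented for the example.

```python
import re
from typing import NamedTuple, Optional

class Amendment(NamedTuple):
    target: str    # e.g. "1373(a)"
    new_text: str  # replacement text quoted in the bill

# Hypothetical pattern: matches only the paraphrased wording used above.
PATTERN = re.compile(
    r"Subsection (?P<target>\d+\([a-z]\)) of the US code "
    r"is replaced with the following text '(?P<text>[^']*)'"
)

def parse_amendment(sentence: str) -> Optional[Amendment]:
    """Return the parsed amendment, or None if the wording is not recognized."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return Amendment(m.group("target"), m.group("text"))

sentence = (
    "Subsection 1373(a) of the US code is replaced with "
    "the following text 'blah blah blah'"
)
print(parse_amendment(sentence))
```

Real bills would need many more patterns (insertions, strikes, renumbering), so unrecognized wording returns `None` instead of guessing.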

saraid216 4748 days ago

> I ended up with simple draft but I am sure somebody more competent could look into that.

If nothing prevents you, you ought to throw this up for others to see. Worst comes to worst, no one finds it useful.

vog 4748 days ago

In Germany we solved this problem by converting the XML into readable markdown first:

https://github.com/bundestag/gesetze/commit/f90e8fc8eb20f081...

saraid216 4748 days ago

Back in college, I had a project where I walked through every line of the Patriot Act and noted exactly which paragraphs were modified.

I'd be surprised if we couldn't write a parser for bills to produce more efficient diffs.

maaku 4748 days ago

It would be even awesome-er (and more useful) if you could parse individual bills and amendments into diffs, which get merged into 'master' as they become law.

I'd love to `git blame` the U.S. code.

rmc 4748 days ago

But that only works at the shallow level. A crook can get around that by asking/bribing/convincing someone else to be the one who's responsible for the amendment.

liscovich 4748 days ago

There is very little outright corruption in Congress. Special interests exert most of the influence through campaign contributions that are publicly disclosed. Larry Lessig has a great book on this: http://www.amazon.com/Republic-Lost-Corrupts-Congress---eboo.... And here is the link to his TED talk on the same topic: http://www.youtube.com/watch?v=mw2z9lV3W1g

minor_nitwit 4748 days ago

I wonder if Congress uses any sort of version control. The text of these bills is written by staffers and sometimes lobbyists, so I'm not sure how it would work.

tlrobinson 4748 days ago

I was under the impression law is essentially "append only". New laws override existing laws, but the text of the existing law never changes.

rayiner 4748 days ago

Laws are essentially diffs against the US code. The diff (slip law) is canonical. They are continually compiled into the US code, which can involve deleting or changing text just like a diff, and periodically an edited, annotated code is published. After a certain amount of time, Congress enacts a portion of the published code, making it canonical and overriding any prior slip law.

See: http://en.wikipedia.org/wiki/United_States_Code#Legal_status

DennisP 4748 days ago

What we need now is software that reads bills ("in section 123.abc the text 'blah blah' is replaced by 'bleh bleh'") and compiles them into before/after views of what the resulting code would be.
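
A toy sketch of the before/after idea, under heavy assumptions: the instruction wording is exactly the made-up sentence above, the code is a plain dict from section id to text, and `apply_bill` is an invented name. Real bills are far less regular than this.

```python
import re

# Hypothetical instruction format, copied from the example wording above.
INSTRUCTION = re.compile(
    r"in section (?P<section>[\w.]+) the text '(?P<old>[^']*)' "
    r"is replaced by '(?P<new>[^']*)'"
)

def apply_bill(code: dict, bill: str) -> dict:
    """Return a copy of `code` with every recognized instruction applied."""
    after = dict(code)
    for m in INSTRUCTION.finditer(bill):
        section, old, new = m.group("section"), m.group("old"), m.group("new")
        # Refuse to guess when the bill refers to text that is not there.
        if section not in after or old not in after[section]:
            raise ValueError(f"instruction does not apply: {m.group(0)}")
        after[section] = after[section].replace(old, new, 1)
    return after

# Made-up section text for illustration.
before = {"123.abc": "The fee is blah blah per year."}
bill = "in section 123.abc the text 'blah blah' is replaced by 'bleh bleh'"
after = apply_bill(before, bill)

print("before:", before["123.abc"])
print("after: ", after["123.abc"])
```

Returning a new dict keeps the "before" view intact, so the two versions can be fed to any diff tool.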

saraid216 4748 days ago

I just suggested this, and then scrolled down to find this.

Someone needs to take the plunge and start writing the program; throw it on Github and tell us all about it. I know people who are looking for such a tool.

shawkinaw 4748 days ago

Interesting: http://uscode.house.gov/codification/legislation.shtml

lmkg 4748 days ago

The US Code is not the same as the laws of the US. It is a "current snapshot" of existing laws in force, and does not itself have legal weight unless explicitly granted by Congress.

gamblor956 4748 days ago

See, e.g., U.S. National Bank of Oregon v. Independent Insurance Agents of America, Inc., 508 U.S. 439, 440 (1993) for the Supreme Court's ruling and underlying logic.

liscovich 4748 days ago

According to Wikipedia:

"When sections are repealed, their text is deleted and replaced by a note summarizing what used to be there." https://en.wikipedia.org/wiki/United_States_Code#Treatment_o...

at-fates-hands 4748 days ago

Imagine trying to keep track of this in paper form instead of digitally.

officemonkey 4748 days ago

Imagine? I used to do it. When I first started out, I would get stacks of the Chicago Municipal Code revisions on onionskin and it was my job to follow the instructions to update the five-inch binder.

"Remove pages 123.4 - 123.6 and replace with pages 123.4a-123.7."

Later, when I learned about diffs, I understood the concept immediately.

michaelolenick 4748 days ago

They've been doing exactly that for about 237 years.

TsiCClawOfLight 4748 days ago

We actually have something similar in Germany, called the "Bundesgit" (https://github.com/bundestag/gesetze)

eblume 4748 days ago

I'm not a native German speaker but that's a pretty clever pun, right?

ygra 4748 days ago

Not that much of a pun, really. Just a funny-sounding portmanteau. That being said, official names of things by the government usually sound very ridiculous, so this is definitely much saner.

TsiCClawOfLight 4747 days ago

'Bundes-' means federal. It's not much of a pun, but I like it :)

wavefunction 4748 days ago

"Diffing the law" between different date points would be amazing. I hope you follow your temptation in this regard.

I'd do it myself but I'm already neck deep in work and volunteer projects outside of work.

smackfu 4748 days ago

The funny thing is that a lot of law is structured like definitions, the actual law, then consequences. Any of those three can change independently, and change the meaning of the law. So diffs are often not as useful as you wish.

lifeformed 4748 days ago

We need to parse the law into some simulation code, and then have unit tests (does scenario B cause citizen A's rights to be violated), and then check if changes break the tests.
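
A very small sketch of what that could look like. Everything here is invented for illustration: the rule, the two versions of it, and the scenarios are hypothetical and do not correspond to any real statute.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    age: int
    is_citizen: bool

# Hypothetical rule, version 1.
def may_vote_v1(s: Scenario) -> bool:
    return s.is_citizen and s.age >= 18

# Hypothetical amendment that raises the age threshold.
def may_vote_v2(s: Scenario) -> bool:
    return s.is_citizen and s.age >= 21

def broken_cases(rule) -> list:
    """Return the names of the scenarios whose expected outcome the rule breaks."""
    cases = {
        "adult citizen, 18": (Scenario(age=18, is_citizen=True), True),
        "minor citizen, 16": (Scenario(age=16, is_citizen=True), False),
    }
    return [name for name, (s, expected) in cases.items() if rule(s) != expected]

print("v1 breaks:", broken_cases(may_vote_v1))
print("v2 breaks:", broken_cases(may_vote_v2))
```

The hard part is of course the "parse the law into code" step; the test harness itself is the easy bit.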

NoodleIncident 4748 days ago

Do it. Unit test the code. You might even be able to get law students to do it for free as a study aid.

varikin 4748 days ago

I just created a repo with all the codes[1]. There was a convenient link to all codes in one zip file. I will note that this is a massive amount of text.

There is a link to the schema used and a stylesheet (I assume for the XHTML maybe?) that I would like to add in. But one step at a time.

[1] https://github.com/varikin/UnitedStatesCode/

anigbrowl 4748 days ago

Ah... http://uscodebeta.house.gov/download/priorreleasepoints.htm

redblacktree 4748 days ago

Please do! This is the kind of thing that hatches in my mind as a great idea, but withers due to a total lack of follow-up.

liscovich 4748 days ago

It would also be fascinating to see the visualizations of this evolution. And perhaps by matching it with the voting data from Congress, one could track the footprint of each congressman.

TsiCClawOfLight 4748 days ago

see my comment above, you might be interested :)

saraid216 4748 days ago

Probably more important than diffs alone are a dependency graph and a topical index. While they've tried to do this via titles/chapters and references, linear breaks are never going to be as successful as many-to-many linkages, as every Bible concordance out there demonstrates.
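
One way to sketch the dependency-graph half of this, assuming cross-references can be spotted with a simple pattern: the section texts and the "section N" citation format below are made up, and real citations in the Code are much messier.

```python
import re
from collections import defaultdict

# Hypothetical section texts with a simplified citation format.
sections = {
    "101": "Definitions used in section 102 and section 103.",
    "102": "Penalties for violating section 103.",
    "103": "Prohibited acts.",
}

REF = re.compile(r"section (\d+)")

def build_graph(sections: dict):
    """Return (cites, cited_by): forward and reverse cross-reference maps."""
    cites = {
        sid: sorted(set(REF.findall(text)) - {sid})
        for sid, text in sections.items()
    }
    cited_by = defaultdict(list)
    for sid, targets in cites.items():
        for target in targets:
            cited_by[target].append(sid)
    return cites, dict(cited_by)

cites, cited_by = build_graph(sections)
print("cites:", cites)
print("cited_by:", cited_by)
```

The reverse map is the many-to-many part: it answers "what else changes meaning if this section is amended?".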

Symmetry 4748 days ago

I'd really like to see a graph of the number of bytes over time, and what sort of curve it fits.

markdown 4748 days ago

NZ laws on github: https://github.com/Br3nda/legislation/tree/master/act/public

mkehrt 4748 days ago

Do they accept pull requests?
