| HN Mirror



	by dsteinweg 5231 days ago
	It looks like it's pulling characters from the paragraph to generate the "unique" paragraph ID. ID = First letter from the first 3 words in the first sentence in the paragraph + First letter from the first 3 words in the last sentence in the paragraph. I wonder... for all the different articles on NYTimes, and the different configurations of words across paragraphs, is this unique enough such that you won't get duplicate paragraph IDs in any given article?
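For illustration, the scheme as described can be sketched in a few lines of Python. This is only a guess at the logic, not the NYT's actual code: the names `first_letters` and `paragraph_id` are hypothetical, and the sentence splitting is a naive regex that assumes sentences end in `.`, `!` or `?` followed by whitespace.

```python
import re


def first_letters(sentence, n=3):
    """Return the first character of each of the first n words, case preserved."""
    return "".join(word[0] for word in sentence.split()[:n])


def paragraph_id(paragraph):
    """Build an ID from the first and last sentences of a paragraph.

    Assumption: sentences are split after '.', '!' or '?' followed by
    whitespace. Real text (abbreviations, quotes) would need more care.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if not sentences:
        return ""
    # For a one-sentence paragraph, first and last are the same sentence.
    return first_letters(sentences[0]) + first_letters(sentences[-1])


print(paragraph_id("It looks like rain. We should go home now."))  # IllWsg
```

Only six characters drawn from six words go into the ID, so any two paragraphs that agree on those six initials would get the same ID under this sketch.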

2 comments

WiseWeasel 5230 days ago

It only has to be unique within the article, since it's added to the article path, and there would likely be some kind of provision to add or swap out for a unique character in case of conflict. It's also case-preserving, so that implies likely case-sensitivity as well. I guess we'll have to find an instance of two - probably single-sentence - paragraphs with the same characters and same capitalization in the same story to be certain.

Not it!


celoyd 5230 days ago

Especially because it works in exactly the way you specify even when there’s only one sentence in a paragraph. So the paragraphs:

That was too much for the water district’s attorney.

And:

They were torn apart by angry ducks.

Will both hash to “TwtTwt”. One-sentence paragraphs are probably deprecated in the NYT’s style guide anyway, but I imagine it might still come up.
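A rough way to check a whole article for clashes like this is to group paragraph positions by ID and keep the IDs that occur more than once. The sketch below assumes the ID rule guessed at earlier in the thread (initials of the first three words of the first and last sentences); `naive_id` and `find_collisions` are made-up names, and the sentence splitting is deliberately simplistic.

```python
import re
from collections import defaultdict


def naive_id(paragraph):
    """Assumed ID rule: initials of the first 3 words of the first and last sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if not sentences:
        return ""

    def initials(sentence):
        return "".join(word[0] for word in sentence.split()[:3])

    return initials(sentences[0]) + initials(sentences[-1])


def find_collisions(paragraphs):
    """Map each duplicated ID to the indices of the paragraphs that share it."""
    by_id = defaultdict(list)
    for index, paragraph in enumerate(paragraphs):
        by_id[naive_id(paragraph)].append(index)
    return {pid: idxs for pid, idxs in by_id.items() if len(idxs) > 1}


article = [
    "That was too much for the water district’s attorney.",
    "They were torn apart by angry ducks.",
    "It looks like rain. We should go home now.",
]
print(find_collisions(article))  # {'TwtTwt': [0, 1]}
```

Whatever provision the real system has for conflicts would only need to kick in for the IDs this kind of check reports.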


donohoe 5230 days ago

One-sentence paragraphs still happen, but it still works :)
