| HN Mirror

	by sonu27 520 days ago
	Title needs updating with the year 2006

1 comments

usr1106 520 days ago

I often think AI is mostly crap, wasting a lot of energy for very questionable benefits. But could/should this repetitive task of reminding submitters to follow the submission guidelines and add the year to submissions of old articles be automated?

pdimitar 520 days ago

I would agree, though why you would need AI for that is an open question.

sonu27 520 days ago

A simple crawler would have been able to detect it’s from 2006. Perhaps a reminder should be added if the year is not recent.

Too 519 days ago

Even simpler, just check if the url or title has been submitted before. That would also take care of all the duplicate entries that pop up once per day for a week after a viral story emerges.

In this instance, the url is slightly different from previous submissions so some more clever fuzzy matching or using only the title would be needed.
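A rough sketch of what "slightly more clever" matching could look like, assuming duplicates differ only in scheme, a `www.` prefix, a trailing slash, or tracking parameters. The `TRACKING_PARAMS` set and the `normalize_url` helper are made up for illustration; nothing here reflects how HN actually compares URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of query parameters to ignore when comparing URLs.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "ref"}


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key: host + path + sorted query."""
    parts = urlsplit(url.strip())
    # .hostname is already lowercased and excludes port and credentials.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # Treat "/post" and "/post/" as the same page.
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so order does not matter.
    kept = sorted((k, v) for k, v in parse_qsl(parts.query)
                  if k.lower() not in TRACKING_PARAMS)
    query = urlencode(kept)
    # Scheme and fragment are deliberately ignored.
    return f"{host}{path}" + (f"?{query}" if query else "")
```

Two submissions would then count as duplicates when their keys are equal. This is exact matching on a cleaned-up key rather than true fuzzy matching, so a URL whose path changed would still need a title comparison.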

usr1106 519 days ago

Yes, I have always wondered why a simple duplicate checker covering the same couple of days does not exist. Or does it exist, and the duplicates are actually slight variations of the URL?

usr1106 519 days ago

What algorithm would you suggest to find the year in an arbitrary submission? Of course AI is not a very clearly defined term, and more difficult problems certainly exist. I was just thinking of the case where the submission contains several dates or none at all, and the various hints a human would take into consideration still need to be checked.

Of course some minimal implementation without AI techniques could already handle many cases. My AI suggestion was not dead serious ;)

coder543 519 days ago

Google's research blog does not seem to provide this, but many blogs include the Open Graph metadata[0] around when the article was published or modified:

    article:published_time - datetime - When the article was first published.
    article:modified_time - datetime - When the article was last changed.

For example, I pulled up a random article on another website, and found these <meta> tags in the <head>:

    <meta property="article:published_time" content="2025-01-11T13:00:00.000Z">
    <meta property="article:modified_time" content="2025-01-11T13:00:00.000Z">

For pages that contain this metadata, it would be a cheaper/faster implementation than using an LLM, but using an LLM as a fallback could easily provide you with the publication date of this Google article.

[0]: https://ogp.me/
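As a minimal sketch of the metadata route, assuming the page HTML has already been fetched and the site emits `article:published_time` as an ISO 8601 string like the sample above. `PublishedTimeParser` and `published_year` are illustrative names, and only the standard library is used.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Optional


class PublishedTimeParser(HTMLParser):
    """Records the content of the first article:published_time meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.published: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.published is not None:
            return
        attr = dict(attrs)
        if attr.get("property") == "article:published_time":
            self.published = attr.get("content")


def published_year(html: str) -> Optional[int]:
    """Return the publication year from Open Graph metadata, or None."""
    parser = PublishedTimeParser()
    parser.feed(html)
    if not parser.published:
        return None
    try:
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        stamp = parser.published.replace("Z", "+00:00")
        return datetime.fromisoformat(stamp).year
    except ValueError:
        return None
```

A `None` result is the point where a fallback (another heuristic, or an LLM) would take over.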

coldtea 519 days ago

>What algorithm would you suggest to find the year in an arbitrary submission?

In the submission title, a simple regex for the presence of a date with a standard format (e.g. %Y) would suffice.

Matching it to the article might or might not be possible, but that would already be enough (assuming having the date is a good thing, which I'm not at all certain of).
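Something along these lines, as a sketch: the pattern assumes a four-digit year between 1900 and 2099 appearing anywhere in the title, and `year_in_title` is just an illustrative name.

```python
import re
from typing import Optional

# Assumption: any standalone four-digit number from 1900 to 2099 is a year.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def year_in_title(title: str) -> Optional[int]:
    """Return the first year-like number found in the title, or None."""
    match = YEAR_RE.search(title)
    return int(match.group(0)) if match else None
```

It will happily misfire on titles that contain a number which is not a date (a product name ending in 2000, say), so it only answers "does the title mention some year", not "is it the right one".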

pdimitar 519 days ago

As another comment suggested, you can scan for previous submissions by URL -- Algolia is very helpful with that.

Outside that, no clue, been a long time since I last wrote crawlers, admittedly. Though it can't be too difficult to crowd-source origin date parsers per domain?

But hey, if any LLM's free tier can achieve it, then why not. My point was that many people worked on that particular problem historically. It would be a shame if we can't use any of their hard work.
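For the Algolia lookup by URL, a sketch that only builds the request URL. The endpoint and the `query`, `restrictSearchableAttributes`, and `tags` parameters are written from memory of the public HN Search API, so treat them as assumptions and check the API docs before relying on them. `previous_submissions_query` is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: the HN Search API hosted by Algolia lives at this endpoint.
SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def previous_submissions_query(url: str) -> str:
    """Build a search URL for earlier story submissions of the given link."""
    params = {
        "query": url,
        # Assumed parameter: match against the submitted URL only.
        "restrictSearchableAttributes": "url",
        # Assumed parameter: only stories, not comments.
        "tags": "story",
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

Fetching that URL and comparing the submission dates of the hits is left out here, since the shape of the response is not something this thread covers.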

coldtea 519 days ago

I think adding the year is mostly crap. What information exactly would it give, except perhaps the false impression that this article is "antiquated information", when it pretty much holds true and describes a perennial issue?

gmfawcett 519 days ago

It gives a cue about how many times I've probably seen the article before. Quite useful, IMO. I read this particular article when it came out in 2006... it's convenient to know we're not discussing a novel finding on the same topic.
