Show HN: Days Since Last Elon (dayssincelastelon.com)
26 points by landric 1212 days ago
A toy project I created to track the appearance of the text "Elon" on the front pages of various news sites.
4 comments

This is funny. I'm curious how exactly you count an "Elon". E.g., Google News shows no Elons, but that is for sure 100% wrong.

Also, maybe you shouldn't be counting news aggregators like Google News? It's basically double counting, since the story is already on some other site.

I _had_ no good answer for the Google News result until you prompted me to Inspect source just now...

I'm basically scanning for <a> tags and searching the text within. Doing a Google News inspect, it appears that their links actually have no text, but are sibling elements of an <h#> tag. So, I need to figure out how to parse that correctly...
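
A minimal sketch of the "scan <a> tags and search the text within" approach, using only Python's standard library. The names AnchorTextCollector and count_mentions are made up for illustration and are not taken from the actual project, which may work differently. It also reproduces the blind spot described above: an empty <a> next to a heading is not counted.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects the text nested inside each <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False   # True while the parser is inside an <a>
        self.current = []        # text chunks of the anchor being read
        self.link_texts = []     # one finished string per anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.current = []

    def handle_data(self, data):
        if self.in_anchor:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.link_texts.append("".join(self.current).strip())
            self.in_anchor = False


def count_mentions(html, needle="elon"):
    """Counts anchors whose inner text contains `needle` (plain substring, case-insensitive)."""
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    return sum(needle in text.lower() for text in parser.link_texts)


# The second link has no inner text, so the heading beside it is missed.
sample = (
    '<a href="/1">Elon buys a thing</a>'
    '<article><div><a href="/2"></a></div><h3>Elon again</h3></article>'
)
print(count_mentions(sample))
```

Because this is a plain substring match, words such as "melon" would also count; whether that matters depends on how strict the tracker is meant to be.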

> Doing a Google News inspect, it appears that their links actually have no text, but are sibling elements of an <h#> tag. So, I need to figure out how to parse that correctly...

I just checked Google News myself, and you are correct that the sibling <h#> tag has the text. However, the <a> tag with the link has it too, but as a prop instead of being nested inside. Unless I am mistaken about the use case of that prop here, you can just extract the text from the aria-label property of the <a> tag.

And in case you want to proceed with parsing text from the sibling <h#> tag instead, you can just get the list of child nodes of the parent <article> tag (yourAnchorTagNode.parentNode.parentNode.children; I had to do a double .parentNode because the <a> tag is wrapped in a single <div> tag) and then search for the only <h#> tag there. That will be your target tag with the text.
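
The double-parent walk described above can be sketched in Python with xml.etree.ElementTree. This assumes the markup is well-formed XML, which real pages usually are not, so an actual scraper would need a more tolerant HTML parser. The function name heading_text_for_anchors and the sample markup are invented for illustration and only mimic the <article> / <div> / <a> / <h#> layout described in this thread.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h[1-6]$")


def heading_text_for_anchors(markup):
    """For each <a>, returns the text of the first <h#> child of its grandparent."""
    root = ET.fromstring(markup)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    titles = []
    for anchor in root.iter("a"):
        wrapper = parent_of.get(anchor)      # the <div> around the <a>
        article = parent_of.get(wrapper)     # the enclosing <article>
        if article is None:
            continue                         # anchor is not nested deeply enough
        for sibling in article:              # direct children only
            if HEADING.match(sibling.tag):
                titles.append("".join(sibling.itertext()).strip())
                break
    return titles


# Hypothetical fragment shaped like the layout described in the comment.
sample = (
    "<main><article>"
    '<div><a href="./articles/1" /></div>'
    "<h3>Elon does a thing</h3>"
    "</article></main>"
)
print(heading_text_for_anchors(sample))
```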

Yep, that's right.

I was _hoping_ to get away with the same xml-parsing for each site, but I guess I'll need to customize.

Practically speaking, you might actually sorta get away with it by using a single if-check, as long as you go with the aria-label approach instead of the <h#> sibling node search.

My logic is that it is very unlikely that another website will copy over the exact html layout of Google News, so the <h#> is only going to work there. But I bet that Google News is far from the only website that has the article title text inside the aria-label prop in the <a> tag.

So you can cover the vast majority of websites you care about (if not all of them) by just checking both the inner text and (in case the inner text is absent) the aria-label prop. No need for any custom logic implemented just for Google News, as this would likely solve the issue for a lot of other sources too.
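
As a rough sketch of that single if-check, here is one way to prefer a link's inner text and fall back to its aria-label when the inner text is empty. It uses Python's standard html.parser; LinkTitleCollector and link_titles are hypothetical names, and the sample markup is invented rather than copied from any real site.

```python
from html.parser import HTMLParser


class LinkTitleCollector(HTMLParser):
    """Records one title per <a>: inner text if present, else aria-label."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.text = []           # inner text chunks of the current anchor
        self.aria_label = ""     # aria-label of the current anchor, if any
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.text = []
            # Attribute values can be None (valueless attribute), hence `or ""`.
            self.aria_label = dict(attrs).get("aria-label") or ""

    def handle_data(self, data):
        if self.in_anchor:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            inner = "".join(self.text).strip()
            # The single check: use the label only when there is no inner text.
            self.titles.append(inner or self.aria_label.strip())
            self.in_anchor = False


def link_titles(html):
    parser = LinkTitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


sample = (
    '<a href="/a">Elon tweets</a>'
    '<a href="/b" aria-label="Elon sells"></a>'
    '<a href="/c"></a>'
)
print(link_titles(sample))
```

Links with neither inner text nor an aria-label come back as empty strings, so a caller can still tell how many links were seen.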

If you include your own website, you can simplify the algorithm to "0".
Stack was hugged to death... does anyone have a screenshot or description?
Calendars for a number of news sites and aggregators, showing, per their methodology, when "Elon" appeared on each front page. I have to say that a couple of the results seem suspect to me, so I question the methodology.
OP here. Happy to share any details as I deal with the hug-of-death...

What seemed off to you?

For one, this article is now on the main page of Hacker News, and it reports four days since the last Elon on Hacker News :D Might just be that it hasn’t been on the front page long?
And Twitter itself.

Can't open Twitter without one of Elon's tweets on top of it.