by minimaxir 1478 days ago
A reminder that all Hacker News posts and comments are available on BigQuery and can be queried for free: https://console.cloud.google.com/marketplace/details/y-combi... (the `full` table is up-to-date; ignore the others)

Here's a query for a rough reproduction of what's asked in the title:

    -- Monthly "Who is hiring?" threads posted by the whoishiring account
    WITH whoishiring_threads AS (
      SELECT id FROM `bigquery-public-data.hacker_news.full`
      WHERE `by` = "whoishiring"
      -- Raw string with an escaped "?" so it matches a literal question mark
      AND REGEXP_CONTAINS(title, r"Ask HN: Who is hiring\?")
    )

    -- Count direct replies (top-level posts) to those threads, per month
    SELECT FORMAT_TIMESTAMP("%Y-%m", `timestamp`) AS year_month,
    COUNT(*) AS num_toplevel_posts
    FROM `bigquery-public-data.hacker_news.full`
    WHERE parent IN (SELECT id FROM whoishiring_threads)
    GROUP BY 1
    ORDER BY 1

Which results in something like this: https://docs.google.com/spreadsheets/d/13yGlJzFpVzZ-WNHAOsdo...

Still a bit of room to clean up the query, though, and there are some differences from the chart in the post.


I just want to say, in a comment that furthers the discussion in no way, that having something like this available for free is just incredible.

I suppose that massive spike is the beginning of the pandemic.

It's interesting to see March 2020 (the highest month of all time) and April 2020 (the lowest month since early 2016) back to back.

Hey, this is nice! I created a little app here that's quite similar but can be run without knowing SQL: https://slight.run/apps/colman/hacker_news_monthly_top_level..., and a graph to go with it: https://slight.run/graphs/colman/hacker_news_who_is_hiring_c...

One thing that's interesting: this is a sneaky case where the line hides two missing months in 2015.
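A minimal sketch, in Python, of how one might surface such gaps before plotting, assuming the query results have been exported as `YYYY-MM` strings like the `year_month` column above. The function name `missing_months` and the sample values are hypothetical, not taken from the actual spreadsheet:

```python
def missing_months(year_months):
    """Return the YYYY-MM labels absent between the earliest and latest label."""
    if not year_months:
        return []
    present = set(year_months)
    # Zero-padded YYYY-MM strings sort chronologically, so min/max give the range.
    year, month = map(int, min(year_months).split("-"))
    end_year, end_month = map(int, max(year_months).split("-"))
    gaps = []
    while (year, month) <= (end_year, end_month):
        label = f"{year:04d}-{month:02d}"
        if label not in present:
            gaps.append(label)
        # Advance one calendar month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


# Made-up sample for illustration, not real query output.
sample = ["2015-01", "2015-02", "2015-05", "2015-06"]
print(missing_months(sample))  # prints ['2015-03', '2015-04']
```

Drawing the missing months as breaks in the line, or filling them with zeros, would keep the chart from silently interpolating across them.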

Google is infamous for breaking backward compatibility and sunsetting things, so I would not advise anyone to use this to build any products. However, it is fine for one-off tasks, like fetching data for this blog post.

Thanks for formatting the timestamp, which I missed in the original post.

> all Hacker News posts and comments are available on BigQuery and can be queried for free

Good thing there aren’t any laws that regulate copyright or privacy, so datasets like this can be published without asking users for consent.

Anybody could crawl HN and build up the exact same dataset. You choose to publish your words for the world to consume when you submit a comment.

It’s no less illegal because anyone could do it. I hold the copyright to my comments, and many of them can be considered personal data, especially if they are connected to my username. My rights are not waived because I publish something publicly.

Puzzling comment. Perhaps you should read the Terms of Service.

  User Content Transmitted Through the Site: With respect to the content or other materials you upload through the Site or share with other users or recipients (collectively, “User Content”), you represent and warrant that you own all right, title and interest in and to such User Content, including, without limitation, all copyrights and rights of publicity contained therein. By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed. However, please review the Applications Privacy Policy located at https://www.ycombinator.com/apply/privacy, for more information on how we treat information included in applications submitted to us.
https://www.ycombinator.com/legal/
> and will grant Y Combinator and its affiliated companies

Is Google (or Alphabet) an affiliated company?

There are so many problems with that with regard to GDPR. It’s not even linked from the signup page, so I’ve never even been informed about it. I’ve never been given the option to consent or reject consent to anything. You also can’t just say it’s “irrevocable” and pretend like GDPR doesn’t apply.

How is posting it publicly not consenting to it being available publicly?

> GDPR

Yeah, you may wanna check your maps - you're a little too far from home.

I hope the downvotes are for the sarcasm, not the point expressed. But considering other threads about this have been flagged[1], I guess this topic is taboo.

1: https://news.ycombinator.com/item?id=20052076

I think many people have the opinion that GDPR 'should not' apply outside the European region (or other countries that have enacted GDPR legislation).

A common-sense approach would be to say that if someone in Europe chooses to submit their details to an entity based outside of Europe, then that's just their choice to waive their rights to their data.

Lawyers and legislators might make arguments and talk about analogous laws and situations, but that simply doesn't convince a lot of people. Hence you will get downvotes for saying something so common is illegal.

(As a side note, it's interesting that you personally choose to submit your copyrighted comments to HN given your knowledge of how you're losing effective control of your information.)

So all companies can just have a shell company outside Europe.

It is illegal, whether people wish it wasn’t or think it’s sensible or downvote comments that point it out. Personally I think the gist of GDPR is more sensible than what you’re proposing as common sense - each person should have the right to decide what happens to data about them.