Hacker News
by zX41ZdbW 1 hour ago
I host a publicly open database with Hacker News data at https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...

So you can create any similar service with a single SQL query and an HTML page.

I also host it as a publicly accessible data lake, which you can query from anywhere: https://github.com/ClickHouse/ClickHouse/issues/29693#issuec...

It is also updated in real time.

3 comments

Thank you for providing this, you are a hero!!! I'm gonna try to do cool stuff with it!
It probably also got swamped in real time...
Do you mean it's not updated? You gotta sort by the update_time column. It looks sorted, but you have to sort it explicitly with a query like:

SELECT * FROM hackernews_history
ORDER BY update_time DESC
LIMIT 100;

And yeah, I got that from deepseek because I don't have a brain.
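
For anyone who wants to see why the explicit ORDER BY matters, here is a minimal sketch. It uses an in-memory SQLite table as a stand-in for ClickHouse, and the three-column schema (id, update_time, text) and the sample rows are assumptions made up for illustration; the real hackernews_history table has more columns. The helper name latest_updates is hypothetical too.

```python
import sqlite3

# Stand-in for the real table: a hypothetical, simplified schema with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hackernews_history (id INTEGER, update_time TEXT, text TEXT)"
)
rows = [
    (1, "2024-01-01 10:00:00", "first version of item 1"),
    (2, "2024-01-01 11:00:00", "first version of item 2"),
    (1, "2024-01-02 09:30:00", "edited version of item 1"),
]
conn.executemany("INSERT INTO hackernews_history VALUES (?, ?, ?)", rows)


def latest_updates(connection, limit=100):
    """Return rows with the most recent update_time first (hypothetical helper)."""
    query = (
        "SELECT id, update_time, text FROM hackernews_history "
        "ORDER BY update_time DESC LIMIT ?"
    )
    return connection.execute(query, (limit,)).fetchall()


# Without ORDER BY the row order is not guaranteed; with it, the edit comes first.
newest = latest_updates(conn, limit=2)
for row in newest:
    print(row)
```

The timestamps are stored as ISO-style strings here, so plain string ordering matches chronological ordering. Against the real database you would run the SQL from the comment above as-is.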

oh hey, per HN terms and conditions I license my HN data only to HN. Can you please remove my data from the set? Thank you!
Not sure if joking, but if this product is not republishing the text of your contributions (to which you hold copyright), you’re probably not going to convince a court to do anything here.

Generally speaking it is not a violation to scrape, index, and analyze web content as long as you don’t republish copyrighted content without a license, or violate access controls. For example: search engine indexes.

Wait, so I have to ask for every single person's permission to use this data?

uhhhhhhhhhhhhhhhhhhhhhh

You must be fun at parties
By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.

@zX41ZdbW, you can safely ignore this guy.

@GeoAtreides, next time read the actual terms of service before hallucinating.

> for any Y Combinator-related purpose

That is actually the key phrase. HN can provide the API, no problem. People can consume the API, no problem. But I'd ask an attorney if API consumers can then re-release the data for purposes not related to YC. By my reading, they cannot.

You might want to read it again, then:

https://opensource.org/license/mit

That is about the software, not the data.
While a literal reading of the MIT license refers to "software", many datasets have been released under it.

In particular, if someone releases something that is only a dataset along with an MIT license file, the most reasonable interpretation is that the rights holder intended to release the data under the terms of that license.

I looked for copyright cases involving this specific distinction, whether "data" versus "software" makes a legal difference, but didn’t find anything.

So the question remains open (for you; for me, it's pretty clear the dataset is released under MIT).

You might want to sue and find out. It sounds like an interesting experiment.

>Y Combinator and its affiliated companies

Is zX41ZdbW either?

Oh, now I see my comment might be a bit harsh.

I didn't consider you might not know about:

https://github.com/hackernews/api

Yes, and per HN terms and conditions only YC and YC-affiliated companies (as you quoted) can use the API legally. I don't license my content to anyone else, so it shouldn't be used by anyone else, even if it's available on a free-for-all API (nice move HN, btw).
https://github.com/HackerNews/API/blob/master/LICENSE

It's right there, you just have to click the link I shared ...