Hacker News
by orlp 1675 days ago
Looking at some of these URLs through Google cache, a lot of them contain verbatim copies of the problem text. Fair enough, legally speaking, if you are the copyright holder.

However, in the DMCA notice itself HackerRank asks for the removal of entire repositories, claiming "the whole repository is infringing copyright as it contains the solution". This is a blatant lie: HackerRank is not automatically the copyright holder of any solutions to a problem they published.


Exactly, very weird DMCA.

Repos like this were DMCA'ed https://github.com/saikrishnareddykatta/react-movie-director...

I think the DMCA is incorrect but the copyright argument might be correct, as in, I imagine that the starting code was provided by HackerRank so they have the copyright. The solution is a whole different thing.

Well, strictly speaking, how can a git repo exist when a single artifact is blocked? The hashes will not add up anymore.
The hashes will sum to something. To do it, at least as far as I understand, you'd have to use https://git-scm.com/docs/git-filter-branch . This will create a divergent history, and the new master branch or any other branches that exist will have to be force-pushed. As far as "but local copies of the repo will have the 'problem files' still" goes: yes, they would. All parties would have to be notified of the legal request.
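A rough sketch of why removing one file changes the hashes rather than breaking them: Git names a blob by the SHA-1 of a small header plus the file contents, and a tree by the SHA-1 of its entries, each of which embeds a blob ID. The Python below assumes only regular files (mode 100644) in a single flat directory; the file contents and helper names are made up for illustration and are not taken from any of the repositories mentioned.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    # Git hashes "<kind> <length>\0" followed by the object body.
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is just the raw file contents.
    return git_object_id(b"blob", content)


def tree_id(files: dict) -> str:
    # Simplified tree: flat directory, regular files only (assumption).
    # Each entry is "<mode> <name>\0" plus the 20-byte binary blob ID.
    body = b""
    for name in sorted(files, key=str.encode):
        body += b"100644 " + name.encode() + b"\0"
        body += bytes.fromhex(blob_id(files[name]))
    return git_object_id(b"tree", body)


# Hypothetical repository contents, before and after removing one file.
before = {"README.md": b"notes\n", "globalMaximum.hs": b"main = pure ()\n"}
after = {k: v for k, v in before.items() if k != "globalMaximum.hs"}

print("tree before:", tree_id(before))
print("tree after: ", tree_id(after))
```

The two tree IDs differ, and both are perfectly valid. Because every commit embeds its tree ID and its parents' commit IDs, the change propagates through all later commits, which is why the rewritten history has to be force-pushed.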

I'm not a copyright expert but it seems like enforcing this is another step in the erosion of fair use. Something about transformative works. The problem was transformed into a solution.

On the other hand, HackerRank's terms of service should have banned this activity. I would imagine they do. I'm not sure how much leverage that gets them legally though. I suppose once you intend to publish it you're no longer an authorized user, and then you're violating that https://en.wikipedia.org/wiki/Computer_Fraud_and_Abuse_Act we see get applied harshly from time to time.

Well, GitHub seems to be able to remove a single `globalMaximum.hs` file from https://github.com/cmk/HR-Haskell
This is a great point. The author would have to rebase and force push. Or at least GitHub could try to selectively block access in the web UI.
Since we're getting technical, couldn't you find a hash collision in the repo without the artifact to make them sum up again?
No, that's not how it works. (Finding a hash collision for an existing hash would be a preimage attack, and that's not possible for SHA-1 with computing power available to humans.)
SHA-1 as cryptography was broken in 2005. The first collision created by humans was in 2017.

See https://shattered.it for the practicals.

SHAttered is a collision attack. A collision attack is easier than a preimage attack. There are no known preimage attacks against SHA-1.
... Or even against MD5, IIRC, which is why you are still kind of able to use HMAC-MD5. You probably still shouldn’t, but I don’t know of any other symmetric authenticator that is as short and requires neither vast tables of constants nor 64-bit operations for an implementation. (For all the recent lightweight crypto work, the only cipher I can reasonably see myself implementing on an oldish ATMega without disgust is the NSA’s Speck, with all the accompanying caveats, and there isn’t a single hash of a comparable complexity at all.)
I never wrote that SHAttered is a preimage attack. What I wrote is exactly correct. There are multiple preimage attacks, neither of which I referenced.

A first preimage attack is where, given only a hash value y, one searches for a message m such that h(m)=y. A second preimage attack is where, given m1, one finds a different m2 such that h(m1)=h(m2).

It's best not to give the incorrect impression when discussing something exact. As with any crypto, the construction is either valid or not, but it is actually the use of the construction that determines real world correctness.

For example, if SHA-1 is used over input where there is known data in specific positions, that is quite different to SHA-1 over unknown data. In practice, the first is often the case.

SHA-1 collisions have been proven as an attack vector for a few years now.

https://security.googleblog.com/2017/02/announcing-first-sha...

And, as the parent correctly pointed out, that would be a preimage attack, which is far harder.
In theory. In practice, since you roughly know the contents of the file, you could probably brute-force it pretty efficiently.
That just makes it a second preimage attack, which even SHA-1 is still resistant to.
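To make the distinction in this sub-thread concrete, here is a toy sketch, assuming a deliberately weakened hash: SHA-1 truncated to 16 bits (`toy_hash`, a made-up helper, not anything Git uses). A collision search is free to pick both messages and typically succeeds after a few hundred tries (birthday bound, about 2^8), while a second-preimage search must match one fixed target and needs on the order of 2^16 tries. Full SHA-1 has 160 bits, so the generic gap is roughly 2^80 versus 2^160; SHAttered brought the collision side down to about 2^63 and left the preimage side untouched.

```python
import hashlib
from itertools import count


def toy_hash(msg: bytes) -> int:
    # Toy hash for illustration only: the first 16 bits of SHA-1.
    return int.from_bytes(hashlib.sha1(msg).digest()[:2], "big")


def find_collision():
    # Collision: the attacker chooses BOTH messages.
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # two messages, number of tries
        seen[h] = msg


def find_second_preimage(target_msg: bytes):
    # Second preimage: one message is fixed in advance.
    target = toy_hash(target_msg)
    for i in count():
        msg = b"candidate-" + str(i).encode()
        if msg != target_msg and toy_hash(msg) == target:
            return msg, i + 1  # matching message, number of tries


m1, m2, collision_tries = find_collision()
second, preimage_tries = find_second_preimage(b"globalMaximum.hs")
print("collision found after", collision_tries, "tries")
print("second preimage found after", preimage_tries, "tries")
```

Knowing roughly what the removed file contained does not shrink the second search: the attacker still has to hit one specific hash value, which is the second-preimage setting described above.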
> HackerRank is not an automatic copyright holder of any solutions to a problem they published

That's an interesting point of view. You're saying the question text is copyrightable but the logical conclusion of such a question is not?

Copying the question text verbatim certainly is copyright infringement (and I would guess unlikely to be fair use, but I'm not a lawyer). If you give the problem in your own words, it won't be, just like your solution isn't.
Giving the problem "in your own words" raises the question of whether or not your restatement constitutes a derivative work.

From the 1976 Copyright Act section 101:

https://www.law.cornell.edu/uscode/text/17/101

> A "derivative work" is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a "derivative work".

Chegg seems to have sorted this out with textbook questions?
They license it I think.
That's not how copyright works. Simply rewording each question will still violate copyright when you're copying all of the questions. I couldn't just reword Harry Potter and republish all of the novels. Likewise, copying some small portion may fall under fair use.
On the contrary, I’m fairly sure you could do that with the Harry Potter books. What jurisdiction are you in?
I believe (and I’m not a legal expert here), that what the commenter is alluding to is the “moral rights” over a work. This is more common in the UK and Canada. When I worked for a Canadian company (I’m a U.S. American) part of our IP release for works-for-hire included a waiver/transfer of rights specifically for the “moral” or “authors” rights, which were explained as being the “spirit” or “whole” of works we created.

It’s a legal construct, but I was never satisfied with that explanation. The gist was that while we could transform things that we learned or did, we could not re-use the ideas that formed the functioning product in other works elsewhere, even by transformation.

Doesn’t the Berne convention specifically make moral rights inalienable? The whole point being that you may transfer redistribution rights, etc. to somebody else, but they still don’t get to (affirmatively) claim they wrote it (thus can’t even try to pressure you to allow it)? Am I misunderstanding or is the local (and likely original) definition of “moral rights” different from the one in the convention?
That is how copyright works. I believe you are thinking of plagiarism, which isn't the same thing.
The same way you may copyright a writing prompt, but you won't automatically get copyright to stories inspired by said prompt
Seems to me that’s the simplest and most intuitive point of view. Can you imagine publishing a question and then owning the copyright to every answer to it someone writes!?
There is a difference between "Questions" and "test questions with a known specific solution". One could argue (I would not) that the solution to a test question is an integral part of the question and therefore, if the one can be protected by copyright, so can the other.
While that would apply to simple maths questions — 16x16 is always 256 — I don’t see that applying to HackerRank programming challenges, where the challenges (last time I used it) are essentially “produce correct output from this mostly-secret input”, and they don’t even mind which language you use to do this, never mind what variable names you use.
Is there any legal basis for that argument?
Nope
At least in this case, there can be multiple correct answers using different approaches; in that case, your theory leads to further confusion.
If such a question exists, surely it is the exception that proves the rule.
Not in the least. Copyright applies to creative expressions, not functional expressions. My solution to your problem is my creative expression, not yours.
If only because you could get the answer wrong.
Both a question and an answer may or may not be subject to copyright. What GP is saying is that such rights, if any, are assigned to their respective authors.
The solution is certainly copyrightable, but not by HackerRank, as they did not author it
I think (or at least hope) that copyright is limited to a specific expression of a specific solution. Anything more broad would be tantamount to a copyright on basic algorithms, I think.
And what happens when there are multiple different possibilities for the solution? As is the case here, the solution can be achieved in multiple ways. It would be bizarre for one to have copyright claim over all of them.
My initial reaction to your question was that I thought perhaps Hackerrank could claim copyright on any particular solution that had either been written by Hackerrank or where the copyright had been assigned to it.

With regard to the latter, I would guess, knowing how corporations work, that Hackerrank requires anyone taking a test to assign her rights, with respect to any solution given, to the company. (To be clear: I do not like that at all.)

In the US, the Copyright Act of 1976 extended copyright to unpublished works (Hackerrank presumably does not publish its own solutions, though it might well register them.) IANAL and AFAIK, I think there are fairly stringent requirements on fidelity for a work to be considered infringing, and there is also the matter of fair use, but in practice in this case, it is Github, not the courts, that Hackerrank has to persuade.

I would guess that publishing something that advertises itself as a solution to a Hackerrank problem might fall under trademark infringement or some such law, and something stating the particular problem being solved might be an infringement of a copyright on the problem as stated by Hackerrank.

> I would guess that publishing something that advertises itself as a solution to a Hackerrank problem might fall under trademark infringement or some such law,

Trade secret, rather. In fact, given the questions are supposed to be unpublished, those would more properly fall under trade secret law as well (except that the DMCA doesn't apply to trade secrets or any other form of IP other than copyright).