Hacker News
by TeMPOraL 360 days ago
If you walk through the N-gram database with a copy of Harry Potter in hand and observe that, for N=7, you can find any piece of it in the database with above-average frequency, does that mean the N-gram database is violating copyright?
2 comments

Not unless you can reproduce large portions of Harry Potter verbatim from the database. If the 7-grams are taken only from Harry Potter, that is very likely.
If the database is sharing those pieces, then yes, it might be.
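
To make the "taken only from Harry Potter" case concrete, here is a rough sketch of how overlapping n-grams from a single text can be chained back into that text. It is only an illustration under assumptions: the helper names `ngrams` and `reconstruct` are made up for this example, it uses N=3 on a toy sentence instead of N=7 on a book, and it needs to be handed a starting n-gram.

```python
from collections import defaultdict

def ngrams(words, n):
    """Return the word n-grams of `words` as tuples, in order of appearance."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def reconstruct(grams, start):
    """Greedily extend `start` by chaining n-grams that overlap on n-1 words.

    Assumes n >= 2. Stops as soon as the next word is missing or ambiguous.
    """
    n = len(start)
    # Map each (n-1)-word prefix to the set of words that can follow it.
    followers = defaultdict(set)
    for g in grams:
        followers[g[:-1]].add(g[-1])

    out = list(start)
    seen = {tuple(start)}
    while True:
        prefix = tuple(out[-(n - 1):])
        options = followers.get(prefix, set())
        if len(options) != 1:
            break  # dead end, or more than one possible continuation
        nxt = next(iter(options))
        gram = prefix + (nxt,)
        if gram in seen:
            break  # guard against looping forever on repeated text
        seen.add(gram)
        out.append(nxt)
    return out

# Toy stand-in for the book: an unordered set of 3-grams from one sentence.
words = "the quick brown fox jumps over a lazy dog".split()
database = set(ngrams(words, 3))
rebuilt = reconstruct(database, start=("the", "quick", "brown"))
print(" ".join(rebuilt))
```

Once the database also holds n-grams from other texts, a prefix can have several followers and this sketch simply stops at the first ambiguous step, so how much comes back depends heavily on N and on what else is in the database.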

Copyright takes into account the use for which the copying is done. Commercial use will almost always be treated as not fair use, with limited exceptions.

I'd say no, because you can't reasonably access and order those pieces without already having the work at your side to use as a reference.