Hacker News
by chopsuey5540 854 days ago
This looks interesting, but I’m a bit worried about the CSAM/illegal-content part. Could a user get in trouble because they have traces of that in their crawled index? Also, how large does the index get after a few months of indexing?

An indexer doesn't download content. The only information you'll have is the name of a torrent, possibly its file list, and who is interested in those files.

But that's the technical view; what happens in court might be totally different.

To get information such as the torrent's name and file list from the hash, you do need to connect to someone in the swarm and download that metadata. You won't know what it is until after you've already connected.
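To make the "you won't know until you've connected" point concrete: for v1 torrents, BEP 3 defines the infohash as the SHA-1 digest of the bencoded `info` dictionary, and BEP 9 lets a peer send that dictionary on request. The minimal Python sketch below covers v1 only; `bencode` and `metadata_matches` are illustrative helpers, not any client's API. It shows that the 20-byte hash on its own reveals nothing about the name or files; you learn those only after fetching the bytes from a peer and checking that they hash back to the infohash.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencode encoder (BEP 3) for int, str/bytes, list and dict."""
    if isinstance(value, bool):
        raise TypeError("booleans are not bencodable")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted raw-byte order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def metadata_matches(infohash: bytes, info_bytes: bytes) -> bool:
    """True if the info dictionary a peer sent hashes to the infohash we looked up."""
    return hashlib.sha1(info_bytes).digest() == infohash


# All an indexer starts with is the 20-byte infohash seen in the DHT.
info = {"name": "example.txt", "length": 12, "piece length": 16384, "pieces": b"\x00" * 20}
raw = bencode(info)                    # the bytes a peer would send via ut_metadata
infohash = hashlib.sha1(raw).digest()
print(infohash.hex())                  # opaque: nothing here says "example.txt"
print(metadata_matches(infohash, raw)) # True
```

The check runs in one direction only: a peer can't pass off different metadata for a given infohash, but you still have to download the metadata before you know what it describes.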
Connecting to an unknown machine and asking what it has is like knocking on a stranger's door and asking what they're selling. If they mention something nefarious and you leave in response, that is very obviously not a crime.
There is probably nefarious content you can identify just from the filenames, but not everything is like that. Moreover, you "only" know that they distribute it; you don't distribute it yourself.
Considering that many countries block torrent sites, I wouldn't chance it.
The real question is this: metadata is data, so are there any limits on how much data well-behaved clients and servers can transfer through the DHT, so that you can be reasonably sure that what you download to your machine isn't poisoned enough to get you into trouble with law enforcement?
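A partial answer from the specs: mainline DHT lookups (BEP 5) return peer addresses, not torrent metadata. The metadata itself comes from a peer over the ut_metadata extension (BEP 9), where the peer advertises a `metadata_size` and sends the info dictionary in 16 KiB pieces. As far as I know, BEP 9 leaves any upper bound to the client, so this Python sketch assumes a hypothetical cap (`MAX_METADATA_SIZE`, 4 MiB here, not a value from any spec). It rejects oversized or malformed transfers before anything is kept; `plan_metadata_fetch` and `assemble_metadata` are illustrative names, not a real client's API.

```python
from __future__ import annotations

import hashlib

PIECE_SIZE = 16 * 1024               # ut_metadata piece size from BEP 9
MAX_METADATA_SIZE = 4 * 1024 * 1024  # hypothetical client-side cap, not from any spec


def plan_metadata_fetch(metadata_size: int) -> int:
    """Number of ut_metadata pieces to request; ValueError if the advertised size is unacceptable."""
    if metadata_size <= 0:
        raise ValueError("peer advertised no metadata")
    if metadata_size > MAX_METADATA_SIZE:
        raise ValueError(f"metadata_size {metadata_size} exceeds cap {MAX_METADATA_SIZE}")
    return (metadata_size + PIECE_SIZE - 1) // PIECE_SIZE  # ceiling division


def assemble_metadata(infohash: bytes, metadata_size: int, pieces: dict[int, bytes]) -> bytes:
    """Join received pieces, checking each piece's length and the final SHA-1 against the infohash."""
    count = plan_metadata_fetch(metadata_size)
    out = bytearray()
    for index in range(count):
        piece = pieces.get(index)
        if piece is None:
            raise ValueError(f"missing piece {index}")
        # Every piece is 16 KiB except possibly the last one.
        expected = min(PIECE_SIZE, metadata_size - index * PIECE_SIZE)
        if len(piece) != expected:
            raise ValueError(f"piece {index} is {len(piece)} bytes, expected {expected}")
        out += piece
    data = bytes(out)
    if hashlib.sha1(data).digest() != infohash:
        raise ValueError("metadata does not match the infohash; discarding it")
    return data
```

The cap bounds how much a single infohash can make you download, and the hash check means a peer can't substitute other bytes. Neither tells you whether the legitimate metadata for that infohash is benign.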
At least in the case of https://coveapp.info, the metadata you fetch from users while scraping is broken down into a form used only for efficient searching. The only part that remains in an identifiable form is the infohash.
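One way a "search-only" form could work is an inverted index that maps search terms to infohashes and does not store the original strings. This is a minimal sketch of that general idea under my own assumptions: lowercase alphanumeric tokens, infohashes as hex strings, and a hypothetical `TermIndex` class. It is not how coveapp actually does it.

```python
from __future__ import annotations

import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> set[str]:
    """Lowercase a name and split it into alphanumeric search terms."""
    return set(TOKEN_RE.findall(text.lower()))


class TermIndex:
    """Hypothetical inverted index: term -> set of infohashes."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, infohash_hex: str, name: str, file_paths: list[str]) -> None:
        terms = tokenize(name)
        for path in file_paths:
            terms |= tokenize(path)
        # Only the terms and the infohash are stored, not the original name or paths.
        for term in terms:
            self._postings[term].add(infohash_hex)

    def search(self, query: str) -> set[str]:
        """Infohashes whose metadata contained every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self._postings.get(t, set()) for t in terms))
```

Queries still work (`search("ubuntu desktop")` returns the infohashes whose names contained both words), but the index can't hand back the original name or file paths verbatim.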