Hacker News
by crntaylor 5012 days ago
Obvious point to raise: the reason people regularly delete their browser history is that they watch porn without turning on private browsing. How do you propose to deal with this?

You'd need to provide at least the ability to selectively delete portions of the history. But you can selectively delete portions of your browser history too, and people don't - because it would be too easy to miss something. Instead, they just nuke the whole thing. How is your tool different?


I take advantage of my browser's history with porn. When I was in college, the first bash script I wrote opened the movie in my porn collection that I hadn't watched for the longest time. This was great. But now, with streaming porn sites, I don't have a huge collection, and I often watch scenes that I can't seem to find later. There is a lot of porn out there.

Sure, people clear their browser history because they're embarrassed by their porn obsession, but I think this tool could be very useful for pornaholics too.

You could do something similar today: just hook into your browser's history API.
Vinny Glennon, one of the founders here. Thanks very much for the upvotes. The Chrome extension does not work in private browsing. I have a set of porn sites (1.7 million, stored in redis), and I check whether incoming links are members of that set. You can selectively block sites (https://www.seenbefore.com/blacklist_items).
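The founder doesn't say how an incoming link is matched against the list. As a minimal sketch, assuming the list holds bare domain names, one option is to test the link's hostname and each of its parent domains for set membership; in Redis each test would be an SISMEMBER call, and a Python set stands in for it here. `BLOCKED_DOMAINS`, `hostname_candidates` and `is_blocked` are hypothetical names, and the two domains are placeholders, not entries from the real list.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the 1.7 million-entry list (placeholder domains).
BLOCKED_DOMAINS = {"example-adult.com", "another-blocked.net"}


def hostname_candidates(url: str) -> list[str]:
    """Return the hostname plus each parent domain, e.g.
    'a.b.example.com' -> ['a.b.example.com', 'b.example.com', 'example.com']."""
    # urlsplit(...).hostname is already lowercased; it is None when there is no host.
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Stop before the bare top-level domain ('com'), which is never a useful match.
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    # Each candidate is one set-membership test (average O(1) for a hash set).
    return any(candidate in blocked for candidate in hostname_candidates(url))
```

Checking parent domains means `www.` and CDN subdomains match without storing every variant in the list.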
Where did you get your list from, and can you share it?
For research purposes.
For science.
Do you store the entire set of 1.7 million entries in redis? Or is redis an index to data stored elsewhere, in a relational DB perhaps?

I was under the impression that redis wouldn't be all that useful for storing a lot of data. It would be great if something as quick as redis could work with large data sets.

Storing a list of 1.7 million strings takes us 70MB of memory. Testing for membership is an O(1) op. Very happy with it. We use mongo as a dumb data store, as well as a bunch of other infrastructure tools, like http://circleci.com, that we could only have dreamt of years ago.
> Testing for membership is an O(1) op

Curious how. O(1) is an array index lookup, not a string lookup, I thought.

Again, just curious, but I'd still like to know how. Someone there asks the mod how it could be O(1), and the mod replies that it's a "hash table lookup". But Wikipedia at http://en.wikipedia.org/wiki/Big_O_notation suggests that such a lookup is no faster than O(log log n). I think the redis info is incorrect.

O(1) implies that the location of the member in the list is already known, with no search required. I don't see how that could be the case when it's a key lookup. The key could be anywhere in the list, even if the list is sorted. The key would have to be searched for, it seems.
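The missing step is that a hash table does not search for the key: it computes the key's position. Hashing the string yields an integer, and that integer modulo the table size picks a bucket directly; as long as the table grows to keep the number of keys per bucket bounded, a lookup inspects only a constant number of keys on average. (The O(log log n) entry on the Wikipedia page describes interpolation search in a sorted array, not hashing.) Two caveats: hashing a string costs time proportional to its length, which is treated as a constant, and the worst case degrades to O(n) if many keys collide. Redis documents SISMEMBER as O(1) in this average-case sense. The toy `HashSet` below is a hypothetical illustration of separate chaining, not Redis's actual implementation.

```python
class HashSet:
    """A toy hash set using separate chaining, for illustration only."""

    def __init__(self, num_buckets: int = 8):
        self._buckets: list[list[str]] = [[] for _ in range(num_buckets)]
        self._size = 0

    def _bucket(self, key: str) -> list[str]:
        # hash() maps the key to an integer; modulo selects a bucket directly,
        # so the other stored keys are never scanned.
        return self._buckets[hash(key) % len(self._buckets)]

    def add(self, key: str) -> None:
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.append(key)
            self._size += 1
            # Keep the load factor (keys per bucket) at most 1 by doubling
            # the table, so buckets stay short on average.
            if self._size > len(self._buckets):
                self._resize(2 * len(self._buckets))

    def _resize(self, num_buckets: int) -> None:
        # Rehash every key into the larger table.
        old = self._buckets
        self._buckets = [[] for _ in range(num_buckets)]
        for bucket in old:
            for key in bucket:
                self._buckets[hash(key) % num_buckets].append(key)

    def __contains__(self, key: str) -> bool:
        # Only the single bucket the key hashes to is searched.
        return key in self._bucket(key)

    def __len__(self) -> int:
        return self._size
```

Doubling on growth makes the occasional rehash cost amortized constant per insert, which is why lookups stay fast as the set grows to millions of entries.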

Why not use a bloom filter?
The main benefit of Bloom filters is that they can be made small. Given that his database takes only 70MB or so, and he's not trying to ship it to devices with significant space limitations, there would appear to be little point.
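To put numbers on the size argument, the standard Bloom filter approximations give the number of bits m and hash functions k needed for n items at a false-positive rate p. Taking n = 1.7 million from the thread and assuming a target of p = 1%:

```latex
\[
m = -\frac{n \ln p}{(\ln 2)^2}
  = -\frac{1.7\times10^{6}\,\ln 0.01}{(\ln 2)^2}
  \approx 1.7\times10^{6} \times 9.59
  \approx 1.63\times10^{7}\ \text{bits}
  \approx 2.0\ \text{MB}
\]
\[
k = \frac{m}{n}\ln 2 \approx 9.59 \times 0.693 \approx 6.6
  \quad\Rightarrow\quad k = 7
\]
```

That is roughly 2 MB against the 70MB quoted above, so a Bloom filter would be much smaller, but it only answers "possibly present" or "definitely absent": about 1 in 100 non-listed sites would be misclassified, and the filter cannot list its entries or, in its basic form, remove them.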
Eh, true, I guess redis is sufficiently awesome.
Maybe because of this (according to Wikipedia): "The more elements that are added to the set, the larger the probability of false positives."
That depends on its size, though. You can make it larger and get fewer false positives.
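A small experiment shows the trade-off. The sketch below is a minimal Bloom filter, assuming SHA-256 with double hashing (position_i = h1 + i * h2 mod m) as the source of the k bit positions; `BloomFilter` and `false_positive_rate` are hypothetical names. It stores the same number of items in a smaller and a larger bit array and measures how often never-inserted strings are wrongly reported as present.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an array of m bits, k bit positions per item."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # A False answer is always correct; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def false_positive_rate(m_bits: int, k: int, n_items: int,
                        n_probes: int = 20_000) -> float:
    """Insert n_items strings, then probe with strings that were never added."""
    bf = BloomFilter(m_bits, k)
    for i in range(n_items):
        bf.add(f"stored-{i}")
    hits = sum(f"probe-{i}" in bf for i in range(n_probes))
    return hits / n_probes
```

With 10,000 items and k = 7, `false_positive_rate(50_000, 7, 10_000)` should land near the theoretical 14%, while quadrupling the array to 200,000 bits should bring it down to a few hundredths of a percent.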
Wouldn't one simply use one specific browser, say Safari, Firefox or Chrome, and only that browser for their... unsavory activities? I think that is a great way to keep accounts separate and keep "bad" sites from knowing about "good" sites and vice versa. Just saying. Not that I partake in any such unsavory activities.
For testing purposes I use Chrome's "Users" feature to keep an extra profile with no extensions installed handy.

The same could be done for a "Porn" profile too, I guess, sandboxing any history, extensions and bookmarks to that profile. You could even tie it to a Google account for portability.

This problem is nullified by private browsing. I think the idea is BRILLIANT, as Google's already tracking all my 'legitimate' searches, and I find that most of what I Google are things I've looked at on other machines, or seen already.

The noise introduced by phrasing my query differently is a real problem in search that Google hasn't fixed yet.

Porn sites are not recorded.
How does your system define "porn sites"? What about if it was some porn site no one has ever heard of with an innocent-sounding name/domain?
He apparently uses a list of 1.7 million sites. But you can also blacklist sites and have any existing entries for them removed:

https://www.seenbefore.com/blacklist_items