	by oli5679 3191 days ago
	Every month or so there is a new version of an article like this posted on hn

4 comments

endorphone 3191 days ago

At this point I believe these articles are the result of people who got over their own confusion/misunderstanding by writing a new article on bloom filters. Eventually there will be a 1:1 ratio of bloom filter articles / developers.

lalaithion 3191 days ago

So, bloom filters are the equivalent of Monads? https://byorgey.wordpress.com/2009/01/12/abstraction-intuiti...

davidivadavid 3191 days ago

They both seem like fairly easy concepts. I'm not sure why they get so much coverage.

andybak 3191 days ago

"A monad is just a monoid in the category of endofunctors, what's the problem?"

xelxebar 3191 days ago

Wait. Is this true? If so, that's cool. I know your comment was sarcastic, but as a grad student in math I've futzed with category theory enough that those words mean more to me than the raw Kleisli triple definition.

striking 3190 days ago

Yeah, it's absolutely right. And I think exploring that definition is actually the best way to learn what exactly a monad is.

vertex-four 3191 days ago

The problem is that Haskell developers like to push monads as the solution to a problem most developers in other languages don't actually have, so discovering the problem that's being solved is the difficult bit.

Of course, this doesn't happen with other niche solutions because other niche solutions don't have an entire popular language which nearly-precisely encodes the problems monads are good at solving.

hikarudo 3190 days ago

It seems you're talking about the IO monad specifically, not about monads in general. Other monads, like Maybe and List, absolutely occur in programs written by "most developers in other languages", whether they perceive them as monads or not.

For instance, a pointer type in C is like a "Maybe a" in Haskell, because a pointer can always be null (Nothing).
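
As a rough illustration of the point above, here is a minimal Python sketch in which `None` plays the role of `Nothing` (or a null pointer). The names `bind`, `parse_int`, and `reciprocal` are hypothetical helpers invented for this example, and Python's `Optional` is only an approximation of Haskell's `Maybe` (it cannot nest, for instance).

```python
from typing import Callable, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def bind(value: Optional[A], f: Callable[[A], Optional[B]]) -> Optional[B]:
    """Apply f only when a value is present; otherwise propagate None."""
    return None if value is None else f(value)


def parse_int(text: str) -> Optional[int]:
    """Return the parsed integer, or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


def reciprocal(n: int) -> Optional[float]:
    """Return 1/n, or None when n is zero."""
    return None if n == 0 else 1.0 / n


# Each step may fail; bind threads the "missing" case through
# without an explicit None check at every call site.
print(bind(parse_int("4"), reciprocal))
print(bind(parse_int("0"), reciprocal))
print(bind(parse_int("oops"), reciprocal))
```

The chained null checks that C code writes by hand are what `bind` factors out here.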

vertex-four 3190 days ago

So how does knowing that Maybe is a monad solve a problem for anybody not programming Haskell?

bradleyjg 3191 days ago

I read a few of these articles a couple of years ago. Along with a few of the inevitable cuckoo hash filter rejoinders.

I think they are neat algorithms and I'm glad to have come across them. That said, I have yet to find a problem in my day-to-day work that required set membership, with space at a premium, and where false positives were acceptable. So I've never used either in anger.

manish_gill 3191 days ago

I use them at work as a cache during the data ingestion phase (analytics). I have to store a unique URL for each page the user is at, and each page generates a lot of requests. So I store the URLs inside a Bloom filter, hitting the DB only when contain() returns False. It's a neat little thing that saves me thousands of unnecessary database hits per second.
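
A minimal sketch of that pattern in Python, not the commenter's actual code: the `BloomFilter` class, the `ingest` function, the double-hashing scheme built on SHA-256, and the list standing in for the database are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: num_hashes bit positions in a num_bits-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive all positions from two 64-bit slices of one SHA-256 digest
        # (double hashing); an illustrative choice, not a tuned one.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ingest(urls, seen: BloomFilter, db_writes: list) -> None:
    """Write a URL to the (stand-in) database only if the filter has not seen it."""
    for url in urls:
        if url not in seen:
            db_writes.append(url)  # hypothetical stand-in for the real DB call
            seen.add(url)


seen = BloomFilter(num_bits=8192, num_hashes=5)
db_writes = []
ingest(["/home", "/about", "/home", "/home", "/about"], seen, db_writes)
print(len(db_writes))
```

The trade-off is that a false positive makes the filter claim a new URL was already stored, so that URL is skipped; the filter has to be sized so this is rare enough for the workload.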

3131s 3191 days ago

I have used them for text segmentation. It's an extremely quick way to test for membership on a set (30+ million tokens in my case) that would otherwise be too large to hold in main memory.

arielweisberg 3191 days ago

If you are bored with bloom and cuckoo filters then check out quotient filters. Quotienting was one of those mind blown things for me.

jason_slack 3191 days ago

Thanks for the reading list! :-)

Jake232 3191 days ago

They're very useful in large scale web crawling/scraping. I use them for a number of things in this field.

busterarm 3191 days ago

Also in distributed brute-forcing of encryption standards.

mlevental 3191 days ago

You scoff, but this is a very thorough presentation of a Bloom filter. Very few of these sorts of articles actually cover the computation of the probability bounds.

niftich 3191 days ago

Yeah, there have been a good number [1] and a steady stream of submissions about Bloom filters (and truly, the inevitable re-riff about Cuckoo filters), but this article is toward the higher end of the quality scale.

It's a bit odd that a data structure attracts this kind of attention, but not all of it is about self-discovery, and the fact that people feel like writing about them suggests that they either consider them a novelty, or expect members of their intended audience to consider them as such. Hopefully with time, we will reach a saturation point where most people (including beginners) are familiar with Bloom filters because they've been formally taught or read one of these articles.

[1] https://hn.algolia.com/?query=bloom+filter&sort=byDate&type=...

weaksauce 3191 days ago

I want to say it started a while ago because some hiring manager asked a question about how someone would do something but had the intended answer as a bloom filter. The interviewee turned to hacker news to vent about it and then it brought enough eyeballs to a somewhat fringe algorithm to self-sustain these types of articles.

foo101 3191 days ago

The Wikipedia article on Bloom filters is already pretty thorough and discusses the computation of the probability bounds as well as the optimal parameters: https://en.wikipedia.org/wiki/Bloom_filter
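
For reference, the standard approximations are p ≈ (1 - e^(-kn/m))^k for the false positive rate, m = -n ln p / (ln 2)^2 for the number of bits, and k = (m/n) ln 2 for the number of hash functions, where n is the number of stored items. A small Python sketch of those formulas follows; the function names are made up for this example, and the formulas assume idealized independent hash functions.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive rate for m bits, n items, k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_parameters(n: int, p: float) -> tuple:
    """Return (m, k): bit count and hash count for n items at target rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


# Example: one million items at a 1% target false positive rate
# needs roughly 9.6 bits per item and 7 hash functions.
m, k = optimal_parameters(1_000_000, 0.01)
print(m, k, false_positive_rate(m, 1_000_000, k))
```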

dang 3190 days ago

Wasn't that more like 2011? I was thinking "oldie".
