Hacker News new | ask | show | jobs
by oli5679 3191 days ago
Every month or so there is a new version of an article like this posted on hn
4 comments

At this point I believe these articles are the result of people who got over their own confusion/misunderstanding by writing a new article on bloom filters. Eventually there will be a 1:1 ratio of bloom filter articles / developers.
So, bloom filters are the equivalent of Monads? https://byorgey.wordpress.com/2009/01/12/abstraction-intuiti...
They both seem like fairly easy concepts. I'm not sure why they get so much coverage.
"A monad is just a monoid in the category of endofunctors, what's the problem?"
Wait. Is this true? If so, that's cool. I know your comment was sarcastic, but as a grad student in math I've futzed with category theory enough that those words mean more to me than the raw Kleisli triple definition.
Yeah, it's absolutely right. And I think exploring that definition is actually the best way to learn what exactly a monad is.
The problem is that Haskell developers like to push monads as the solution to a problem most developers in other languages don't actually have, so discovering the problem that's being solved is the difficult bit.

Of course, this doesn't happen with other niche solutions because other niche solutions don't have an entire popular language which nearly-precisely encodes the problems monads are good at solving.

It seems you're talking about the IO monad specifically, not about monads in general. Other monads, like Maybe and List, absolutely occur in programs written by "most developers in other languages", whether they perceive them as monads or not.

For instance, a pointer type in C is like a "Maybe a" in Haskell, because a pointer can always be null (Nothing).

So how does knowing that Maybe is a monad solve a problem for anybody not programming Haskell?
I read a few of these articles a couple of years ago. Along with a few of the inevitable cuckoo hash filter rejoinders.

I think they are neat algorithms and I'm glad to have come across them. But that said, I have yet to find a problem in my day to day work which required set membership, with space at a premium, and where false positives were acceptable. So I've never used either in anger.

I use them at work as a cache during data ingestion phrase (analytics). I have to store a unique URL for each page the user is at, and each page generates a lot of requests. So I store the URLs inside a Bloom Filter, hitting the DB only when the contain() returns False. It's a neat little thing that saves me thousands of unnecessary database hits per second.
I have used them for text segmentation. It's an extremely quick way to test for membership on a set (30+ million tokens in my case) that would otherwise be too large to hold in main memory.
If you are bored with bloom and cuckoo filters then check out quotient filters. Quotienting was one of those mind blown things for me.
Thanks for the reading list! :-)
They're very useful in large scale web crawling/scraping. I use them for a number of things in this field.
Also in distributed brute-forcing of encryption standards.
you scoff but this is a very thorough presentation of a bloomfilter - very few of these sorts of articles actually cover the computation of the probability bounds.
Yeah, there have been a good number [1] and a steady stream of submissions about Bloom filters (and truly, the inevitable re-riff about Cuckoo filters), but this article is toward the higher end of the quality scale.

It's a bit odd that a data structure attracts this kind of attention, but not all of it is about self-discovery, and the fact that people feel writing about them belies the fact that they either consider it a novelty, or expect members of their intended audience to consider them as such. Hopefully with time, we will reach a saturation point where most people (including beginners) are familiar with Bloom filters because they've been formally taught or read one of these articles.

[1] https://hn.algolia.com/?query=bloom+filter&sort=byDate&type=...

I want to say it started a while ago because some hiring manager asked a question about how someone would do something but had the intended answer as a bloom filter. The interviewee turned to hacker news to vent about it and then it brought enough eyeballs to a somewhat fringe algorithm to self-sustain these types of articles.
The Wikipedia article on bloom filter is already pretty thorough and discusses the computation of the probability bounds as well as the optimal parameters: https://en.wikipedia.org/wiki/Bloom_filter
Wasn't that more like 2011? I was thinking "oldie".