by maloga 2988 days ago
Blogpost author here. Thank you so much for all the attention, comments, upvotes, likes, retweets, etc! I've done a pass over the comments and can't really answer them all but I'd like to clarify a few things:

There seems to be a general opinion that the queries generated by the group builder algorithm are very inefficient, that it'd be easy to come up with a solution with much better response times, and that this would be achievable in any reasonable programming language in roughly the same time with similar results.

The language argument will always be controversial and I won't address it here; we have a point of view that is expressed in the Conclusion and on this blogpost: https://movio.co/en/blog/migrate-Scala-to-Go/

I can imagine that seeing a query with JOINs, subqueries, GROUP BYs and UNIONs can raise some eyebrows, but that story is missing some context, and that's on me. Here's some of that context:

* The schema that the group builder algorithm operates on is not uniform in nature or composed of simple yes/no fields; it's an incredibly complex legacy schema that to a large degree wasn't even up to Movio: it's been up to the film industry as a whole, and it has evolved over the years, as is the case everywhere. Note that every different kind of filter translates to a very different kind of query, and we have more than 120 different filters, sometimes with dynamic parameters, and sometimes even bespoke for a particular customer!

* The group builder algorithm predates the team that built this service (myself included), as well as predating the first commercial release of Elasticsearch, MariaDB, mainstream Go success, etc. Nevertheless, it's still very fast and is being used today by ~88% of our customers (i.e. all the non-behemoths). It's been successful for many years, and continues to be, for the most part.

* But I don't like it because it's fast: I like it because it's simple and flexible. It allows our customers to build a really complex (and arbitrary) tree of filters to segment their loyalty member base, and it compiles all of that into one big SQL query, which in most cases is quite performant. That's pretty awesome. But yes, it doesn't scale to several million members.

* Migrating the very engine of the main product of a company is not a decision that is taken lightly. As is the case with every big company I can remember (e.g. Twitter, SoundCloud), behind a big success story there's always a legacy monolith, and our case is no exception. From that standpoint, achieving such a breakthrough (i.e. cost reduction + significant response time improvement) within one hackathon day is really not all that common in my experience. Definitely something worth sharing, IMO.
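
To make the "tree of filters compiled into one big SQL query" idea above concrete, here is a minimal Python sketch. It is not Movio's actual group builder: the `Filter` and `Group` classes, the `members` and `transactions` tables and the `member_id` column are all hypothetical names chosen for illustration, and every filter is assumed to be expressible as a subquery that returns member ids.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Filter:
    """Leaf node: one filter, given as a subquery that returns member ids."""
    subquery: str  # hypothetical SQL text, trusted input in this sketch


@dataclass
class Group:
    """Inner node: combines its children with AND or OR."""
    op: str
    children: List["Node"]


Node = Union[Filter, Group]


def compile_node(node: Node) -> str:
    """Recursively turn a filter tree into a single SQL boolean expression."""
    if isinstance(node, Filter):
        return f"member_id IN ({node.subquery})"
    if node.op not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {node.op!r}")
    if not node.children:
        raise ValueError("a group needs at least one child")
    parts = [compile_node(child) for child in node.children]
    return "(" + f" {node.op} ".join(parts) + ")"


def compile_query(root: Node) -> str:
    """Wrap the compiled expression in one SELECT over a hypothetical table."""
    return "SELECT member_id FROM members WHERE " + compile_node(root)


# Hypothetical segment: horror fans who are either under 30 or in Auckland.
example = Group("AND", [
    Filter("SELECT member_id FROM transactions WHERE genre = 'horror'"),
    Group("OR", [
        Filter("SELECT member_id FROM members WHERE age < 30"),
        Filter("SELECT member_id FROM members WHERE city = 'Auckland'"),
    ]),
])

print(compile_query(example))
```

A real implementation would need parameter binding instead of string concatenation, and the real filters produce far more varied queries (JOINs, GROUP BYs, UNIONs) than the uniform `IN (...)` shape assumed here.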

Hopefully that clarifies some of the questions :) Cheers.

1 comment

We had the same issue where I work. We are doing a very similar thing for audience building, but at a much larger scale (adtech), and we actually resorted to compressed bitmaps since postgres was not cutting it. It's fairly easy to just come on a forum and say "hey, just use postgres/mysql/sql server" without reading the full article and understanding what you guys are dealing with.
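
For readers who haven't seen the bitmap approach, here is a rough Python sketch of the idea. It is only an illustration under stated assumptions: it uses a plain Python integer as an uncompressed bitmap (bit `i` set means member `i` is in the segment), whereas a production system would use a compressed bitmap format, and the segment names and member ids below are made up.

```python
from typing import Iterable, List


def to_bitmap(member_ids: Iterable[int]) -> int:
    """Build a bitmap where bit i is set if member i belongs to the segment."""
    bitmap = 0
    for member_id in member_ids:
        bitmap |= 1 << member_id
    return bitmap


def to_ids(bitmap: int) -> List[int]:
    """List the member ids whose bits are set, in ascending order."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


# Hypothetical segments, one bitmap per filter.
horror_fans = to_bitmap([1, 4, 7, 9])
under_30 = to_bitmap([2, 4, 9, 12])
opted_out = to_bitmap([9])

# AND, OR and NOT over filters become bitwise operations on the bitmaps.
audience = horror_fans & under_30 & ~opted_out
either = horror_fans | under_30

print(to_ids(audience))
```

The point is that combining filters becomes a handful of bitwise operations over precomputed segments rather than a JOIN-heavy query evaluated at request time.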