by thoughtpolice 3972 days ago
Sigma is quite advanced; it's essentially an online DSL where authors can push anti-spam rules (written in Haskell) into live production by reloading code at runtime. Other services query Sigma in real time, and in turn, Sigma has to continuously query many other data sources to determine whether a rule should fire.
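
Sigma reloads compiled Haskell code, which can't be reproduced in a few lines. As a loose Python analogy only, here is a sketch of a rule table that can be swapped out while the service keeps answering queries. The `RuleEngine` class, the rule name, and the `friend_requests_last_hour` fact are all hypothetical.

```python
import threading
from typing import Callable, Dict, List

# A rule looks at the facts gathered for a request and says whether it fires.
# Rule names and fact keys here are made up for illustration.
Rule = Callable[[Dict[str, int]], bool]


class RuleEngine:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rules: Dict[str, Rule] = {}

    def reload(self, rules: Dict[str, Rule]) -> None:
        """Swap in a new rule set without restarting the process."""
        new_rules = dict(rules)
        with self._lock:
            self._rules = new_rules

    def evaluate(self, facts: Dict[str, int]) -> List[str]:
        """Return the names of the rules that fire for these facts."""
        with self._lock:
            rules = self._rules  # snapshot, so a concurrent reload can't mix rule sets
        return [name for name, rule in rules.items() if rule(facts)]


engine = RuleEngine()
engine.reload({"too_many_new_friends": lambda f: f["friend_requests_last_hour"] > 100})
print(engine.evaluate({"friend_requests_last_hour": 250}))  # ['too_many_new_friends']

# "Push" a revised rule into the running engine.
engine.reload({"too_many_new_friends": lambda f: f["friend_requests_last_hour"] > 500})
print(engine.evaluate({"friend_requests_last_hour": 250}))  # []
```

Taking a snapshot of the rule table before evaluating means each query sees either the old rules or the new ones, never a mix.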

For example, a rule about the nature of some of your Facebook friends may need to query 10 different data sources (different DBs, caches, monitoring infrastructure). One of the really nice things about Sigma is that it's built on Haxl, a library for efficient concurrent data access, which can also optimize away the classic 'N+1 query problem'.
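
A toy illustration of the "concurrent data access" part, assuming three independent, made-up sources (`friends_db`, `profile_cache`, `monitoring`) and using a short sleep in place of network latency. Haxl discovers independent fetches on its own; with Python's asyncio you have to spell the concurrency out yourself with `asyncio.gather`, which is the bookkeeping Haxl takes off the rule author's hands.

```python
import asyncio
from typing import List

in_flight = 0      # fetches currently waiting on a "network" response
max_in_flight = 0  # highest concurrency observed


async def fetch(source: str, key: int) -> str:
    """Pretend lookup against one (hypothetical) data source."""
    global in_flight, max_in_flight
    in_flight += 1
    max_in_flight = max(max_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulated network latency
    in_flight -= 1
    return f"{source}:{key}"


async def gather_rule_inputs(user_id: int) -> List[str]:
    # The lookups don't depend on each other, so issue them all at once.
    return await asyncio.gather(
        fetch("friends_db", user_id),
        fetch("profile_cache", user_id),
        fetch("monitoring", user_id),
    )


results = asyncio.run(gather_rule_inputs(42))
print(results)        # ['friends_db:42', 'profile_cache:42', 'monitoring:42']
print(max_in_flight)  # 3: all three lookups were in flight together
```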

This means you can write a program like:

  do ids <- getAllUserIds        -- 1 query to fetch all user IDs
     friends <- forM ids $ \uid ->
       getUserFriends uid        -- N queries, one per ID
     ...
That's simple and naive, yet Haxl can automatically turn it into a program that A) batches the data accesses together, so instead of N separate queries (one per ID) you get a single request for the whole set of users; B) accesses independent data sources concurrently with no programmer intervention, so queries that can run in parallel do; and C) caches the results, so you aren't re-querying data you've already fetched.
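
Here is a rough Python sketch of ideas A and C. It is not Haxl's API or implementation: Haxl gets this from its own monad and Haskell's Applicative machinery, while this stand-in uses generators. Each per-user computation is written as if it makes its own query (`yield user_id`); a small scheduler runs all of them until they block, sends the union of outstanding keys as one batched request, and keeps the results in a cache. `FRIENDS`, `fetch_friends_batch`, `friend_count`, and `run_batched` are all hypothetical names.

```python
from typing import Callable, Dict, Generator, List, Optional

# Hypothetical in-memory table standing in for a real "friends" database.
FRIENDS: Dict[int, List[int]] = {1: [2, 3], 2: [1], 3: [1, 2], 4: []}
round_trips: List[List[int]] = []  # every batched request we actually issue


def fetch_friends_batch(user_ids: List[int]) -> Dict[int, List[int]]:
    """One round trip that fetches friend lists for many users at once."""
    round_trips.append(list(user_ids))
    return {uid: FRIENDS.get(uid, []) for uid in user_ids}


# A task yields the key it needs and is resumed with the fetched value.
Task = Generator[int, List[int], int]


def friend_count(user_id: int) -> Task:
    friends = yield user_id  # reads like "one query per user"
    return len(friends)


def run_batched(tasks: List[Task],
                batch_fetch: Callable[[List[int]], Dict[int, List[int]]],
                cache: Dict[int, List[int]]) -> List[int]:
    results: List[int] = [0] * len(tasks)
    pending: Dict[int, int] = {}  # task index -> key it is blocked on

    def step(i: int, value: Optional[List[int]] = None) -> None:
        try:
            pending[i] = tasks[i].send(value)  # send(None) starts a new task
        except StopIteration as done:
            results[i] = done.value

    for i in range(len(tasks)):
        step(i)
    while pending:
        # Union of outstanding keys, minus duplicates and cached values.
        missing = sorted({k for k in pending.values() if k not in cache})
        if missing:
            cache.update(batch_fetch(missing))  # a single batched request
        blocked, pending = pending, {}
        for i, key in blocked.items():
            step(i, cache[key])
    return results


cache: Dict[int, List[int]] = {}
counts = run_batched([friend_count(u) for u in [1, 2, 3, 4]],
                     fetch_friends_batch, cache)
print(counts)       # [2, 1, 2, 0]
print(round_trips)  # [[1, 2, 3, 4]]: one request instead of four

# Users already in the cache cost no further requests.
again = run_batched([friend_count(u) for u in [1, 3]], fetch_friends_batch, cache)
print(again, len(round_trips))  # [2, 2] 1
```

Batching happens in rounds: everything that is blocked at the same moment gets fetched together, so the per-ID lookups in the loop above collapse into one request, and the cache makes repeated lookups free.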

There's a very good paper by Simon, the author of this blog post, discussing the design of Haxl and its use: http://community.haskell.org/~simonmar/papers/haxl-icfp14.pd... Quite the neat system!

Note: I do not work at Facebook, but I do chat with Simon a bit. This is basically a very high-level, 20,000-foot view based on what I've read of Simon's writing on the subject.