Communication Logs: Handling 250M SMS messages a day (infobip.com)
27 points by Parseco 3844 days ago
4 comments

They mention using Neo4j instead of a relational database. Recently I needed to implement a friendship data model, and started off using Neo4j because it seemed like the best tool for the task, but ended up using essentially the following model in Postgres:

    CREATE TABLE friendship (
      person1 INT NOT NULL,
      person2 INT NOT NULL,
      PRIMARY KEY (person1, person2),
      CHECK (person1 < person2)
    );
    CREATE INDEX friendship_person2 ON friendship(person2);

    CREATE VIEW friendship_view AS
    SELECT person1 AS person, person2 AS friend
    FROM friendship
    UNION
    SELECT person2 AS person, person1 AS friend
    FROM friendship;

Taken from here: http://www.postgresql.org/message-id/20141111201127.d80b6bc4...

What do you folks think? It seems like this would be the densest way to store the relationship: the CHECK constraint trick on the IDs means you don't need duplicate records to store Alice and Bob's friendship as {A -> B, B -> A} when the friendship is inherently bidirectional (vs. a followed/follower model), and it doesn't preclude storing the data in another isomorphic structure optimized for specific queries. I'm sure it's possible to replicate this with Neo4j, but programming it in Java felt cumbersome compared to expressing it in SQL. I'd still really like to use Neo4j in an OLTP environment, but I haven't had enough of an impetus yet, because WITH RECURSIVE in Postgres works well if the recursion depth is capped.
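For anyone who wants to poke at this without a Postgres instance, here is a minimal sketch of the same schema with a depth-capped WITH RECURSIVE query on top of the view. It assumes SQLite (via Python's sqlite3 module) as a stand-in for Postgres; the helper names `add_friendship` and `within` and the sample IDs are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friendship (
    person1 INT NOT NULL,
    person2 INT NOT NULL,
    PRIMARY KEY (person1, person2),
    CHECK (person1 < person2)
);
CREATE INDEX friendship_person2 ON friendship(person2);
CREATE VIEW friendship_view AS
SELECT person1 AS person, person2 AS friend FROM friendship
UNION
SELECT person2 AS person, person1 AS friend FROM friendship;
""")

def add_friendship(a, b):
    """Hypothetical helper: store the pair once, smaller ID first."""
    if a == b:
        raise ValueError("a person cannot befriend themselves")
    lo, hi = sorted((a, b))
    # A repeated pair hits the primary key and is skipped.
    conn.execute("INSERT OR IGNORE INTO friendship VALUES (?, ?)", (lo, hi))

def within(person, max_depth):
    """Hypothetical helper: everyone reachable in at most max_depth hops,
    mapped to the smallest hop count at which they were reached."""
    rows = conn.execute("""
        WITH RECURSIVE reach(person, depth) AS (
            SELECT ?, 0
            UNION
            SELECT v.friend, r.depth + 1
            FROM reach AS r
            JOIN friendship_view AS v ON v.person = r.person
            WHERE r.depth < ?          -- the recursion depth cap
        )
        SELECT person, MIN(depth) FROM reach GROUP BY person
    """, (person, max_depth))
    return dict(rows.fetchall())

# Sample chain 1 - 2 - 3 - 4 - 5, given in arbitrary order.
for a, b in [(2, 1), (2, 3), (4, 3), (5, 4)]:
    add_friendship(a, b)

print(within(1, 2))
```

The cap in the WHERE clause is what keeps the recursion bounded on a cyclic graph; without it you would need some other way to stop revisiting people.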

Your `friendship` table is a simple adjacency list representation of an undirected graph. As you point out, the main trouble with using a relational database for a true graph model is processing graph traversals, which you can do with nested inner queries, joins, or Postgres' WITH RECURSIVE. For that reason, a graph database is usually optimized as a very efficient sequential JOIN machine, where you can imagine successive JOINs as connecting the endpoints of edges along a graph path.
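To make the "successive JOINs" picture concrete, here is a toy sketch in plain Python rather than any real database engine. Each call to `join_step` plays the role of one JOIN, matching the last node of a partial path against the start of an edge. The edge list and the function names are invented for illustration.

```python
# Undirected edges stored once, smaller ID first, like the friendship table.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5)]

def both_directions(edges):
    """Equivalent of the friendship_view: each edge seen from both ends."""
    return [(a, b) for a, b in edges] + [(b, a) for a, b in edges]

def join_step(paths, directed_edges):
    """One JOIN: extend every path whose last node equals an edge's source.
    Nodes already on the path are skipped so paths stay simple."""
    return [
        path + (dst,)
        for path in paths
        for src, dst in directed_edges
        if path[-1] == src and dst not in path
    ]

def paths_from(start, hops, edges=EDGES):
    """All simple paths of exactly `hops` edges starting at `start`."""
    directed = both_directions(edges)
    paths = [(start,)]
    for _ in range(hops):
        paths = join_step(paths, directed)
    return paths

print(paths_from(1, 3))
```

Each hop scans the whole edge list here, which is exactly the cost a graph engine tries to avoid by following direct pointers from a node to its neighbours.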

By the way, if you're interested in graph representations for efficient querying, check out triplestores [1] and the Wikipedia list of subject-predicate-object databases [2].

[1] https://en.wikipedia.org/wiki/Triplestore

[2] https://en.wikipedia.org/wiki/List_of_subject-predicate-obje...

Thanks! I was just reading the Neo4j source and noticed the bitmap indexes, which apparently have several advantages over B-tree indexes for indexing graphs:

https://github.com/neo4j/neo4j/tree/3.0/community/lucene-ind...

ps:

http://stackoverflow.com/questions/9541541/b-tree-vs-bitmap-...

https://en.wikipedia.org/wiki/Bitmap_index

Thanks for showing me triplestores! Good to know that it's better to use a database engine optimized for triples for graphs vs rolling your own in SQL.

Worst case, you'd have to process input at a sustained rate of (140×250×1000000)/(24×3600) ≈ 405093 B/s.

I certainly hope your system can manage 400kBps throughput.
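The arithmetic above, spelled out as a small sketch. It assumes every message carries a full 140-byte SMS payload and that arrivals are spread perfectly evenly over the day; the constant and function names are mine, not from the article.

```python
SMS_PAYLOAD_BYTES = 140          # assumed: every message is a full 140-byte SMS
MESSAGES_PER_DAY = 250_000_000   # the figure from the article title
SECONDS_PER_DAY = 24 * 3600

def average_rates(messages_per_day=MESSAGES_PER_DAY,
                  payload_bytes=SMS_PAYLOAD_BYTES):
    """Return (messages per second, payload bytes per second), assuming a
    perfectly uniform arrival rate and no protocol overhead."""
    msgs_per_s = messages_per_day / SECONDS_PER_DAY
    bytes_per_s = messages_per_day * payload_bytes / SECONDS_PER_DAY
    return msgs_per_s, bytes_per_s

msgs_per_s, bytes_per_s = average_rates()
print(f"{msgs_per_s:.0f} msg/s, {bytes_per_s:.0f} B/s")
```

This only gives the daily average of payload bytes; it says nothing about peaks or per-message overhead.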

You're assuming a constant arrival rate and zero overhead, neither of which is accurate.

As a wild guess, you're off by 1000x. You may say 400MBps is still not too crazy, but it's beyond the real capacity of many systems out there.

Of course those aren't sane assumptions, but if the peak rate were the significant information, why would they title the article with the per-day rate? It smells like someone decided "250 million is a large enough number to scare people". Fuck that.

Even if peak "instantaneous" flow is 400MBps, considering that SMS already has seconds to minutes of latency, a bit of buffering shouldn't be a big deal.

They start the article by saying they make six (not necessarily serial) network roundtrips for each message, which indicates that something is horribly wrong with the design of their system. Their code needs to bill each SMS to the sender, send the SMS to the telco, and log successes and failures in a way that's searchable later. The rest of the article describes their terribly overwrought approach to these tasks. With so much enterprise software and so little actual discussion of the problem domain, there's no way they're butting up against anything interesting enough to merit an HN submission.

Isn't this the minmax of the peak rate?

What's Infobip pricing for numbers and SMS in US?

Nice work, but to be fair 250M msgs per day is not at all impressive.