by philsnow 3187 days ago
Is there a reason to choose millis as the granularity instead of micros or nanos? Is it because there's a stronger expectation of machines in a cluster agreeing on what milli it is "now" vs the other granularities?

I'm kind of thrown by the idea of putting the timestamp / stream ID in the XADD command. I would have thought the server would assign it, since one of the strengths of Redis's single-threaded nature is consistency: what's in Redis is the truth. If you allow clients to specify the timestamp, what happens when ntpd isn't running on some of them? I probably misread or misunderstood something.

Could you allow specifying `$` as the timestamp to tell the server you want it to use whatever it thinks the current time is as the timestamp / stream-id?

Hello, the stream implementation does not need the different servers (for instance a master and its slaves) to agree about the time. The server that receives the XADD command simply generates the ID (including its time part) and attaches it to the item. All the other participants in the replication accept that same ID, because clients use `*` to specify the ID, while the command is rewritten with the concrete ID when it is sent to the slaves. For example, I run this on the master:

    127.0.0.1:6379> xadd stream * a 1 b 2
    1506977609865.0

But this is replicated as follows (output of `redis-cli --slave`):

    "xadd","stream","1506977609865.0","a","1","b","2"

So XADD allows specifying an ID only for replication / AOF purposes, not because clients should normally specify one. However, if clients really want to do that they can, at the risk of getting errors, for instance:

    127.0.0.1:6379> xadd stream 10.0 a 1 b 2
    (error) ERR The ID specified in XADD is smaller than the target stream top item

Redis will anyway not accept any ID which is smaller than the current top-item ID.
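A toy Python model of the behaviour described above, written only to illustrate it: `*` becomes a concrete ID taken from the receiving server's clock, the concrete ID is what gets propagated to slaves / the AOF, and an explicit ID that is not above the stream's top is rejected. The names `ToyStream`, `parse_id` and `format_id` are hypothetical, this is not how the Redis source is organised, and two rules are assumptions not stated in the thread: equal IDs are rejected too, and when the local clock is not ahead of the top ID (for example a new master after a failover) the top's milliseconds are reused and the sequence is bumped.

```python
from typing import List, Tuple

# A stream ID as (milliseconds, sequence), rendered "ms.seq" as in the
# examples above. Tuples compare lexicographically, giving stream order.
StreamID = Tuple[int, int]


def parse_id(text: str) -> StreamID:
    """Hypothetical helper: parse "ms.seq" (sequence defaults to 0)."""
    ms, _, seq = text.partition(".")
    return int(ms), int(seq or "0")


def format_id(sid: StreamID) -> str:
    return f"{sid[0]}.{sid[1]}"


class ToyStream:
    """Hypothetical model of one stream on the master; not Redis code."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.top: StreamID = (0, 0)

    def xadd(self, id_arg: str, fields: List[str], now_ms: int) -> List[str]:
        """Append an entry; return the command as slaves / the AOF see it."""
        if id_arg == "*":
            # The server receiving XADD picks the ID from its own clock.
            # Assumption: if the clock is not ahead of the top ID, keep the
            # top's milliseconds and bump the sequence so IDs keep growing.
            if now_ms > self.top[0]:
                new_id = (now_ms, 0)
            else:
                new_id = (self.top[0], self.top[1] + 1)
        else:
            new_id = parse_id(id_arg)
            # Assumption: an ID equal to the top is rejected as well.
            if new_id <= self.top:
                raise ValueError("ERR The ID specified in XADD is smaller "
                                 "than the target stream top item")
        self.top = new_id
        # '*' is never propagated: the concrete ID is written instead.
        return ["xadd", self.key, format_id(new_id), *fields]


s = ToyStream("stream")
print(s.xadd("*", ["a", "1", "b", "2"], now_ms=1506977609865))
# ['xadd', 'stream', '1506977609865.0', 'a', '1', 'b', '2']
try:
    s.xadd("10.0", ["a", "1", "b", "2"], now_ms=1506977609870)
except ValueError as err:
    print(err)  # ERR The ID specified in XADD is smaller than the target stream top item
```

Because the replicated command carries the concrete ID, a slave replaying it never consults its own clock, which is why the servers do not need to agree on the time.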

The reason we chose milliseconds instead of nanoseconds is that, for most applications, querying sub-millisecond ranges is likely not useful, so seeing even larger numbers in the ID would probably just be unpleasant without being useful. That said, there is still time to change this if there are good motivations. But being the time the one produced by the local host, after a failover the IDs are generated by another host. Milliseconds can still more or less match with good time synchronization, but nanoseconds? So it's like if this additional precision will be just used to store non-valid info.

> Redis will anyway not accept any ID which is smaller than the current top-item ID.

I was reading on mobile earlier and may have missed this point. Excellent. I also didn't realize that clients would use `*` to specify the ID and that the receiving server turns it into an actual ID before replicating it / writing it to the AOF.

> But being the time the one produced by the local host, after a failover the IDs are generated by another host. Milliseconds can still more or less match with good time synchronization, but nanoseconds? So it's like if this additional precision will be just used to store non-valid info.

I think most failovers necessarily take longer than a millisecond, so any resolution finer than millis would _probably_ be okay, but yeah, this is not a compelling reason to switch to micros/nanos. My suggestion to switch to micros/nanos was more about reducing the number of collisions that require the server to de-dup, i.e. assign sequential sub-epoch numbers to events arriving during the same server tick. I guess that's not a big issue, though.
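To get a feel for the collision point, here is a small Python sketch that truncates arrival timestamps to a chosen unit and gives each entry that lands in the same bucket as the previous one the next sequence number, then counts how many entries needed one. The helper names (`assign_ids`, `collisions`) are hypothetical and the arrival times are synthetic, not measurements of any real workload.

```python
from typing import Iterable, List, Tuple

NS_PER_UNIT = {"ms": 1_000_000, "us": 1_000, "ns": 1}


def assign_ids(arrivals_ns: Iterable[int], unit: str) -> List[Tuple[int, int]]:
    """Give each arrival a (time, seq) ID with time truncated to `unit`.

    An entry whose bucket is not later than the previous ID's reuses that
    time and takes the next sequence number (the de-dup step).
    """
    step = NS_PER_UNIT[unit]
    ids: List[Tuple[int, int]] = []
    for t_ns in arrivals_ns:
        t = t_ns // step
        if ids and t <= ids[-1][0]:
            ids.append((ids[-1][0], ids[-1][1] + 1))
        else:
            ids.append((t, 0))
    return ids


def collisions(ids: List[Tuple[int, int]]) -> int:
    """Count entries that needed a non-zero sequence number."""
    return sum(1 for _, seq in ids if seq > 0)


# Synthetic workload: 1000 events, one every 50 microseconds.
t0 = 1_506_977_609_865_000_000
arrivals = [t0 + k * 50_000 for k in range(1000)]
for unit in ("ms", "us"):
    print(unit, collisions(assign_ids(arrivals, unit)))
# ms 950
# us 0
```

With one event every 50 µs, millisecond IDs put 20 events in each bucket, so 950 of the 1000 entries carry a non-zero sequence, while microsecond IDs need none. In both cases the IDs stay unique and ordered, so the collisions cost little beyond a sequence counter, which fits the conclusion that it's not a big issue.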

Thanks for the reply, Salvatore. Redis is one of my favorite codebases and projects.