| HN Mirror


	by antirez 3180 days ago
	Yes, actually, maybe it's a good idea to replace the dot with something else. Thanks for the hint.

5 comments

ryanschneider 3180 days ago

Supporting `MAXSIZE` as well as `MAXLEN` on `XADD` would also handle a nice Kafka feature (the ability to define your log size in either number of messages or size on disk).

Something like: `XADD MAXSIZE ~ 2147484000 * foo bar` to cap the stream at 2GB + 1 node.

dvirsky 3180 days ago

And if ids are timestamps, maybe we can define it as MAXTIME as well.
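
To make the three proposed caps concrete, here is a minimal Python sketch of trimming an in-memory log by entry count, by payload bytes, or by age. The `trim` function and its parameters are hypothetical; `MAXSIZE` and `MAXTIME` are only suggestions in this thread, and this does not reflect how Redis trims streams internally.

```python
def trim(entries, max_len=None, max_size=None, min_ms=None):
    """Return the retained suffix of `entries` (oldest first).

    entries  : list of (timestamp_ms, payload_bytes) tuples
    max_len  : keep at most this many entries (MAXLEN-style cap)
    max_size : keep at most this many payload bytes (hypothetical MAXSIZE)
    min_ms   : drop entries older than this timestamp (hypothetical MAXTIME)
    """
    kept = list(entries)
    if max_len is not None:
        kept = kept[-max_len:] if max_len > 0 else []
    if min_ms is not None:
        kept = [e for e in kept if e[0] >= min_ms]
    if max_size is not None:
        total = sum(len(payload) for _, payload in kept)
        # Evict from the oldest end until the byte budget is met.
        while kept and total > max_size:
            total -= len(kept[0][1])
            kept.pop(0)
    return kept


log = [(1, b"aaaa"), (2, b"bb"), (3, b"c")]
by_count = trim(log, max_len=2)
by_bytes = trim(log, max_size=3)
by_age = trim(log, min_ms=3)
```

All three policies evict from the oldest end, so they can be combined without changing the order of what remains.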

agumonkey 3180 days ago

What was, or would be, your preferred alternative syntax? `:`?

antirez 3180 days ago

Not sure... : looks ok actually, even _ or - could make some sense. The # is a bit too heavy on the eyes :-)

femto113 3180 days ago

I'll probably eventually hate myself for even bringing this up but I can't help but notice the similarity between this ID structure and Version 1 (aka timestamp) UUIDs. While I wouldn't go as far as recommending that you fully adopt that form, it might be worth considering if you could make these IDs compatible with UUIDs by defining a canonical transform. The critical differences are:

- UUIDs use a different epoch (15 Oct 1582 vs 1 Jan 1970)
- UUIDs count 100 ns blocks instead of ms
- UUIDs include a 6 byte "node id"
- UUIDs allow only up to 15 bits of "sequence"

I think that last one is the biggest deal, since as currently specced redis allows 64 bits of sequence, which is obviously much bigger than 15. The options I see are either up the time resolution used by redis, encode some of redis's sequence bits into the UUID's time bits, or just live with it as a limitation--in practice 2^15 is a lot of messages to get in a single millisecond (though in cases of clocks jumping back might not be too much).

You'd also need to come up with something for the node id, perhaps the first 6 bytes of a cluster node ID or similar.
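
The first two differences (epoch and resolution) come down to one multiplication and one offset. A small Python sketch, with hypothetical function names; the offset constant is the number of 100 ns intervals between the two epochs, the same value Python's `uuid` module uses:

```python
# 100 ns intervals between 15 Oct 1582 (UUID epoch) and 1 Jan 1970 (Unix epoch).
UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000
TICKS_PER_MS = 10_000  # one millisecond is 10,000 intervals of 100 ns


def unix_ms_to_uuid_ticks(ms: int) -> int:
    """Convert Unix milliseconds to a Version 1 UUID timestamp value."""
    return ms * TICKS_PER_MS + UUID_EPOCH_OFFSET_100NS


def uuid_ticks_to_unix_ms(ticks: int) -> int:
    """Inverse conversion; any sub-millisecond remainder is discarded."""
    return (ticks - UUID_EPOCH_OFFSET_100NS) // TICKS_PER_MS


ticks = unix_ms_to_uuid_ticks(1507035873000)
```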

femto113 3179 days ago

Thinking about this some more I realized you could also encode sequence into the low order bits of the timestamp, and rereading the RFC showed it actually makes this recommendation[1]. There are 10000 100-nanosecond periods per millisecond which gives about 13 more bits. Between that and the 13-15 bits available in the clock sequence you've got 26+ bits of sequence or ~67MM values per millisecond.
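
The bit counts above are easy to check. A quick Python sketch of the arithmetic (variable names are just for illustration):

```python
import math

TICKS_PER_MS = 10_000  # 100 ns periods per millisecond

# Whole bits that fit inside one millisecond worth of 100 ns ticks.
low_bits = math.floor(math.log2(TICKS_PER_MS))

# Assume 13 usable clock-sequence bits, the low end of the 13-15 range.
clock_seq_bits = 13
total_bits = low_bits + clock_seq_bits
values_per_ms = 2 ** total_bits

# How far 45 bits of milliseconds reach, measured from 1970.
years_covered = 2 ** 45 / (1000 * 60 * 60 * 24 * 365.25)
```

Here `low_bits` comes out to 13 because 2^13 = 8192 is the largest power of two not exceeding 10000, and `values_per_ms` is 2^26 = 67,108,864.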

Since 64 bits is overkill for milliseconds (45 bits covers the next 1000 years or so) I was thinking you could put 2 bytes of the node id in the high order bytes there (perhaps could call this the "clock id"?) and the remaining 4 bytes of the node id could go in the high order bytes of the sequence, which would still leave 32 bits for actual sequence values (but we should only use 26 or so). This means we'd get a translation roughly as follows (numbering bytes and bits from high to low significance):

   Redis                                   Version 1 UUID
      Timestamp
        Byte 0-1  "clock id"               Bytes 4&5 of node id
        Byte 2-7  millis since 1 Jan 1970  * 10000 => ~45 high order timestamp bits
      Sequence
        Byte 0-3  "node id"                Bytes 0-3 of node id
        Byte 4-7
           6 bits wasted space             ignored
          26 bits actual sequence value
            13 high order bits             => clock sequence
            13 low order bits              => low order timestamp bits
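
As a rough check that this layout round-trips, here is a Python sketch that unpacks the two 64-bit halves as laid out above and builds a Version 1 UUID with the standard `uuid` module. The function name `redis_id_to_uuid1` is hypothetical, and this only illustrates the proposal; Redis does not actually lay out its IDs this way. The RFC 4122 variant leaves 14 bits for the clock sequence, so the 13 bits used here fit.

```python
import uuid

# 100 ns intervals between 15 Oct 1582 and 1 Jan 1970.
UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000


def redis_id_to_uuid1(ts64: int, seq64: int) -> uuid.UUID:
    """Map the proposed 64-bit timestamp / 64-bit sequence pair to a v1 UUID."""
    clock_id = (ts64 >> 48) & 0xFFFF       # timestamp bytes 0-1: node id bytes 4-5
    millis = ts64 & 0xFFFFFFFFFFFF         # timestamp bytes 2-7: ms since 1970
    node_hi = (seq64 >> 32) & 0xFFFFFFFF   # sequence bytes 0-3: node id bytes 0-3
    seq26 = seq64 & 0x3FFFFFF              # low 26 bits: actual sequence value
    clock_seq = seq26 >> 13                # 13 high-order bits
    sub_ms = seq26 & 0x1FFF                # 13 low-order bits (at most 8191)

    ticks = millis * 10_000 + sub_ms + UUID_EPOCH_OFFSET_100NS
    if ticks >= 1 << 60:
        raise ValueError("timestamp does not fit in the 60-bit UUID time field")

    node = (node_hi << 16) | clock_id
    fields = (
        ticks & 0xFFFFFFFF,                      # time_low
        (ticks >> 32) & 0xFFFF,                  # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,       # time_hi + version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,        # clock_seq_hi + RFC 4122 variant
        clock_seq & 0xFF,                        # clock_seq_low
        node,
    )
    return uuid.UUID(fields=fields)


ts64 = (0xEEFF << 48) | 1507035873000
seq64 = (0xAABBCCDD << 32) | (3 << 13) | 11
u = redis_id_to_uuid1(ts64, seq64)
```

Because the low 13 bits never exceed 8191, they always stay inside the 10000 ticks of the millisecond they belong to.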

Another implication of this scheme is that if redis has access to a clock that offers higher than millisecond resolution it could store everything more precise than millisecond into the sequence portion of the id.

On a side note it seems that the clock sequence in the UUID is intended to be reset to a random value at start up and every time a clock jump is detected rather than just incremented. Redis could do something similar by incrementing some of the 13 high-order bits of the sequence every time a clock jump is detected (and/or if the 13 low-order bits overflow)
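
One way to picture that last idea is a small allocator that splits the 26-bit sequence into a 13-bit "jump counter" and a 13-bit per-millisecond counter. This Python sketch is purely illustrative: the class name is made up, and the high part wraps around after 8192 bumps, which a real design would have to handle.

```python
class SequenceAllocator:
    """26-bit sequence: 13 high bits bumped on clock jumps, 13 low bits per ms."""

    LOW_BITS = 13
    MASK = (1 << 13) - 1

    def __init__(self):
        self.last_ms = -1
        self.high = 0
        self.low = 0

    def next(self, now_ms: int) -> int:
        if now_ms < self.last_ms:
            # Clock jumped backwards: move to a fresh high-order value.
            self.high = (self.high + 1) & self.MASK
            self.low = 0
        elif now_ms == self.last_ms:
            self.low += 1
            if self.low > self.MASK:
                # Low bits overflowed within one millisecond.
                self.low = 0
                self.high = (self.high + 1) & self.MASK
        else:
            self.low = 0
        self.last_ms = now_ms
        return (self.high << self.LOW_BITS) | self.low


alloc = SequenceAllocator()
seqs = [alloc.next(ms) for ms in (100, 100, 99, 99, 101)]
```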

[1] https://tools.ietf.org/html/rfc4122#section-4.2.1.2

lsiebert 3180 days ago

I think a mostly vertical symbol is better: ":" or "|" is preferable to "_" or "-".

simonhamp 3180 days ago

I agree. Perhaps a forward-slash (/) to denote subordination, e.g. 1507035873/11

anyfoo 3180 days ago

Why is that?

agumonkey 3180 days ago

Verticality conveys a difference in the kind of information better, IMO.

rakoo 3180 days ago

Why keep the separation at all? Are clients expected to be able to query for a given timestamp precisely? Because then you get all the problems with clock synchronization, especially given that the Streams' clock is monotonic and I'd expect clients' clocks not to be.

unkown-unknowns 3180 days ago

It might be useful to be able to query by server time regardless of whether your client clock is in sync. You retrieve some set of data and the next time you can ask the server to give you everything newer than x, where x was the highest time stamp you got from the server previously.

rakoo 3180 days ago

Yes exactly, you want to ask what is newer than x, where x is the last event you're aware of, but you don't really care about the date and time in that case. If you just store the last id given by redis Streams naively then you don't even care that they're timestamps; at that point my question is, why even bother with the distinction. Just ask for everything after x and be done with it.
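
The "everything after x" pattern needs only an ordering on IDs, not any knowledge of wall-clock time. A Python sketch over an in-memory sorted list, assuming IDs written as `ms-seq` with a dash separator; `parse_id` and `entries_after` are hypothetical helpers, not Redis client calls:

```python
from bisect import bisect_right


def parse_id(text: str) -> tuple:
    """Split 'ms-seq' into a comparable (ms, seq) tuple; seq defaults to 0."""
    ms, _, seq = text.partition("-")
    return (int(ms), int(seq or 0))


def entries_after(stream, last_id):
    """Return entries whose ID is strictly greater than last_id.

    stream: list of ((ms, seq), payload) tuples sorted by ID.
    """
    ids = [entry_id for entry_id, _ in stream]
    return stream[bisect_right(ids, last_id):]


stream = [((1, 0), "a"), ((1, 1), "b"), ((2, 0), "c")]
newer = entries_after(stream, parse_id("1-0"))
```

The client just remembers the last ID it saw and passes it back, so its own clock never enters the comparison.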

lsiebert 3180 days ago

Redis also has TIME to get the current server time with milliseconds and the unix time stamp. I'm reasonably sure that's what's being used to get the first part of the ID anyway.

rakoo 3180 days ago

Ah, I didn't know that. Although the post says that the timestamp of an id might also be the timestamp of the last message, since the clock can go backwards, so in the worst case a client might get some duplicate messages.
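
A minimal Python sketch of the behaviour described there, assuming the rule is "reuse the last timestamp and bump the sequence whenever the clock has not moved forward". The class is hypothetical and only models the idea, not the actual Redis implementation:

```python
class StreamIdGenerator:
    """Produce strictly increasing (ms, seq) IDs even if the clock goes back."""

    def __init__(self):
        self.last_ms = 0
        self.last_seq = 0

    def next_id(self, now_ms: int) -> tuple:
        if now_ms > self.last_ms:
            # Clock moved forward: new millisecond, sequence restarts.
            self.last_ms, self.last_seq = now_ms, 0
        else:
            # Same millisecond, or the clock jumped backwards:
            # keep the previous timestamp and increment the sequence.
            self.last_seq += 1
        return (self.last_ms, self.last_seq)


gen = StreamIdGenerator()
ids = [gen.next_id(ms) for ms in (1000, 1000, 990, 1001)]
```

With IDs generated this way, the timestamp part can lag behind or run ahead of the real clock, which is why a query by exact wall-clock time can be off even though ordering by ID stays consistent.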

unkown-unknowns 3180 days ago

I vote that you use a dash instead.

zedpm 3180 days ago

It turns out that's what it will be [0], as dash retains the ability to easily copy the whole identifier in most terminals.

[0]: https://github.com/antirez/redis/commit/1189d90d749c84e98424...

pookeh 3180 days ago

You can also have a look at the technique used here to create collision-free sequential unique IDs across a cluster, even if it is just for inspiration: https://www.npmjs.com/package/cuid

Example:

c - h72gsb32 - 0000 - udoc - l363eofy

The groups, in order, are:

1. 'c' - identifies this as a cuid, and allows you to use it in html entity ids. The fixed value helps keep the ids sequential.

2. Timestamp

3. Counter - a single process might generate the same random string. The weaker the pseudo-random source, the higher the probability. That problem gets worse as processors get faster. The counter will roll over if the value gets too big.

4. Client fingerprint. For example, the first two chars are extracted from the process.pid. The next two chars are extracted from the hostname.

5. Pseudo random (Math.random())
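
To show how the five groups fit together, here is a rough Python sketch that mimics the layout. It is not the cuid package's algorithm: the field widths, the hostname hashing, and the function names are assumptions made for illustration.

```python
import os
import random
import socket
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
_counter = 0  # per-process counter, rolls over at 36**4


def to_base36(n: int, width: int) -> str:
    """Base-36 encode n, zero-padded and truncated to exactly `width` chars."""
    digits = ""
    while n:
        n, r = divmod(n, 36)
        digits = ALPHABET[r] + digits
    return digits.rjust(width, "0")[-width:]


def cuid_like() -> str:
    global _counter
    timestamp = to_base36(int(time.time() * 1000), 8)   # group 2
    counter = to_base36(_counter, 4)                    # group 3
    _counter = (_counter + 1) % 36 ** 4
    host_sum = sum(ord(ch) for ch in socket.gethostname())
    fingerprint = to_base36(os.getpid(), 2) + to_base36(host_sum, 2)  # group 4
    rand = to_base36(random.getrandbits(64), 8)         # group 5
    return "c" + timestamp + counter + fingerprint + rand  # group 1 is the 'c'


first, second = cuid_like(), cuid_like()
```

The counter guarantees that two IDs from the same process differ even if the timestamp and the random part happen to collide.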
