| HN Mirror

by mathias_10gen 5518 days ago

FYI - Your _id trick is similar to the ObjectID type that MongoDB uses by default.

"A BSON ObjectID is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Note that the timestamp and counter fields must be stored big endian unlike the rest of BSON. This is because they are compared byte-by-byte and we want to ensure a mostly increasing order."

http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSO...
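
To make the quoted layout concrete, here is a minimal Python sketch that packs the four fields into 12 bytes. The function name `make_object_id` and the sample values are made up for illustration, and the byte order of the process id is an assumption (the quote only fixes the timestamp and counter as big endian). It is not the driver's actual implementation.

```python
import struct
import time

def make_object_id(timestamp, machine_id, process_id, counter):
    """Pack the four fields into the 12-byte layout described in the quote."""
    if len(machine_id) != 3:
        raise ValueError("machine_id must be exactly 3 bytes")
    ts = struct.pack(">I", timestamp & 0xFFFFFFFF)   # 4 bytes, big endian
    pid = struct.pack("<H", process_id & 0xFFFF)     # 2 bytes; little endian is an assumption
    ctr = (counter & 0xFFFFFF).to_bytes(3, "big")    # 3 bytes, big endian
    return ts + machine_id + pid + ctr

now = int(time.time())
older = make_object_id(now, b"abc", 1234, 7)
newer = make_object_id(now + 1, b"abc", 1234, 0)

# Byte-by-byte comparison follows creation time because the timestamp
# comes first and is stored big endian.
assert older < newer

# A little-endian timestamp would not sort this way: 255 would land after 256.
assert struct.pack("<I", 255) > struct.pack("<I", 256)
assert struct.pack(">I", 255) < struct.pack(">I", 256)
```

The last two assertions show why the byte order matters: with the most significant byte first, a plain byte comparison agrees with numeric order, which is what gives the "mostly increasing" property.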

2 comments

FooBarWidget 5518 days ago

My _id is different from ObjectID. It does begin with a timestamp, but one that uses 2010 as its epoch. The timestamp is followed by a kind of user ID and a random identifier that also appears elsewhere in the document.
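
A rough Python sketch of a key built along those lines, for readers who want something concrete. Everything beyond "timestamp with a 2010 epoch, then a user ID, then a random part" is an assumption: the name `make_custom_id`, the hex encoding, the field width, and the `-` separator are hypothetical and not the actual format.

```python
import calendar

# 2010-01-01 00:00:00 UTC expressed as Unix seconds.
EPOCH_2010 = calendar.timegm((2010, 1, 1, 0, 0, 0))

def make_custom_id(unix_seconds, user_id, random_part):
    """Build a string key that starts with seconds elapsed since 2010."""
    seconds_since_2010 = unix_seconds - EPOCH_2010
    if seconds_since_2010 < 0:
        raise ValueError("timestamp is before the 2010 epoch")
    # Fixed-width hex keeps string order equal to time order
    # (hypothetical encoding, valid while the value fits in 8 hex digits).
    return "%08x-%s-%s" % (seconds_since_2010, user_id, random_part)

earlier = make_custom_id(EPOCH_2010 + 60, "user42", "k3f9")
later = make_custom_id(EPOCH_2010 + 3600, "user07", "a1b2")

# Keys created later compare greater, regardless of the user ID or random part.
assert earlier < later
```

The point of the leading timestamp is the same as with ObjectID: new keys land at the high end of the index instead of being scattered across it.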

tszming 5518 days ago

Interesting, so the _id trick mentioned by FooBarWidget is not the real reason for the speedup?

FooBarWidget 5518 days ago

It is. I was using a totally random string key, not the default ObjectID.

tszming 5518 days ago

So if you stick with the default _id, the claim of "Your indexes must fit in memory" is no longer valid?

FooBarWidget 5518 days ago

That totally depends on your workload. In my case my working set happens to be mostly equal to the most recently inserted data. If you have to regularly access lots of random documents with no locality whatsoever then your working set is very large and should fit in memory.

By sticking with the default _id, with my workload my _id index doesn't have to fit into memory. I can't actually use the default _id for various reasons but that's a whole different discussion.
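
A toy Python model of that effect, under stated assumptions: the index is treated as a sorted list cut into fixed-size "pages", and the working set is taken to be the most recently inserted keys. The function `pages_holding_recent` and the sizes are made up for illustration; a real B-tree index behaves differently in detail.

```python
import random

def pages_holding_recent(keys_in_insert_order, recent, page_size):
    """Count the index pages that hold the `recent` most recently inserted keys."""
    ordered = sorted(keys_in_insert_order)
    position = {key: i for i, key in enumerate(ordered)}
    recent_keys = keys_in_insert_order[-recent:]
    return len({position[key] // page_size for key in recent_keys})

TOTAL, RECENT, PAGE_SIZE = 100_000, 1_000, 100

# Increasing keys, standing in for a timestamp-prefixed _id.
increasing = list(range(TOTAL))

# Unique random keys, standing in for a totally random string key.
rng = random.Random(0)
scattered = rng.sample(range(10**12), TOTAL)

increasing_pages = pages_holding_recent(increasing, RECENT, PAGE_SIZE)
scattered_pages = pages_holding_recent(scattered, RECENT, PAGE_SIZE)

print("pages touched with increasing keys:", increasing_pages)
print("pages touched with random keys:", scattered_pages)
```

With increasing keys the recent entries sit together in the last few pages, so only that small part of the index needs to stay in memory. With random keys the same number of recent entries is spread over a large share of all pages.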

tszming 5517 days ago

Useful information, thanks!
