by frankmcsherry 4198 days ago
With all due respect, your code seems to be a copy/paste of my code, where you've removed the Pair and Option implementation (allowing the alternate default I discussed in my second blog post) and replaced my string `ColumnarVec` with your string `SimpleSerialize`. So, thank you for measuring my code for me? :D

The reason the new numbers look better for Option with the Copy implementation is that you are now writing 16 bytes for each Option. They are a 50-50 mix of Some(0u) and None, which I wrote out in 5 bytes on average (always a 1 byte "present/not", and then 4 bytes of data on average). The Copy implementation is just writing 3x as much data and padding the throughput numbers, rather than reporting something more like goodput.
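To make the byte arithmetic above concrete, here is a minimal sketch. It assumes a 64-bit target where `0u` is 8 bytes wide (modelled here as `u64`); the helper names `copy_bytes` and `columnar_bytes` are hypothetical and are not taken from either benchmark.

```rust
use std::mem::size_of;

/// Bytes written if every Option<u64> is copied verbatim from memory.
/// On a typical 64-bit target size_of::<Option<u64>>() is 16 (assumption).
fn copy_bytes(values: &[Option<u64>]) -> usize {
    values.len() * size_of::<Option<u64>>()
}

/// Bytes written by a columnar layout: one "present/not" byte per element,
/// plus an 8-byte payload only for the elements that are Some.
fn columnar_bytes(values: &[Option<u64>]) -> usize {
    let present = values.iter().filter(|v| v.is_some()).count();
    values.len() + present * size_of::<u64>()
}

fn main() {
    // A 50-50 mix of Some(0) and None, as in the benchmark described above.
    let values: Vec<Option<u64>> = (0..1000)
        .map(|i| if i % 2 == 0 { Some(0) } else { None })
        .collect();

    let copied = copy_bytes(&values);
    let columnar = columnar_bytes(&values);
    println!("copy:     {} bytes total", copied);
    println!("columnar: {} bytes total", columnar);
    println!("ratio:    {:.1}x", copied as f64 / columnar as f64);
}
```

With 16 bytes per copied Option against 5 bytes on average for the columnar form, the ratio is 16 / 5 = 3.2, which is the "3x" above. Both runs move the same number of values, so the higher bytes-per-second figure does not by itself mean more values per second.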

I sense that this isn't the right place to shake this out, so I'll stop posting. If there is a better way to follow up, let me know.

[edit: I've got you through github, thanks! I'll buy beers when I swing by TU next]

1 comments

The changes make all the difference. Your argument was that columnization, and thus storing each native type of a data record in separate columns, brings a performance benefit. By removing the specialisations of Pair and Option, which save the data in separate columns, I switched back to a classical storage layout that basically stores the data of one record in one place, like a row store. So your code simulates a column store and mine a row store.
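For readers following along, the two layouts can be sketched roughly as below. This is only an illustration under assumed types: `Columns`, `to_columns`, and `to_rows` are hypothetical names, not the `ColumnarVec` or `SimpleSerialize` code being discussed.

```rust
/// Column layout: each field of the pair lives in its own vector.
struct Columns {
    firsts: Vec<u64>,
    seconds: Vec<u32>,
}

/// Split row-oriented records (one tuple per record) into separate columns.
fn to_columns(rows: &[(u64, u32)]) -> Columns {
    let mut cols = Columns {
        firsts: Vec::with_capacity(rows.len()),
        seconds: Vec::with_capacity(rows.len()),
    };
    for &(a, b) in rows {
        cols.firsts.push(a);
        cols.seconds.push(b);
    }
    cols
}

/// Rebuild the row layout, where the fields of one record sit side by side.
fn to_rows(cols: &Columns) -> Vec<(u64, u32)> {
    cols.firsts
        .iter()
        .copied()
        .zip(cols.seconds.iter().copied())
        .collect()
}

fn main() {
    let rows = vec![(1u64, 10u32), (2, 20), (3, 30)];
    let cols = to_columns(&rows);
    println!("firsts:  {:?}", cols.firsts);
    println!("seconds: {:?}", cols.seconds);
    println!("rows:    {:?}", to_rows(&cols));
}
```

Both forms hold the same records; the difference is only whether the fields of one record are stored together (row) or all values of one field are stored together (column).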

Using your original benchmark, I then show that the row layout brings a large performance win. My numbers show this win not just in throughput (bytes per sec.) but also in "goodput" (values per sec.). Check my previous comment. I just noticed, though, that I sometimes forgot the k in the reported numbers; where it is missing, you have to multiply the number by 1000. I can't edit it anymore.

I guess we can agree to disagree and should continue the discussion in another form ;)

PS: I just updated my github information, I am now at the TU Munich