by gwf
5205 days ago
The baseline was a substantial test corpus that we scaled by several orders of magnitude over a series of runs, all meant to simulate typical clip sizes with typical word frequencies. The 10-100x gains came from two contributions, each in the 3-10x range. We tested it against the standard search, which paginated incorrectly because it applied the sort and the slice in the wrong order. We also tested it on two types of map/reduce jobs that correctly implemented the sort and slice (and had been in production). Ideally, we would have kept the data around to give a fuller report. But the truth is we did this over 9 months ago and didn't save the data. After informally sharing the impact with a lot of people, we heard a lot of encouragement to share the techniques. And you're right, this is not hard-core science or engineering. But it is a good tip, which you can take or leave.
|
And I won't even comment on the "we did this 9 months ago and did not save the data" part. That kind of stuff would not fly in medicine or science or most traditional engineering fields. Why should you be exempt?