|
|
|
|
|
by ericlippert
2269 days ago
|
|
That's a great example; the most important part of your anecdote is the end which says what was the user impact? There is no prior reason to believe that a 50x speedup is actually a win; taking an algorithm from 100K nanoseconds to 2K nanoseconds when hitting the file system takes billions of nanoseconds is not a win for the user, and taking an algorithm that takes 5000 years down to 100 years is probably not either. But a 50x win that, as you note, goes from "let's have lunch" to "let's change this one thing ten times and re-do the analysis to see which gets us the best results" is a huge game changer; it's not just that it saves time, it's that it makes possible new ways to work with data. |
|
A little bit after I have made the change, an user filed a bug [2] to the library about slow behavior when you passed a large amount of data to the library. They were using the public release instead of git master, so my code wasn't at fault thankfully. Due to the knowledge I attained from doing the change, I could quickly confirm that precisely this slow part was cause for the performance slowdown, and was able to file a PR to reduce the library's complexity from quadratic to linear [3].
It wasn't very complicated from the algorithmic point of view, I only had to create a lookup structure, but the lookup structure had to be created in a certain way so that the library still behaved the same way as tested by the test suite. It also had to support things like duplicates as the original implementation also supported them, as a very important feature in fact (toml arrays).
[1]: https://github.com/alexcrichton/toml-rs/pull/333
[2]: https://github.com/alexcrichton/toml-rs/issues/342
[3]: https://github.com/alexcrichton/toml-rs/pull/349