Hacker News new | ask | show | jobs
by tombert 476 days ago
I completely agree with point 6, obviously these microbenchmarks are not representative of much of anything, I was just trying to check my assumptions on some of this stuff.

I actually knew about the Guava asMap(), but I wasn't sure if that introduced overhead and I thought that might be an unfair test.

I have heard of Caffeine, but I haven't used it yet. Maybe I'll write an updated post talking about it.

I'll look into JMH and make a part 2 of this post tomorrow.

1 comments

In both the asMap() view unwraps the opinionated Cache facade to the underlying bounded map.

There are a lot of little gotchas so it is best to not believe your own results until proven otherwise. A Rust influencer wrote EVMap, eagerly giving a talk about its performance equaling concurrent maps while being much simpler. However since he generated the random numbers as part of the test loop, he did not realize that it uses a lock to compute the next seed. This throttled the benchmark to make the faster maps slower as they created more contention, fully invalidating and inverting the results. Sadly since this was presented as part of a PhD such truths are of little importance and the false claims continue to be shown. That's just to share how innocent mistakes are the norm, but incentives to not correct them are doubly so, and that you should never trust your own results until you've exhausted all attempts to disprove them as invalid.

Sounds like you have the right perspective and a great attitude. Feel welcome to ping me if you run into any troubles or questions as I collab'd on Guava's cache and wrote Caffeine.

I actually just got JMH set up in Gradle, and I'm rewriting my tests to try and give more accurate results. Once I have numbers from JMH I'll write another post, and I'll put a caveat to take them with a grain of salt.

In this specific example, I was actually afraid that randomness taking indeterminate time could be an issue, which is why I pre-allocated all the UUIDs at the beginning, and used the same list as input data for each one; I figured if all the setup was allocated before I started the timer, and every test has the same input, then we can factor out any inefficiencies since that should affect all of them roughly equally.

Anyway, I have wanted to learn JMH for awhile, so this is the push I needed to actually do it.