I understand where you're coming from. When I first started doing this, 1.5x was the fastest I could go and still retain information. After a while, I was able to retain information at 2x, then 2.5x, and so on. It took me a year or two of practice to get to this speed. I guess it also depends on what you're trying to learn, how clearly the speaker enunciates, and whether they have a heavy accent, but it works for me in most cases.
My internal voice while reading is around 2-3x as fast as most people speak (that's my natural pace, without trying to speed-read). It's also one of the main reasons I stumble over words when speaking: they get jumbled together because I can't actually speak that fast. So I don't find that unbelievable at all.