The Rust version that does exactly what Python does (buffered output) is an order of magnitude faster, even if I force the Rust buffer size down to 4 KiB to match Python 2.
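For anyone who wants to try the same setup, here is a rough sketch of what I mean by forcing the buffer size. The 4 KiB capacity is the figure above; the `write_lines` helper and the numbered-line workload are made up for illustration and are not the actual benchmark.

```rust
use std::io::{self, BufWriter, Write};

/// Writes `count` numbered lines to `out` through a buffer of `capacity` bytes.
/// Hypothetical workload: the real benchmark's output is not reproduced here.
fn write_lines<W: Write>(out: W, capacity: usize, count: u64) -> io::Result<()> {
    // BufWriter collects small writes and passes them to `out` in larger chunks.
    let mut buffered = BufWriter::with_capacity(capacity, out);
    for i in 0..count {
        writeln!(buffered, "{}", i)?;
    }
    // Flush explicitly so a write error is returned instead of being lost on drop.
    buffered.flush()
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock stdout once and wrap it in a 4 KiB buffer (4096 bytes).
    write_lines(stdout.lock(), 4096, 1_000_000)
}
```

Changing the `4096` argument is all it takes to compare other buffer sizes.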
To head off this comment chain, I've softened the language in my original comment to "overwhelmed by" rather than implying that non-I/O factors are wholly irrelevant. :)