The reason join is fast for these cases is because join basically converts all iterables to a list first and figures out how much it has to join exactly.
It walks the iterables (no need to convert them to lists), but its main trick is actually knowing how much memory it'll need for the end result, so no re-allocations happen, simply a one pass (through multiple iterables) copy.
No, it really converts all iterables to lists (there is a utility function in the C API to do just that, because it's kinda hairy to do by hand. PySequence_ToList or so). Otherwise it would not be possible to calculate how much space is required beforehand, since iterables cannot be walked twice.