It’s not clear why they don’t just use a finer-grained timestamp component (e.g. nanoseconds) and increase that number if two events are within the same clock interval, and keep the random component random.
The underlying clock doesn’t need to be fine-grained; only the target representation needs to be fine-grained enough for the additional increments inserted by the code that generates the IDs. Effectively, the precision of the timestamp part needs to be large enough to support the number of IDs per second you want to be able to generate sustainably.
Also, the spec notes there is an entropy trade off in using more bits for the time stamp. More timestamp bits is fewer random bits, because ULIDs constrained themselves to a combined total 128 bits (to match GUID/UUID width).