I assume the author of the library works under a hard real-time constraint. Under such circumstances (low-latency audio is one example) you cannot tolerate the latency impact of a sporadic syscall.
Perhaps, but that is almost the opposite of what they said: in hard real time you can tolerate a longer average latency in return for a shorter maximum latency. That matches what I would expect from a lock-free data structure, but it doesn't match the (dubious) claim that locks are usually slower.
Lock-free doesn't solve this. At least one thread will always make progress, but you have no way to ensure it is your thread, so you can still miss a deadline with a lock-free structure. This becomes an issue when the data is under heavy contention across many cores (most of us don't have hundreds of cores, so we don't see it).
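To make that concrete, here is a minimal Rust sketch of a lock-free counter built on a compare-and-swap loop. It is my own illustration, not code from the library being discussed, and `cas_add` and `contend` are hypothetical names. A failed CAS here means some other thread's update landed first, so the system as a whole progresses, but nothing bounds how many times one particular thread has to retry.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free add using a compare-and-swap loop (hypothetical helper).
/// Returns how many times this call had to retry before it succeeded.
fn cas_add(counter: &AtomicU64, delta: u64) -> u64 {
    let mut retries = 0;
    let mut current = counter.load(Ordering::SeqCst);
    loop {
        match counter.compare_exchange(
            current,
            current + delta,
            Ordering::SeqCst,
            Ordering::SeqCst,
        ) {
            // Our update landed.
            Ok(_) => return retries,
            // Another thread changed the value first: it made progress, we did not.
            Err(observed) => {
                current = observed;
                retries += 1;
            }
        }
    }
}

/// Runs `threads` threads doing `iters` increments each (hypothetical driver).
/// Returns (final counter value, worst retry count seen by any single call).
fn contend(threads: usize, iters: u64) -> (u64, u64) {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                (0..iters).map(|_| cas_add(&counter, 1)).max().unwrap_or(0)
            })
        })
        .collect();
    let worst = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .max()
        .unwrap_or(0);
    (counter.load(Ordering::SeqCst), worst)
}
```

Something like `contend(8, 100_000)` always ends with the right total, but the worst retry count depends on the machine and the scheduler and has no fixed upper bound, which is exactly the missed-deadline risk.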
Generally lock-free is better for these situations: odds are that while you hold a lock, at least some CPU cycles go to something that isn't directly modifying the protected data, and with a lock-free structure another CPU could touch the data during those cycles. (Note that locks involve a trade-off: it is often better to hold the lock longer than strictly needed than to drop it, do a couple of operations, and then lock again.)
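A rough Rust sketch of that lock-holding trade-off, assuming a plain `std::sync::Mutex` around a running total (the function names are made up for illustration). The first version pays for an acquire and release on every item; the second holds the lock across the whole batch, which blocks other threads for longer but avoids the repeated locking overhead.

```rust
use std::sync::Mutex;

/// Re-acquires the lock for every item: short hold times, many acquisitions.
fn add_all_relocking(total: &Mutex<u64>, items: &[u64]) {
    for &x in items {
        // The guard is a temporary, so the lock is released at the end of this statement.
        *total.lock().unwrap() += x;
    }
}

/// Acquires the lock once for the whole batch: one acquisition, longer hold time.
fn add_all_one_lock(total: &Mutex<u64>, items: &[u64]) {
    let mut guard = total.lock().unwrap();
    for &x in items {
        *guard += x;
    }
    // The lock is released when `guard` goes out of scope here.
}
```

Which one wins depends on batch size and contention, so this is something to measure rather than assume.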
If you must make progress locally (not just globally), the guarantee you need is wait-freedom, which is even stronger than lock-freedom.
(All wait-free algorithms are lock-free: if every thread is guaranteed to eventually make progress, then overall progress is necessarily made.)
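A small Rust sketch of the difference, using a shared counter (function names are hypothetical). The CAS loop is lock-free but not wait-free, because a given caller can lose the race an unbounded number of times. The `fetch_add` version finishes in a bounded number of steps per call, but only on the assumption that the target has a native atomic add instruction; on targets where `fetch_add` is itself lowered to a retry loop, it is merely lock-free too.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Lock-free only: some thread always succeeds, but this caller may retry indefinitely.
/// Returns the value the counter held before our increment.
fn increment_lock_free(counter: &AtomicU64) -> u64 {
    let mut current = counter.load(Ordering::SeqCst);
    loop {
        match counter.compare_exchange(
            current,
            current + 1,
            Ordering::SeqCst,
            Ordering::SeqCst,
        ) {
            Ok(previous) => return previous,
            Err(observed) => current = observed,
        }
    }
}

/// Wait-free under the assumption above: one atomic read-modify-write, no retry loop.
/// Returns the value the counter held before our increment.
fn increment_wait_free(counter: &AtomicU64) -> u64 {
    counter.fetch_add(1, Ordering::SeqCst)
}
```

Both produce the same results; what differs is the worst-case number of steps a single caller can be forced to take.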