by snozolli 599 days ago
I'm curious why this is even an issue. I don't understand why an actual interrupt would get tripped for virtual serial port writes, and I don't understand why a virtual serial port (i.e. logging) gets swamped by what seems like a moderate stream of data.

The first result for "8250 too much work for irq4" is this: https://unix.stackexchange.com/questions/387600/understandin... (2017)

The problem is that the UART hardware that is emulated by various brands of virtual machine behaves impossibly, sending characters at an impossibly fast line speed. To the kernel, this is indistinguishable from faulty real UART hardware that is continually raising an interrupt for an empty output buffer/full input buffer. (Such faulty real hardwares exist, and you will find embedded Linux people also discussing this problem here and there.) The kernel pushes the data out/pulls the data in, and the UART is immediately raising an interrupt saying that it is ready for more.

So, is this seven year old problem still a problem, or is the virtual serial port driver actually unable to keep up with the stream of text?

2 comments

> I don't understand why an actual interrupt would get tripped for virtual serial port writes, and I don't understand why a virtual serial port (i.e. logging) gets swamped by what seems like a moderate stream of data.

If you're running in a VM, how do you know the difference between a virtual interrupt and an actual interrupt? Either way, you're handling an interrupt.

If I understand the issue, it's not that the virtual serial port is swamped really. It seems like the issue is that the virtual serial port is firing its empty/ready interrupt so often (probably after every byte written to the port?) that the driver thinks the interrupt is broken, so then it falls back to polling for readiness. The driver polling rate seems like it's not high enough to keep up with the logging, so then logging blocks.
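A toy model of that behaviour, as a sketch only: the class names, the function and the PASS_LIMIT value are assumptions made up for illustration, not the kernel's actual driver code. It shows why a port that is always "ready" looks like a stuck interrupt, while a port that goes busy after each byte lets the handler return normally.

```python
# Toy model: every name and the PASS_LIMIT value are illustrative assumptions,
# not a copy of the kernel's 8250 driver.
PASS_LIMIT = 512  # assumed cap on work done inside one interrupt


class InstantUart:
    """Emulated port that drains each byte in zero time, so it is always ready."""

    def tx_ready(self):
        return True

    def write(self, byte):
        pass  # the byte is "transmitted" instantly


class PacedUart:
    """Port that stays busy for a while after each byte, like a real line."""

    def __init__(self):
        self.busy = False

    def tx_ready(self):
        ready = not self.busy
        self.busy = False  # pretend the shift register drained by the next poll
        return ready

    def write(self, byte):
        self.busy = True


def service_interrupt(uart, data):
    """Feed bytes while the port claims to be ready.

    Returns (bytes_written, gave_up). gave_up is True when the pass limit
    trips, which is the point where a driver would complain about too much
    work for the IRQ.
    """
    written = 0
    for byte in data:
        if not uart.tx_ready():
            return written, False  # port is busy: leave and wait for the next IRQ
        if written >= PASS_LIMIT:
            return written, True  # still "ready" after PASS_LIMIT bytes: give up
        uart.write(byte)
        written += 1
    return written, False


print(service_interrupt(InstantUart(), b"x" * 2000))  # (512, True)
print(service_interrupt(PacedUart(), b"x" * 2000))    # (1, False)
```

In this model the handler cannot tell the instant port apart from hardware whose interrupt line is stuck, which is the ambiguity the quoted Stack Exchange answer describes.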

Probably, it'd be better for the virtual serial port to limit the number of irqs it will send. Limiting the number of interrupts is a key benefit of the 16550 over earlier UART chips; it does this by having a 16 byte buffer and sending interrupts only when the buffer is empty/below a threshold (for outgoing) or full/above a threshold (for incoming). Getting an interrupt for each outbound byte is too many interrupts if you're logging a lot of junk on the serial console (which I'm guessing happens; modern software is full of junky logs IMHO).
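To put rough numbers on the FIFO point, here is a back-of-the-envelope sketch. It assumes the driver refills the whole transmit FIFO on every "transmitter empty" interrupt; the function name and the 4096-byte log burst are made up for the example.

```python
def tx_interrupts(n_bytes, bytes_per_interrupt):
    """Interrupts needed to send n_bytes if each one lets us queue a fixed chunk."""
    # Ceiling division: a partly filled final chunk still costs one interrupt.
    return (n_bytes + bytes_per_interrupt - 1) // bytes_per_interrupt


log_burst = 4096  # hypothetical burst of log output, in bytes

print(tx_interrupts(log_burst, 1))   # one interrupt per byte: 4096
print(tx_interrupts(log_burst, 16))  # 16-byte FIFO refilled each time: 256
```

Under that assumption the 16-byte FIFO cuts the interrupt count by a factor of 16, which is the benefit being described.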

Probably the right thing to do would be for virtual machine hosts to offer virtio consoles as well as virtual serial ports, and for virtual machine guests to prefer a virtio console over a serial port for logging. But it's probably possible to adjust the serial driver options as well. A virtio console could be much more efficient, as it allows transfers larger than a single byte at a time.

Too late for edit... But rereading the blog and references... I suspect if this was on real hardware, with a real 16550 UART, you'd have the same issue, but without the message about too many interrupts.

You'd just have your logs backing up because you're writing faster than 115.2 kbaud, and then things that write logs become blocking. You'd need to figure that out by seeing what processes are blocked on what, rather than getting a hint because the irq behavior is weird.
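For a sense of scale, a quick calculation of what a real line at that speed can carry. It assumes the common 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte); the function names and the 64 KiB backlog are illustrative.

```python
BITS_PER_BYTE_8N1 = 10  # 1 start + 8 data + 1 stop bit (assumed framing)


def bytes_per_second(baud):
    """Payload throughput of a serial line at the given baud rate."""
    return baud / BITS_PER_BYTE_8N1


def drain_seconds(n_bytes, baud):
    """Time to push n_bytes of queued log output through the line."""
    return n_bytes / bytes_per_second(baud)


print(bytes_per_second(115_200))                     # 11520.0 bytes per second
print(round(drain_seconds(64 * 1024, 115_200), 2))   # about 5.69 seconds
```

So anything logging more than roughly 11 KB/s on average would back up on real hardware at this speed, with no IRQ warning to point at the cause.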

It's not clear from the thread whether the logging to the serial port was in the same thread as the audio processing. If it was, then it would be a problem either way. If it wasn't, then the fact that the irq is not preemptible would mean that it will cause problems for all threads in the virtual case, but not in the real hardware case.
Even on a virtual machine, I'd expect the serial interrupt to only be routed to one vCPU; that shouldn't prevent execution on other vCPUs, right? (especially if you've got more than 2)... but if your tasks happen to be pinned to the CPU that's bogged down with serial interrupts...

Something that works a message at a time would be so much more useful than going a byte at a time with outb (does the Linux driver do rep outb... would that actually help?)

True, having multiple CPUs should help. Though in my experience it's quite easy in Linux for one CPU stalled in the kernel to block quite a wide range of other tasks, presumably because of contended locks.

And yes, the better approach is to use a virtio type system which is designed for VM to host communication, but VM implementations tend to emulate lowest-common-denominator hardware for the highest compatibility (The 8250 is everywhere, basically every OS supports it and its interface is very commonly emulated even in modern hardware). The issue is, because it's ancient hardware, every tiny buffer is basically a full interrupt routine (checking flags and copying data), with multiple traps to the VM emulation of it. Doing something like rep outb would be only a minor optimisation on top of that (it already does the most significant thing of continuing to copy data in the interrupt routine until the flag is unset, instead of waiting for the interrupt to re-trigger for each chunk of data).

(On real hardware, doing something like DMA is the much better way to optimise this, but it's probably not a good emulation target because DMA controllers vary wildly from platform to platform)

Curiously, I can't find this message in recent kernels; it was patched out in 2018: https://github.com/torvalds/linux/commit/9d7c249a1ef9bf0d569... . Is it being re-added by distros or something?

In terms of how it causes the issue, I think it's more that the serial driver will monopolise the (virtual) CPU for as long as it is receiving or sending text. If there's a large buffer being dumped all at once (a common default for non-interactive streams), it could stall for long enough to cause an underrun, which doesn't need to be very long, even if the average data rate isn't very high. Consider that the virtual 8250 serial port interface isn't exactly the most efficient interface to begin with, since it's at most copying 16 bytes from host to guest at a time.
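A rough way to compare the two time scales. Every number here is a placeholder assumption rather than a measurement: one trapped register access per byte, 5 microseconds per trap, a 64 KiB buffer dumped at once, and an audio period of 256 frames at 48 kHz. The function names are hypothetical too.

```python
def audio_period_s(frames_per_period, sample_rate_hz):
    """How long one audio period lasts before the next one must be ready."""
    return frames_per_period / sample_rate_hz


def dump_stall_s(n_bytes, traps_per_byte, trap_cost_s):
    """Time the vCPU is tied up pushing n_bytes through the emulated port."""
    return n_bytes * traps_per_byte * trap_cost_s


period = audio_period_s(256, 48_000)      # assumed audio configuration
stall = dump_stall_s(64 * 1024, 1, 5e-6)  # assumed buffer size and trap cost

print(f"audio period: {period * 1000:.2f} ms")
print(f"serial stall: {stall * 1000:.2f} ms")
print("underrun likely" if stall > period else "fits within one period")
```

With these made-up figures the stall is tens of times longer than one audio period, so a single burst could be enough even when the average data rate is low. Plug in measured values before drawing conclusions about a real system.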