I can deal with a 10 ms round trip, but not much more.
One issue arises when you also hear some acoustic sound (open headphones, bone conduction, singing): then pretty much any delay in the headphone signal causes phase interference with it (comb filtering).
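To make the phase-interference point concrete: if the acoustic path and the headphone path reached the ear at equal level, the ear would get x(t) + x(t − τ). That sum has a magnitude response of 2|cos(πfτ)|, with full cancellation at f = (2k+1)/(2τ). The minimal sketch below computes those notch frequencies. The equal-level assumption is a simplification: real acoustic leakage is usually quieter than the headphone signal, so you get partial dips rather than full nulls. The function names are hypothetical.

```python
import math


def comb_notches(delay_s: float, max_hz: float = 20_000.0) -> list[float]:
    """Frequencies (Hz) fully cancelled when a signal is summed at equal level
    with a copy of itself delayed by delay_s seconds: f = (2k + 1) / (2 * delay_s).
    Assumes equal levels on both paths (a simplification)."""
    if delay_s <= 0:
        raise ValueError("delay_s must be positive")
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


def comb_gain(freq_hz: float, delay_s: float) -> float:
    """Magnitude |1 + e^(-j*2*pi*f*tau)| = 2*|cos(pi*f*tau)| of the equal-level sum.
    1.0 would be one path alone, 2.0 is the +6 dB peak, 0.0 is a null."""
    return 2.0 * abs(math.cos(math.pi * freq_hz * delay_s))


if __name__ == "__main__":
    # Compare a short and a longer headphone delay.
    for tau_ms in (1.0, 10.0):
        first = comb_notches(tau_ms / 1000)[:3]
        print(f"{tau_ms} ms delay -> first notches at", [round(f, 1) for f in first], "Hz")
```

The longer the delay, the lower the first notch and the denser the notches, so even a few milliseconds place audible cancellations right in the vocal range.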
With a virtual instrument you usually don't face the full round-trip latency, but rather the control latency (e.g. MIDI) plus the audio output latency (usually about 50% of the round trip). So in the VSTi case it might be closer to 25 ms than 40 ms in practice. The latency jitter is often quite dreadful in this case, because the control signal is not synchronous to the audio.
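The VSTi arithmetic can be sketched as a toy model built on several assumptions. It uses a single input buffer and a single output buffer of equal size, and it ignores converter and driver overhead. It assumes classic 5-pin DIN MIDI (31,250 baud at 10 bits per byte, so a 3-byte note-on takes 0.96 ms). It also assumes the host only picks up an incoming event at the next buffer boundary, which is one way an asynchronous control signal turns into jitter of up to one buffer period. The buffer size and sample rate in the usage are illustrative and do not come from the original. All function names are hypothetical.

```python
import random

MIDI_DIN_BAUD = 31_250   # classic 5-pin DIN MIDI bit rate
BITS_PER_MIDI_BYTE = 10  # 8 data bits + start bit + stop bit


def midi_din_ms(n_bytes: int = 3) -> float:
    """Time to transmit n_bytes over DIN MIDI (a note-on message is 3 bytes)."""
    return 1000.0 * n_bytes * BITS_PER_MIDI_BYTE / MIDI_DIN_BAUD


def buffer_ms(frames: int, sample_rate: int) -> float:
    """Duration of one audio buffer (block) in milliseconds."""
    return 1000.0 * frames / sample_rate


def round_trip_ms(frames: int, sample_rate: int) -> float:
    """Simplified round trip: one input buffer plus one output buffer
    (converter and driver delays ignored)."""
    return 2 * buffer_ms(frames, sample_rate)


def vsti_latency_ms(press_ms: float, frames: int, sample_rate: int) -> float:
    """Hypothetical key-press-to-sound model for a VSTi.

    The event needs midi_din_ms() to arrive, then waits for the next buffer
    boundary (it is not synchronous to audio), and the rendered block then
    spends one buffer period in the output path. No input buffer is involved.
    """
    period = buffer_ms(frames, sample_rate)
    arrival = press_ms + midi_din_ms()
    wait = (-arrival) % period  # time until the next block starts
    return midi_din_ms() + wait + period


if __name__ == "__main__":
    frames, rate = 256, 48_000  # illustrative settings, not from the original
    rng = random.Random(1)
    samples = [vsti_latency_ms(rng.uniform(0, 1000), frames, rate) for _ in range(10_000)]
    print(f"round trip: {round_trip_ms(frames, rate):.2f} ms")
    print(f"VSTi: min {min(samples):.2f} ms, max {max(samples):.2f} ms, "
          f"jitter {max(samples) - min(samples):.2f} ms")
```

In this model the VSTi path contains only one buffer period plus the MIDI and boundary wait, which is why it lands near half the round trip. The spread between the minimum and maximum is roughly one buffer period, and that spread is the jitter.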
Have you ever seen people try to sing with around 200 ms of delay in their headphones? That really messes up the performance.