The media lab has done a ton of research on this. I seem to remember people being able to notice visual latency at around 30ms and audio latency at 80-120ms (supposedly because light is faster than sound, so we are used to audio arriving a little late).
Interesting, would love to read more if specific papers/authors come to mind. I suspect there's a big gap between e.g. "noticing the audio latency when audio is played as a result of pressing a button" vs "audio latency affecting the flow of a multiparty conversation".
It's probably the latter, because the former is about 5ms (which is equivalent to asking "how short a gap between two sounds can still be perceived as separate", aka the lower frequency threshold of hearing). It's non-obvious that they're the same limit.
Any rhythm game player will disagree.
Some games (e.g. LLSIF on Android) have a "perfect" window of 16ms (one video frame at 60 fps). Even with latency compensation, these are unplayable over Bluetooth yet fine over the headphone jack. The game has a calibration step, and the offset it arrives at is at least 30ms worse on Bluetooth.
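To make the arithmetic concrete, here is a rough sketch of how a fixed calibration offset interacts with a one-frame window. Everything in it is an assumption for illustration, not how any particular game works: the 60 fps frame rate, a window centered on the note (so about ±8ms), the hypothetical `is_perfect` helper, and the idea that Bluetooth latency varies from note to note.

```python
FRAME_RATE_HZ = 60                        # assumed display refresh rate
PERFECT_WINDOW_MS = 1000 / FRAME_RATE_HZ  # one frame, about 16.7 ms


def is_perfect(tap_error_ms, latency_ms, calibration_offset_ms,
               window_ms=PERFECT_WINDOW_MS):
    """Hypothetical judgment: does the tap land inside the perfect window?

    tap_error_ms: the player's own timing error relative to what they hear
    latency_ms: actual audio output latency for this note
    calibration_offset_ms: the fixed latency the game compensates for
    """
    # Calibration cancels only the part of the latency it was told about.
    residual_ms = tap_error_ms + (latency_ms - calibration_offset_ms)
    # Assumed: the window is centered on the note, so half on each side.
    return abs(residual_ms) <= window_ms / 2


# Constant latency that matches the calibration: a perfectly timed tap counts.
print(is_perfect(0, latency_ms=150, calibration_offset_ms=150))

# Same calibration, but this note came out 20 ms later than calibrated.
print(is_perfect(0, latency_ms=170, calibration_offset_ms=150))
```

Under these assumptions, a constant delay of any size can be calibrated away, but anything that shifts the latency by more than about 8ms after calibration pushes even a perfectly timed tap out of the window.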