I assume you mean auditory information. Echolocation refers specifically to the use of echoes to determine location, which would not help at all in this case, given the granularity required.
Even assuming you do mean auditory information, I am unsure whether it enables mimicry as well as visual information does. Vision, apart from being our predominant sense, gives a clear clue as to which movement led to which action or sound.