
The phenomenon described in the quoted comment is called "temporal masking". There is "pre-masking", where a sound is rendered imperceptible by a sound that _follows_ it (your "forgetting" case), and there is "post-masking", where a sound is imperceptible because of a masking sound that preceded it. And yes, this is due to the inherent slowness / limited temporal resolution of the auditory system.

Temporal masking is widely exploited in all kinds of lossy audio compression (MP3, AAC, etc.) to remove data that cannot be perceived anyway.
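
To make the idea concrete, here is a rough toy sketch in Python of how an encoder might decide that a quiet sound near a loud one can be dropped. Everything in it is an assumption for illustration: the window lengths (a short pre-masking window, a longer post-masking one), the linear fade, and the names `masked_threshold_db` and `is_masked` are made up here and are not taken from MP3, AAC, or any real psychoacoustic model.

```python
# Toy model of temporal masking. All numbers are illustrative assumptions,
# not values taken from any codec or psychoacoustic standard.
PRE_MASK_MS = 20.0    # assumed window before the masker (pre-masking)
POST_MASK_MS = 200.0  # assumed window after the masker (post-masking)


def masked_threshold_db(masker_db: float, offset_ms: float) -> float:
    """Return the level (dB) below which a probe sound is treated as inaudible.

    offset_ms is the probe time minus the masker time: negative means the
    probe comes before the masker, positive means it comes after.
    """
    window = PRE_MASK_MS if offset_ms < 0 else POST_MASK_MS
    distance = abs(offset_ms)
    if distance >= window:
        return float("-inf")  # outside the window: no masking at all
    # Assumed linear fade from the masker level down to 0 dB at the window edge.
    return masker_db * (1.0 - distance / window)


def is_masked(probe_db: float, masker_db: float, offset_ms: float) -> bool:
    """True if the probe falls below the toy masked threshold."""
    return probe_db < masked_threshold_db(masker_db, offset_ms)


if __name__ == "__main__":
    # A 30 dB sound 50 ms after an 80 dB masker (post-masking case).
    print(is_masked(30.0, 80.0, 50.0))
    # The same sound 50 ms before the masker (outside the pre-masking window).
    print(is_masked(30.0, 80.0, -50.0))
```

In this sketch, a sound flagged by `is_masked` is one the encoder could spend fewer (or no) bits on. Real codecs work per frequency band and use much more careful threshold curves.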