by newqer 1278 days ago
My guess would be that it fixates on the most dominant source available and mutes everything else. It probably favors human voices over other ambient noise, thereby singling the man out.

It will get really freaky when there is an ambient noise resembling a human voice. I'm thinking of the bear scene from the movie Annihilation.

1 comments

One should run speech-to-text (STT) on both the raw and the modified media streams and diff the two transcripts to find unintended modifications.
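
A rough sketch of the diff step in Python, assuming you already have both transcripts as plain strings from whatever STT engine you use. The function name `transcript_diff` and the sample sentences are made up for illustration; only the standard-library `difflib` is used, and no actual transcription happens here.

```python
import difflib


def transcript_diff(raw_text, modified_text):
    """Return word-level differences between two transcripts.

    Each entry is (operation, words_in_raw, words_in_modified), where
    operation is 'delete', 'insert', or 'replace'.
    """
    # Normalize case so capitalization differences from the STT engine
    # are not reported as changes.
    raw_words = raw_text.lower().split()
    mod_words = modified_text.lower().split()

    matcher = difflib.SequenceMatcher(a=raw_words, b=mod_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(raw_words[i1:i2]), " ".join(mod_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    # Hypothetical transcripts: the processed stream lost the last word.
    raw = "the man said hello there"
    modified = "the man said hello"
    for op, before, after in transcript_diff(raw, modified):
        print(f"{op}: {before!r} -> {after!r}")
```

Anything that shows up as a delete or replace is speech the processing removed or altered. STT engines make mistakes of their own, so some of the reported differences will be transcription noise rather than real modifications.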