by jkadlec 2355 days ago
Some good basic info, but there are also some inaccuracies. WAV is not a lossless format; it's a container that can hold any compressed audio format, even MP3. You can have PCM inside WAV, which is indeed lossless, but you're not going to see that in the wild too often. Going with 16 kHz is also questionable, since most readily available pre-existing datasets were recorded at 8 kHz (which is what telephony codecs mostly use).
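For anyone who wants to check what is actually inside a given WAV file, here is a minimal sketch that reads the format tag from the `fmt ` chunk of the RIFF header. The helper names (`wav_format_tag`, `make_pcm_wav`) and the short tag table are my own for illustration, not from the article, and the table is far from exhaustive.

```python
import io
import struct
import wave

# A few registered WAVE format tags (illustrative subset, not exhaustive).
FORMAT_TAGS = {
    0x0001: "PCM",
    0x0003: "IEEE float",
    0x0006: "A-law",
    0x0007: "mu-law",
    0x0055: "MPEG Layer 3",
}


def wav_format_tag(data: bytes) -> int:
    """Return the format tag stored in the 'fmt ' chunk of a RIFF/WAVE blob."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            # The first 16-bit field of the fmt chunk is the format tag.
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        # Skip this chunk; chunk bodies are padded to an even length.
        pos += 8 + size + (size & 1)
    raise ValueError("no fmt chunk found")


def make_pcm_wav(rate: int = 16000) -> bytes:
    """Hypothetical helper: build a tiny 16-bit mono PCM WAV in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * 160)  # 160 frames of silence
    return buf.getvalue()


if __name__ == "__main__":
    tag = wav_format_tag(make_pcm_wav())
    print(hex(tag), FORMAT_TAGS.get(tag, "unknown"))
```

A tag of 1 means plain PCM; anything else means the container is carrying some other encoding, which may or may not be lossy.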
1 comment

WAV is almost always lossless PCM data. I'm not sure where you got the impression that "you don't see that in the wild too often". Depending on what kind of analysis you need, having your audio at 8 kHz is going to render any results useless. I would use a minimum of 16 kHz and aim for 44.1 kHz in order to preserve the top end, which is where a large quantity of useful information is. The reason most sets are recorded at 8 kHz is that they are used with MFCCs, which are quite stubborn and insensitive to the high end anyway, with enough information for machine learning existing in the bottom end. If you're doing music or environmental sounds, you really need to preserve the other frequency bands.
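To make the "top end" point concrete, here is a rough sketch of the sampling-rate arithmetic: a rate of R can only represent content up to R/2, and anything above that folds back down if it isn't filtered out first. The function names are mine, and this is a pure-math illustration with ideal sampling, not how a real resampler behaves (those low-pass filter before decimating, so the high content is simply removed rather than aliased).

```python
import math


def nyquist(rate: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return rate / 2


def alias_frequency(freq: float, rate: float) -> float:
    """Frequency a pure tone appears at after ideal sampling at `rate`."""
    return abs(freq - rate * round(freq / rate))


def tones_match(f1: float, f2: float, rate: float, n_samples: int = 64) -> bool:
    """True if two cosine tones produce the same samples at this rate."""
    return all(
        math.isclose(
            math.cos(2 * math.pi * f1 * n / rate),
            math.cos(2 * math.pi * f2 * n / rate),
            abs_tol=1e-9,
        )
        for n in range(n_samples)
    )


if __name__ == "__main__":
    for rate in (8000, 16000, 44100):
        print(rate, "Hz sampling keeps content up to", nyquist(rate), "Hz")
    # A 6 kHz tone sampled at 8 kHz cannot be told apart from a 2 kHz tone.
    print(alias_frequency(6000, 8000), tones_match(6000, 2000, 8000))
```

So at 8 kHz everything above 4 kHz is gone or corrupted, which is fine for telephone speech but not for music or environmental sounds.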