by gregsadetsky 1742 days ago
This is great, congrats and thank you (& Spotify) for releasing this!

I was just about to look for a library to layer 2 tracks (a text-to-speech "voice" track, and a background music track) and add compression to the resulting audio.

A few questions if you don't mind:

- Pedalboard seems more suited to process one layer at a time, correct? I would be doing muxing/layering (i.e. automating the gain of each layer) elsewhere?

- Do you have a Python library recommendation to mux and add silence in audio files/objects? pydub seems to be ffmpeg-based. Is that a better option than a pure-Python implementation such as SoundFile?

Thanks


Thanks!

That's correct: Pedalboard just adds effects to audio and has no notion of layers, multiple tracks, etc. It uses the NumPy/librosa/pysoundfile convention of representing audio as floating-point NumPy arrays.

Mixing two tracks together could be done pretty easily by loading the audio into memory (e.g.: with soundfile.read), adding the signals together (`track_a * 0.5 + track_b * 0.5`), then writing the result back out again.
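
A minimal sketch of that mix, assuming mono tracks at the same sample rate. The helper name `mix_tracks` and the synthetic sine waves are illustrative only; in practice the arrays would come from something like `soundfile.read`, and the result would be written back out with the matching write call.

```python
import numpy as np

def mix_tracks(track_a, track_b, gain_a=0.5, gain_b=0.5):
    """Sum two mono float tracks; the shorter one is treated as zero-padded."""
    length = max(len(track_a), len(track_b))
    out = np.zeros(length, dtype=np.float32)
    out[:len(track_a)] += gain_a * track_a
    out[:len(track_b)] += gain_b * track_b
    return out

# Synthetic stand-ins for audio that would normally be loaded from files.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
voice = np.sin(2 * np.pi * 220 * t).astype(np.float32)                    # 1.0 s
music = np.sin(2 * np.pi * 440 * t[: sample_rate // 2]).astype(np.float32)  # 0.5 s

mixed = mix_tracks(voice, music)
```

Keeping the gains at 0.5 each means the sum of two full-scale signals stays within [-1, 1].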

Adding silence or changing the relative timings of the tracks is a bit more complex, but not by much: the hardest part might be figuring out how long your output file needs to be, then figuring out the offsets to use in each buffer (i.e.: `output[start:end] += gain * track_a[:end - start]`).
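
The offset bookkeeping could look roughly like this. It is a sketch under a few assumptions: mono float arrays, non-negative start offsets given in samples, and a hypothetical helper called `place`. The constant-valued arrays stand in for real audio so the result is easy to inspect.

```python
import numpy as np

def place(output, track, start, gain=1.0):
    """Add `track` into `output` starting at sample `start` (assumed >= 0).

    Anything that would run past the end of `output` is dropped.
    """
    end = min(start + len(track), len(output))
    if end > start:
        output[start:end] += gain * track[: end - start]
    return output

sample_rate = 8000
music = np.ones(5 * sample_rate, dtype=np.float32)  # 5 s of background
voice = np.ones(2 * sample_rate, dtype=np.float32)  # 2 s of speech
voice_start = 1 * sample_rate                       # voice enters after 1 s

# The output must be long enough to hold whichever track ends last.
total = max(len(music), voice_start + len(voice))
output = np.zeros(total, dtype=np.float32)

place(output, music, 0, gain=0.3)
place(output, voice, voice_start, gain=0.7)
```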

Makes sense, so I'd be doing everything at the sample level.

For layers, I could have an array that represents "gain automation" for each layer, and then let numpy do `track_a * gain_a + track_b * (1-gain_a)` for the whole output in one go.
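
One way to sketch that gain automation is to describe the envelope as a few breakpoints and let `np.interp` expand it to one gain value per sample. The breakpoint values and the constant test signals below are made up for illustration.

```python
import numpy as np

n = 1000
track_a = np.full(n, 1.0)    # stand-in for one layer
track_b = np.full(n, -1.0)   # stand-in for the other layer

# Breakpoints: (sample index, gain applied to track_a).
# track_a fades down to 0.2, holds, then fades back up to 1.0.
points_x = [0, 250, 750, n - 1]
points_y = [1.0, 0.2, 0.2, 1.0]

# Expand the breakpoints into a per-sample gain curve.
gain_a = np.interp(np.arange(n), points_x, points_y)

# Crossfade the two layers for the whole output in one vectorized step.
mixed = track_a * gain_a + track_b * (1.0 - gain_a)
```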

And I'd create silences by inserting 0's (and making sure that I'm inserting them after a zero crossing point to avoid clicks)
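
A rough sketch of the zero-crossing idea, assuming a mono float track. The function name `insert_silence` is hypothetical, and a sign change between neighbouring samples is used here as the definition of a crossing.

```python
import numpy as np

def insert_silence(track, approx_index, n_silent):
    """Insert `n_silent` zeros at the first zero crossing at or after `approx_index`.

    Returns the new track and the index where the cut was made.
    """
    signs = np.signbit(track)
    # Index i + 1 is a crossing when samples i and i + 1 differ in sign.
    crossings = np.flatnonzero(signs[:-1] != signs[1:]) + 1
    later = crossings[crossings >= approx_index]
    # Fall back to appending at the end if no crossing is found.
    cut = int(later[0]) if later.size else len(track)
    silence = np.zeros(n_silent, dtype=track.dtype)
    return np.concatenate([track[:cut], silence, track[cut:]]), cut

sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
track = np.sin(2 * np.pi * 5 * t)   # 5 Hz test tone, 1 s long

with_gap, cut = insert_silence(track, approx_index=130, n_silent=200)
```

The sample just before the cut is close to zero but usually not exactly zero, so a very short fade around the cut may still be worth trying if clicks remain.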

I'm prone to NIH :-) but I'll also try to see if something like this exists. But at least -- it's clearly do-able/prototype-able!

Thank you