Hacker News
by psobot 1742 days ago
Wow, didn't expect this to hit HN! I'm the author of this project and super glad that it's getting some traction.

Under the hood, this is essentially just a Python wrapper around JUCE (https://juce.com), a comprehensive C++ library for building audio applications. We at Spotify needed a Python library that could load VSTs and process audio extremely quickly for machine learning research, but all of the popular solutions we found either shelled out to command line tools like sox/ffmpeg, or had non-thread-safe bindings to C libraries. Pedalboard was built for speed and stability first, but turned out to be useful in a lot of other contexts as well.

This is great, congrats and thank you (& Spotify) for releasing this!

I was just about to look for a library to layer 2 tracks (a text-to-speech "voice" track, and a background music track) and add compression to the resulting audio.

A few questions if you don't mind:

- Pedalboard seems better suited to processing one layer at a time, correct? Would I do the muxing/layering (i.e., automating the gain of each layer) elsewhere?

- Do you have a Python library recommendation for muxing and adding silence to audio files/objects? pydub seems to be ffmpeg-based. Is that a better option than something like SoundFile, which wraps libsndfile instead of calling ffmpeg?

Thanks

Thanks!

That's correct: Pedalboard just adds effects to audio, but doesn't have any notion of layers (or multiple tracks, etc.). It uses the NumPy/librosa/pysoundfile convention of representing audio as floating-point NumPy arrays.
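
To make that convention concrete, here is a minimal sketch in plain NumPy (the helper names are hypothetical, not part of Pedalboard) that scales 16-bit PCM integers into the floating-point range and back. `soundfile.read` already returns float64 by default, so this only matters when you start from raw integer samples.

```python
import numpy as np

def int16_to_float(pcm: np.ndarray) -> np.ndarray:
    """Scale 16-bit PCM integers into float32 samples in [-1.0, 1.0)."""
    # -32768 maps to exactly -1.0; 32767 maps to just under +1.0.
    return pcm.astype(np.float32) / 32768.0

def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Scale float samples back to 16-bit PCM, clipping out-of-range values."""
    # Clip first so values above 1.0 or below -1.0 don't wrap around.
    clipped = np.clip(audio, -1.0, 32767 / 32768)
    return np.round(clipped * 32768.0).astype(np.int16)
```

Clipping on the way back is a design choice: float audio can exceed 1.0 after mixing or gain changes, and without the clip those samples would overflow when cast to int16.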

Mixing two tracks together could be done pretty easily by loading the audio into memory (e.g.: with soundfile.read), adding the signals together (`track_a * 0.5 + track_b * 0.5`), then writing the result back out again.
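
A minimal sketch of that mixing step, assuming both tracks are already loaded as float arrays with the same sample rate and length (the `mix` helper is hypothetical, not a Pedalboard or soundfile function):

```python
import numpy as np

def mix(track_a: np.ndarray, track_b: np.ndarray,
        gain_a: float = 0.5, gain_b: float = 0.5) -> np.ndarray:
    """Sum two float signals of identical shape and sample rate with fixed gains."""
    if track_a.shape != track_b.shape:
        raise ValueError("tracks must have the same shape; pad or trim one first")
    # With nonnegative gains that sum to 1.0, inputs in [-1, 1] stay in [-1, 1].
    return gain_a * track_a + gain_b * track_b
```

With soundfile around it, loading would be `voice, sr = soundfile.read("voice.wav")` (and likewise for the music track), and saving would be `soundfile.write("mixed.wav", mixed, sr)`; the file names here are placeholders.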

Adding silence or changing the relative timings of the tracks is a bit more complex, but not by much: the hardest part might be figuring out how long your output file needs to be, then figuring out the offsets to use in each buffer (e.g., `output[start:end] += gain * track_a[:end - start]`).
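
The two steps described above (sizing the output, then adding each track at its offset) might look like this minimal sketch. It assumes mono 1-D float tracks at a shared sample rate, and the `place_tracks` helper and its `(signal, start_seconds, gain)` tuple format are hypothetical:

```python
import numpy as np

def place_tracks(tracks, sample_rate):
    """Mix (signal, start_seconds, gain) tuples into one mono output buffer."""
    # Convert each start time from seconds to a sample offset.
    placed = [(sig, int(round(start * sample_rate)), gain)
              for sig, start, gain in tracks]
    # The output must be long enough to hold whichever track ends last.
    length = max(offset + len(sig) for sig, offset, _ in placed)
    output = np.zeros(length, dtype=np.float64)
    for sig, offset, gain in placed:
        end = offset + len(sig)
        # Same pattern as above: add the gained track into its slice.
        output[offset:end] += gain * sig
    return output
```

Any region no track covers stays at zero, so gaps between tracks come out as silence without a separate step.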

Makes sense, so I'd be doing everything at the sample level.

For layers, I could have an array that represents "gain automation" for each layer, and then let numpy do `track_a * gain_a + track_b * (1-gain_a)` for the whole output in one go.
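
One way to build that per-sample gain array is to interpolate a few (time, gain) breakpoints, the way a DAW automation lane does. A minimal sketch, with hypothetical helper names and mono tracks of equal length assumed:

```python
import numpy as np

def automation_curve(breakpoints, num_samples, sample_rate):
    """Per-sample gain from (time_seconds, gain) breakpoints.

    Interpolates linearly between breakpoints and holds the first/last
    gain outside their range (np.interp's default behavior).
    """
    times, gains = zip(*sorted(breakpoints))
    t = np.arange(num_samples) / sample_rate
    return np.interp(t, times, gains)

def crossfade(track_a, track_b, gain_a):
    """Blend two equal-length tracks; gain_a is a per-sample array in [0, 1]."""
    return track_a * gain_a + track_b * (1.0 - gain_a)
```

A linear fade like this dips about 3 dB in power at its midpoint when the two signals are uncorrelated. Using `np.sqrt(gain_a)` and `np.sqrt(1.0 - gain_a)` as the two gains (an equal-power crossfade) is a common alternative.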

And I'd create silences by inserting zeros (making sure I insert them at a zero crossing to avoid clicks).
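
A minimal sketch of that idea for mono float audio (the `insert_silence` helper is hypothetical): find the first sign change at or after the requested position and splice zeros in there.

```python
import numpy as np

def insert_silence(audio, position, num_silent_samples):
    """Insert zeros at the first zero crossing at or after `position`.

    Falls back to appending the silence if no crossing is found.
    """
    signs = np.signbit(audio)
    # Index i is a crossing when samples i-1 and i have different signs.
    crossings = np.nonzero(signs[1:] != signs[:-1])[0] + 1
    candidates = crossings[crossings >= position]
    cut = int(candidates[0]) if len(candidates) else len(audio)
    silence = np.zeros(num_silent_samples, dtype=audio.dtype)
    return np.concatenate([audio[:cut], silence, audio[cut:]])
```

This only reduces clicks: the samples on either side of a crossing are small but usually not exactly zero. A short fade-out/fade-in around the splice is the more robust option, and stereo audio would need a crossing that works for both channels.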

I'm prone to NIH :-) but I'll also try to see if something like this exists. But at least -- it's clearly do-able/prototype-able!

Thank you

Out of curiosity, did you use any of the code produced by the Echo Nest? They were a Boston audio tech company that had lots of features like this, but they got swallowed by Spotify many years ago. I built some tools on top of their service, and I always wondered what happened to it.
No Echo Nest code was included in this project specifically, but my team owns a lot of the old Echo Nest systems, data, and audio magic (e.g., what used to be the Remix API, audio features, audio analysis, etc.). Pedalboard is being used to continue a lot of the audio intelligence research that started way back with the Echo Nest!

(Fun fact: the Echo Nest's Remix API was what got me interested in writing code way back in high school. Now, more than a decade later, I'm the tech lead for the team that owns what's left of it. I still can't believe that sometimes.)

I've done a lot of stuff with the Audio Analysis API, and it's horribly underdocumented. Some time ago I tracked down the PhD thesis that formed the basis of that API[0], but it's pretty theoretical and likely outdated. Do you think you'll ever get around to actually documenting it?

[0] https://web.media.mit.edu/~tristan/phd/dissertation/index.ht...

Wow, the journey is inspiring. Thanks for sharing.
What a wonderful answer! :) Thanks for sharing!
Slightly off-topic, but is there a good overview of machine learning research being done at Spotify?
There is! Check out http://research.atspotify.com/.
This is very interesting! I write VSTs as a hobby (usually in JUCE). Does this make it easier to load them into Python, or to create new ones, or...? How do you envision this being used?
Are you using python to do realtime audio processing, or is this all offline ("batch") processing? It wasn't entirely clear from reading the blurb ...
We use Pedalboard (and Python) for offline/batch processing - mostly ML model training.

Pedalboard would also be usable in situations that are tolerant of high latency and jitter, though, given that all audio gets handed back to Python (which is both garbage collected and has a global interpreter lock) after processing is complete.

Hey, did you consider releasing a wrapper for VST instruments?

There's definitely a lack of cross-platform VST hosts that don't require a DAW.

Also can Pedalboard support VST GUIs?

Instruments wouldn't be that hard to add to Pedalboard, but we don't have a use case for them on my team just yet. I might give that a try in the future, or might let someone else in the community contribute that.

Pedalboard doesn't support GUIs at the moment, but there's an issue on GitHub to track that: https://github.com/spotify/pedalboard/issues/8

A promising answer, thanks :)
Do you hire REMOTE?