Hacker News
by jumpocelot 133 days ago
Congratulations on the new release! I've seen some forum discussions on this in the past, and I'd imagine it's a frequently debated topic. However, I'd like to ask about the technical feasibility of implementing a feature similar to Ableton's 'Warp' within Ardour. I understand that Ardour and Ableton have fundamentally different architectures and that different DAWs can prioritize different workflows. Given the current state of the codebase and the development roadmap, I'm curious how realistic the implementation of BPM-synced time-stretching actually is or if it remains significantly outside the project's scope.

The biggest issue here is that the best library for doing audio warping (ZPlane) is not available to us. We already do realtime audio warping for clip playback, just like Ableton, using RubberBand (and might consider using Staffpad at some point, which we have available for static stretches).

However, following the tempo map is a very different challenge than following user-directed edits between warp markers, and neither RubberBand nor Staffpad really offers a good API for this.

In addition, the GUI side of this poses a lot of questions: do you regenerate waveforms on the fly to be accurate, or just use a GUI-only scaling of an existing waveform to display things during the editing operation?

We would certainly like to do this, and have a pretty good idea of how to do it. The devil, as usual, is in the details, and there are rather a lot of them.

There's also the detail that having clips be bpm-synced addresses somewhere between 50% and 90% of user needs for audio warping, which reduces the priority for doing the human-edited workflow.
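For the BPM-synced case the underlying arithmetic is simple. A rough sketch, assuming a clip with a single known, constant tempo (the function name is hypothetical and is not part of Ardour, RubberBand or Staffpad):

```python
def stretch_ratio(clip_bpm: float, session_bpm: float) -> float:
    """Factor by which the clip's duration must change to match the session tempo.

    Values above 1.0 lengthen the clip (the session is slower than the clip);
    values below 1.0 shorten it.
    """
    if clip_bpm <= 0 or session_bpm <= 0:
        raise ValueError("tempos must be positive")
    return clip_bpm / session_bpm


# A 4-beat loop recorded at 120 BPM lasts 4 * 60 / 120 seconds.
beats = 4
original_seconds = beats * 60.0 / 120.0
# In a 90 BPM session the same 4 beats must last 4 * 60 / 90 seconds.
stretched_seconds = original_seconds * stretch_ratio(120.0, 90.0)
```

Following a tempo map means this ratio is no longer a single constant, and human-edited warp markers make it vary between arbitrary points, which is where the harder work lies.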

>do you regenerate waveforms on the fly to be accurate, or just use a GUI-only scaling of an existing waveform, to display things during the editing operation

just use GUI scaling, and only IF the prior is too challenging

You often want sample accurate waveform visualization when tuning samples that are time or pitch warped to set start and loop points at zero crossings to avoid clicks without needing fades.
Overwhelmingly, there's no such thing as a zero crossing. Your closest real world case is a point in time (between samples) where the previous sample is positive and the next one is negative (or vice versa). However, by truncating the next sample to zero, you create distortion (and if the absolute value of the preceding sample is large, very significant distortion).

Zero crossings were an early myth in digital audio promulgated by people who didn't know enough.

Fades are always the best solution in terms of limiting distortion (though even then, they can fail in pathological situations).
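As an illustration of what a fade does at an edit point, here is a minimal sketch of a linear fade-in. The linear shape and the function name are assumptions made for the example; real editors typically offer several fade curves.

```python
def apply_fade_in(samples, fade_len):
    """Return a copy of `samples` with a linear fade-in over the first `fade_len` samples."""
    if fade_len <= 0:
        raise ValueError("fade_len must be positive")
    out = list(samples)
    n = min(fade_len, len(out))
    for i in range(n):
        # Gain ramps from 0.0 at i == 0 up toward 1.0; sample `fade_len` is untouched.
        out[i] *= i / fade_len
    return out


# A cut that would otherwise start abruptly at full scale.
faded = apply_fade_in([1.0] * 8, 4)
```

The first sample is forced to silence and the level ramps up gradually, so the result does not depend on where the cut happens to fall in the waveform.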

There's definitely such a thing as a zero crossing, it's where sign(x[n-1]) != sign(x[n]) (or rather, there's "no such thing as a zero crossing" in the same way there's no such thing as a peak). Picking a suitable `n` as a start/end point for sample editing is a judgement call, because what you're trying to minimize is the difference between two samples since it's conceptually a unit impulse in the sequence.
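A small sketch of that definition, with one possible (assumed) way of making the judgement call: among the sign changes, prefer the start point whose first sample is closest to zero. The helper names are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_changes(x):
    """Indices n where sign(x[n-1]) != sign(x[n])."""
    return [n for n in range(1, len(x)) if sign(x[n - 1]) != sign(x[n])]


def quietest_start(x):
    """Among the sign changes, pick the n with the smallest |x[n]|.

    Starting playback at n jumps from silence to x[n], so a smaller
    |x[n]| means a smaller step. Returns None if there is no sign change.
    """
    candidates = sign_changes(x)
    if not candidates:
        return None
    return min(candidates, key=lambda n: abs(x[n]))


x = [0.5, 0.2, -0.3, -0.1, 0.4]
crossings = sign_changes(x)
start = quietest_start(x)
```

Note that neither candidate in this example lands on a sample that is actually zero; the step is only made smaller, not removed.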

I don't think people who talk about zero crossings were totally misguided. It's a legitimate technique for picking start/end points of your samples and tracks. Even as a first step before BLEP or fades.

Theoretically, it makes sense (go look at any of the diagrams of what a "zero crossing" is online, and it totally does).

The problem is that sign(x[n-1]) != sign(x[n]) describes a place where two successive samples differ in sign, but no sample actually has a value of zero. Thus, to perform an edit there, if your goal is to avoid the click caused by truncating at a non-zero sample value, you need to add/assign a value of zero to a sample. This introduces distortion - you are artificially changing the shape of the waveform, which implies the introduction of all kinds of frequency artifacts.

Zero crossings are not computed by finding a minimum between two consecutive samples - that would almost never involve a sign change. And if they are computed by finding the minimum between two consecutive samples that also involves a sign change, there's a very good chance that you'll be a long way from your desired cut point, even if you ignore the distortion issue.

It really was a completely misguided idea. If the situation was:

     sign(x[n-2]) != sign(x[n]) && x[n-1] == 0
then it would be great. But this essentially never happens in real audio.
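A quick way to see how rare that condition is: count samples that are exactly zero against the number of sign changes. The sketch below uses a synthetic 440 Hz sine with an arbitrary phase offset as a stand-in (an assumption for illustration, not real audio); the sample rate, frequency and phase are all made-up values.

```python
import math


def count_zeros_and_sign_changes(x):
    """Return (number of samples exactly equal to 0.0, number of strict sign changes)."""
    zeros = sum(1 for v in x if v == 0.0)
    changes = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    return zeros, changes


sample_rate = 44100   # samples per second
freq = 440.0          # Hz
phase = 0.1           # radians, so the signal does not start exactly at zero

# One second of signal.
x = [math.sin(2 * math.pi * freq * n / sample_rate + phase)
     for n in range(sample_rate)]

zeros, changes = count_zeros_and_sign_changes(x)
```

The signal changes sign hundreds of times in that second, yet no sample sits exactly on zero, so every one of those "zero crossings" falls between samples.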
It's not as if a constantly changing single-axis non-linear transform is trivial to accomplish in the GUI either :(