
I agree. As of now, the extra dimensions are redundant.

However, they are still somewhat useful. By using distance functions, I localize sound sources in space, but I also turn them into symbols. This helps me keep track of them, and I plan on building a set of rules (a grammar) for how these symbols can interact.
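A minimal sketch of the idea, assuming each sound source is modelled as a sphere with a signed distance function. The names `SphereSource`, `nearest_symbol`, and the symbol labels are hypothetical and only there for illustration; this is not the actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SphereSource:
    """Hypothetical sound source: a symbol attached to a sphere in 3D space."""
    symbol: str
    center: tuple
    radius: float

    def distance(self, point):
        # Signed distance: negative inside the sphere, positive outside.
        return math.dist(point, self.center) - self.radius


def nearest_symbol(sources, point):
    # The distance function both localizes the source and picks its symbol.
    return min(sources, key=lambda s: s.distance(point)).symbol


sources = [
    SphereSource("kick", (0.0, 0.0, 0.0), 0.5),
    SphereSource("hat", (2.0, 0.0, 0.0), 0.25),
]
listener = (1.5, 0.0, 0.0)
print(nearest_symbol(sources, listener))
```

Each source is just a name plus a distance function here, so a grammar could later be written over the names while the geometry handles localization.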

Now, building semantically relevant 'spatial symbols' from data is where the real challenge lies, and the first step is to actually gather such data. Unfortunately, I don't have access to a photogrammetry setup, so all I can do is wait for companies or research institutes to make appropriate data accessible. The alternative is to generate the data synthetically, but then you hit the wall of procedural audio generation.

>lifting with a kernel

I am not familiar enough with the lifting trick to know whether it is relevant in this context, which is about 'embodying' sounds rather than classifying them. I think it would be silly to expect 3D space to be sufficient for sound source separation and/or music transcription. If I were to add those features, I would definitely use existing models (neural nets), which properly leverage much higher-dimensional spaces.