by knaik94 1322 days ago

I am not sure what is causing it, but it takes a solid 2 to 3 minutes on my computer before it does anything. When I load a file, the page feels frozen, and Firefox showed a warning banner saying the tab was slowing down the whole browser. Same thing in Chrome, and I have an i7-10750H. Some people might mistake that for it not working, since there is no UI feedback that anything is happening. Windows 10.
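To illustrate the kind of feedback I mean, here is a minimal Python sketch, not the demo's actual code. It processes samples in chunks and reports progress after each one. The names `process_in_chunks` and `on_progress` are hypothetical, and the per-chunk "feature" (mean absolute amplitude) is only a placeholder for whatever the real analysis does.

```python
from typing import Callable, List, Sequence


def process_in_chunks(
    samples: Sequence[float],
    chunk_size: int,
    on_progress: Callable[[float], None],
) -> List[float]:
    """Compute one placeholder feature per chunk, reporting progress in (0, 1]."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    results: List[float] = []
    total = len(samples)
    for start in range(0, total, chunk_size):
        chunk = samples[start:start + chunk_size]
        # Placeholder feature: mean absolute amplitude of the chunk.
        results.append(sum(abs(x) for x in chunk) / len(chunk))
        # Report the fraction of samples handled so far.
        on_progress(min(start + chunk_size, total) / total)
    return results


if __name__ == "__main__":
    features = process_in_chunks(
        [1.0, -1.0, 2.0, -2.0, 3.0],
        chunk_size=2,
        on_progress=lambda frac: print(f"{frac:.0%} done"),
    )
    print(features)
```

In a browser, the equivalent would also need to run the heavy work off the main thread (for example in a Web Worker) or yield between chunks, otherwise the progress indicator could not repaint.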

I got two different tracks to work, and it's clear that one was a lot harder to process than the other. The second one took noticeably longer to start, and CPU utilization was higher as well. They were both instrumental tracks in the same format and around the same length. The simpler one to process was the instrumental of Britney Spears's Baby One More Time. The harder one was Porter Robinson's Divinity.

Neither track produced an effect similar to the one in the demo video, but both were interesting regardless. They both looked like how I imagine sound waves would echo and bounce around inside a cube.

I appreciate the notebook writeup where you described the goals, because the visualization wasn't inherently intuitive with the sound. I chose much more complex tones than your demo. I imagine the feature extraction is much easier on isolated sounds. This reminds me a lot of MilkDrop, so I was expecting it to be closer to that but in 3D. That was probably a misunderstanding on my part of the goals for this.

I think exposing more parameters about how features get mapped and scaled would be really helpful in making it feel more intuitive. Zooming the cube in and out is nice, but it didn't seem to help convey more information with the tracks I chose. If anything, it got in the way, because on my computer the zoom sensitivity was very, very high.
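As a rough sketch of what exposed mapping parameters could look like (an assumption on my part, not how the demo is implemented), the Python below maps an audio feature into a visual range using user-adjustable gain, curve, and output bounds, plus a zoom sensitivity setting. Every name here (`MappingParams`, `map_feature`, `apply_zoom`) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MappingParams:
    """Hypothetical user-facing knobs; none of these names come from the demo."""
    gain: float = 1.0                # multiplies the normalized feature
    exponent: float = 1.0            # > 1 de-emphasizes low values, < 1 boosts them
    out_min: float = 0.0             # lower bound of the visual parameter
    out_max: float = 1.0             # upper bound of the visual parameter
    zoom_sensitivity: float = 0.001  # zoom change per unit of scroll delta


def map_feature(value: float, feat_min: float, feat_max: float,
                p: MappingParams) -> float:
    """Normalize a feature to [0, 1], apply gain and curve, rescale to output range."""
    if feat_max <= feat_min:
        raise ValueError("feat_max must be greater than feat_min")
    t = (value - feat_min) / (feat_max - feat_min)
    t = min(max(t * p.gain, 0.0), 1.0)  # clamp after applying gain
    t = t ** p.exponent
    return p.out_min + t * (p.out_max - p.out_min)


def apply_zoom(zoom: float, scroll_delta: float, p: MappingParams,
               lo: float = 0.1, hi: float = 10.0) -> float:
    """Exponential zoom so each scroll step scales by a constant factor, clamped."""
    new_zoom = zoom * math.exp(-scroll_delta * p.zoom_sensitivity)
    return min(max(new_zoom, lo), hi)
```

With settings like these in a panel, someone could lower `zoom_sensitivity` on a touchy mouse or raise `exponent` so that only loud passages move the shape much.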

I look forward to seeing where this goes.

1 comment

rslice 1322 days ago

Thank you for your thoughtful comment! This is by no means a product, it's more of a way for me to test an idea and share it with the community.

This demo is meant for small audio samples. My initial goal was to use it to visualize and compare drum samples by looking at their 'spatial signature'.

Right now, I'm using arbitrarily defined 'shapes' (sphere and tube), but the goal is to recover those from real-world data. Unfortunately, building the appropriate dataset and the model to go with it is currently out of my reach, but I know that's sort of how my brain learned these audio/spatial associations.

In order for this to work on complete tracks, I would need to add source separation, transcription and some form of information compression.

Additionally, I'm working on ways to deal with richer sounds, by laying them out in space or by splitting them into 'voices of unison'. Here's a demo of what it would look like: https://imgur.com/gallery/gkFPXXu

There are two directions I could take this: either making music from interacting with spatial representations, or building spatial representations from music. I don't have the bandwidth to do both alone, so I would love to work on it with other developers.