Hacker News
by Buttons840 2066 days ago
Has anyone ever experimented with reversing the analog loop with machine learning?

Generating the training data would be easy. Play audio files through a speaker and record the speaker with a microphone. Then you have the original and the analog recording. Train the machine to reverse the process. Now you can grab high quality audio from any source. Same could be done for video, but would require a more elaborate setup.

Kind of off topic, but I hate the idea of not being able to own any actual copies of media in the future.

Also, I've made it part of my moral code, nearly a religion of mine, to see all advertisements as a reminder I maybe should be doing something else. Since this is my rant, I'll add that to the pile.

5 comments

> Generating the training data would be easy. Play audio files through a speaker and record the speaker with a microphone. Then you have the original and the analog recording. Train the machine to reverse the process. Now you can grab high quality audio from any source.

Machine learning isn't this magic thing that can defeat math. The digital to analog process is lossy, and you're asking machine learning to reverse that lossy process. This is very similar to saying "If I train my machine learning model that 10 + 10 = 20, and 3 * 40 = 120, if I ask it x + y = 42 it should be able to tell me x and y". It's obviously impossible, mathematically so.
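To make the arithmetic analogy concrete, here is a minimal Python sketch. The function names `pairs_summing_to` and `quantize` are made up for illustration, and the quantizer is only a stand-in for "some lossy step", not a model of a real speaker and microphone chain. It shows that a many-to-one map has no unique inverse.

```python
from itertools import product

def pairs_summing_to(total, limit):
    """Return every (x, y) with 0 <= x, y <= limit and x + y == total."""
    return [(x, y) for x, y in product(range(limit + 1), repeat=2) if x + y == total]

def quantize(sample, step):
    """Round a sample to the nearest multiple of step (a lossy, many-to-one map)."""
    return round(sample / step) * step

# Many different inputs produce the same "output" of 42.
candidates = pairs_summing_to(42, 42)
print(len(candidates))

# Two different samples collapse to the same quantized value,
# so the quantized value alone cannot tell you which one went in.
print(quantize(0.101, 0.1) == quantize(0.149, 0.1))
```

A model can learn which of the candidates is most likely for the kind of audio it was trained on, but that is a guess about the input, not a recovery of it.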

You could train it to reduce some predictable sources of analog interference, sure, but you absolutely can't "grab high quality audio from any source".

Well, actually, there is an exception! For data you train on, assuming you train the neural net enough, it will eventually encode a fingerprint of input audio and an output of the original digital training data, at which point your neural net has encoded copyrighted training data... effectively it's a form of compression (or maybe just obfuscation) now, and contains copyrighted material. Oops.

Disregarding the fact that this process would be mathematically impossible, let's say you do produce this program. So, does it circumvent copyright law? Does it actually improve anything?

No, it turns out what you have is even more obviously a circumvention device than youtube-dl or anything else. Under the DMCA, the neural network would be an illegal copyright circumvention device, and all the audio it produces would not be legal copies, and could not legally be owned. I'll reference "what color are your bits"[0], since it's an excellent description of one of the problems here. You're proposing a technical solution, but there is no technical solution here, it's a legal problem. Whether you torrent an album or whether you reverse an analog source through magic, either way the bits are colored with copyright infringement.

[0]: https://ansuz.sooke.bc.ca/entry/23

Great write-up.

>you're proposing a technical solution, but there is no technical solution here, it's a legal problem.

Yep. The P2P community tried to play this game and lost every single time. Bitcoin and cryptocurrency advocates are playing the same game when they propose bitcoin as a technical workaround for things like AML regulations, and they will lose as well.

> what you have is even more obviously a circumvention device

The device itself is just an "audio enhancer" - it could just as easily be used to spruce up the sound on old vinyl (pre-vinyl even) as it could to defeat copyright law.

The real issue with using it as OP intends is it's wholly unnecessary: the value in tools like youtube-dl is being asynchronous - you get the audio file without listening to the entire song first. OP's suggestion involving a speaker and a microphone is intrinsically synchronous, and it's a lot more work for lower quality than e.g. using Audacity to record the lossless digital audio off your sound bus.

It makes far more sense to grab a line-out feed rather than using a speaker and a microphone. Otherwise the positioning of the microphone and the exact voice coils of the speaker and microphone would all have a dramatic effect on the EQ curve of the audio, to say nothing of stereo or the ensuing phase issues that crop up when recording audio in real life.
Why even use the line-out? You can just plug it into a recording audio driver.

Video is harder, mostly because of the bandwidth.

Line-out guarantees that even if there's DRM in the streaming decoder, you're getting the actual audio stripped of all DRM.

It has the simple elegance of writing your DRM protected audio to a CD and then ripping that to MP3, which is a process most non-audio nerds already understand from the DRM iTunes days.

That said, I agree that you could get a slightly more accurate sound with a virtual audio pipe than with a DAC/ADC combo, even if modern jitter is negligible.

But as long as the EQ effect of the speaker/microphone setup is constant, an AI should have no problem learning how to restore the original.
Not in the real world where audio bandwidth is limited.
That's one big if.
Seriously, the tone can change from just a couple millimeters of difference in microphone or speaker positioning, not to mention the possible variations in room sound due to furniture or other object placement.
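Both sides of this exchange can be sketched in a few lines of Python. This is a toy under strong assumptions: the channel is treated as linear and fixed, the signal is represented directly as per-frequency-bin amplitudes, and the gains are known exactly. The names `apply_channel`, `invert_channel`, and the numbers are hypothetical.

```python
def apply_channel(spectrum, gains):
    """Scale each frequency bin by the channel's gain for that bin."""
    return [amplitude * gain for amplitude, gain in zip(spectrum, gains)]

def invert_channel(recorded, gains, floor=1e-6):
    """Divide out known gains. Bins the channel removed entirely stay at 0."""
    return [
        amplitude / gain if abs(gain) > floor else 0.0
        for amplitude, gain in zip(recorded, gains)
    ]

original = [1.0, 0.8, 0.5, 0.3]   # amplitudes per frequency bin
gains = [0.9, 0.5, 0.25, 0.0]     # last bin lies outside the channel's bandwidth

recorded = apply_channel(original, gains)
restored = invert_channel(recorded, gains)
print(restored)
```

Where the gain is nonzero, a constant EQ curve can be divided back out. Where the gain is zero, nothing of the original reaches the recording, so there is nothing to divide. In practice the gains are not known exactly and shift with microphone placement, and dividing by a very small gain also amplifies whatever noise is in that bin.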
> to see all advertisements

I suggest also clicking on every advertisement you can and then not purchasing anything (just ignore whatever tab was opened by your click).

If millions of people did this daily, it would break the web advertising model very fast since advertisers pay per click.

I don’t know what would replace it, but maybe something better than what we have now. Maybe not.

You might be interested in AdNauseam, a browser extension that automates this: https://adnauseam.io/
As someone deeply integrated into the ad serving world, I'd like to remind you that adnauseam traffic is very easy to filter out as noise. Nobody is paying for that traffic.
If someone using adnauseam accidentally clicks an ad (for real), can the ad network count the real click while ignoring the fake ones, or does it poison that user's full dataset?

What percentage of adnauseam use do you think is filtered? Is it a full 100%?

Any ad network not considering that traffic as fraudulent will shortly go out of business. That's why I say it declaratively.
How could it be made to work better?
Don't provide outlier data. What is your goal?
Since the losses in the analog conversion process cannot be determined exactly, the model is bound to add some noise to the converted audio. Video has more spatial data to guess the color and motion so it's easier in practice.

The unconverted sound may be crisper and have more detail, but there's no guarantee that those are the original details, so it won't be the original recording itself.

Light pollution adds noise to telescopes too. From a sky where you can barely see 4 stars you can pull detailed colored images of deep space objects.

Perhaps you have to play the media a few dozen times and do the media equivalent of frame stacking to see through the noise.

It's also quite possible no ML would be needed. I don't think frame stacking uses ML.

I wouldn't be surprised if you could play a song on repeat from the other side of your house and extract a very good copy of it, so long as you knew exactly when the song began and looped. You might only need to know the length of the song, not even when it began.
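A minimal Python sketch of that idea, with no ML involved. It assumes the recordings are already perfectly time-aligned and that each playback adds only independent Gaussian noise. The test tone, the noise level, and the names `stack` and `rms_error` are all made up for illustration.

```python
import math
import random

def stack(recordings):
    """Average time-aligned recordings sample by sample."""
    count = len(recordings)
    return [sum(samples) / count for samples in zip(*recordings)]

def rms_error(signal, reference):
    """Root-mean-square difference between two equal-length signals."""
    total = sum((a - b) ** 2 for a, b in zip(signal, reference))
    return math.sqrt(total / len(reference))

rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]

# Assumption: every playback adds independent noise and nothing else.
recordings = [[s + rng.gauss(0.0, 0.5) for s in clean] for _ in range(64)]

single_error = rms_error(recordings[0], clean)
stacked_error = rms_error(stack(recordings), clean)
print(single_error, stacked_error)
```

Averaging N plays shrinks independent noise by roughly a factor of sqrt(N). It does nothing for distortion that is identical on every play, such as the speaker's EQ curve or frequencies the microphone never picked up.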

It might not be practical, but it would be a cool blog post.

It's not the same thing. A CMOS sensor, especially a cooled, astro-class CMOS sensor, is much more sensitive than the eye.

The random noise in photography is emitted from the sensor itself and has a distinct profile. This profile can be extracted with certain procedures and can be used as a single-pass NR process with very high quality results (Darktable's Profiled NR does this if the camera's profile is generated/bundled). Subtractive NR does something similar: after the exposure, a closed-curtain exposure is taken with the same shutter value, and that image is subtracted from the first image. Since it's a per-sensor process, its quality is very high too.
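The subtractive step can be sketched in Python like this. It is a toy that assumes the sensor's pattern is exactly the same in both exposures; real dark frames also carry a random component, and the function name `subtract_dark_frame` and the pixel values are hypothetical.

```python
def subtract_dark_frame(light_frame, dark_frame):
    """Subtract a dark frame from a light frame pixel by pixel, clamping at zero."""
    return [
        [max(light - dark, 0) for light, dark in zip(light_row, dark_row)]
        for light_row, dark_row in zip(light_frame, dark_frame)
    ]

scene = [[10, 0], [0, 20]]        # what the sky actually contributed
fixed_pattern = [[3, 1], [2, 5]]  # assumed per-pixel sensor pattern

# The light frame is scene plus pattern; the closed-curtain frame is pattern only.
light_frame = [
    [s + p for s, p in zip(scene_row, pattern_row)]
    for scene_row, pattern_row in zip(scene, fixed_pattern)
]
dark_frame = fixed_pattern

cleaned = subtract_dark_frame(light_frame, dark_frame)
print(cleaned == scene)
```

This works because the unwanted part can be measured on its own. There is no equivalent "closed curtain" measurement that isolates what a speaker and microphone threw away.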

Light pollution is also somewhat similar. It's a specific wavelength, emitted from ground to sky (so its gradient can be known) and can be filtered out relatively easily with stacking and other RAW processing (assuming your image has enough bit-depth and headroom in both highlights and shadows). There are also new filters which directly filter this kind of pollution IIRC.

Stacking does something similar. Pixels with high consistency are kept and pixels with low consistency are discarded, so you get a clean image. Sorry, I don't know its exact math since I don't have a tracker and don't take many astro photos.
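One simple way to get "keep what is consistent, drop what is not" is a per-pixel median across frames. This is only an illustrative sketch, not necessarily the math any particular stacking tool uses, and the frame values are invented.

```python
from statistics import median

def median_stack(frames):
    """Take the per-pixel median across aligned frames (each frame is a flat list)."""
    return [median(values) for values in zip(*frames)]

frames = [
    [10, 20, 30],
    [11, 19, 30],
    [10, 250, 31],  # one frame has an outlier, e.g. a satellite trail
    [9, 21, 29],
    [10, 20, 30],
]

stacked = median_stack(frames)
mean_of_middle_pixel = sum(frame[1] for frame in frames) / len(frames)
print(stacked, mean_of_middle_pixel)
```

The median ignores the single inconsistent value, while a plain average of the same pixel is pulled far off by it.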

However, in an analog recording you have incomplete information and you want to put it back via ML, which is basically a very educated guess in this case. A well-tuned and well-trained ML model would probably put back sensible or semi-sensible details, but it cannot guess and regenerate the missing parts with 100% accuracy.

So at the end of the day, in photography you have the ability to get the complete information (via stacking, subtractive NR, or cooling the sensor a great deal). In an analog recording you don't have the complete information, especially if you record it via a speaker-to-microphone path (since they're not ideal reproducers).

We could get into the sound characteristics of analog audio pipelines and vinyl from there, but that's another rabbit hole I'd rather not dive into now.

That is a thoughtful and useful reply. Thank you.

From what you say, stacking can remove both random noise from the sensor and predictable noise such as light pollution. What other kinds of noise are there? It sounds like our noise removal ability is pretty good.

I am not an expert though. I do know we can't image the Apollo landing sites no matter how good our stacking software is. Our sensors aren't good (big) enough. I don't have an understanding of why that is though.

An analog loop would have hard limitations, just like a telescope. I'm not sure how much noise stacking could clean up. At this point I'm more curious than thinking it's a good solution.

You can use a virtual audio cable to simply record anything that would go to your speakers, it's free.