Hacker News
by mrtnmcc 1588 days ago
In my experience, "AI can extract more information from sensors" is mostly a myth.

An example is the SCIO sensor ( https://nocamels.com/2019/03/scio-kickstarter-darling-promis... ) which was a cheap handheld spectrometer that claimed to accurately determine the nutritional information of any food you pointed it at.

One good way to debunk this is to measure raw sensor output and compute Mutual Information (which incorporates sensor noise/variability). If the sensor only produces X bits of information, no algorithm will be able to extract more than that, which means at most about 2^X reliably distinguishable classes. In the SCIO case it was just under 8 bits of information in total, so something like a poor color sensor. You could train on apples and oranges and maybe do an investor demo, but it's not actually going to do anything useful (as the Kickstarter crowd soon learned).
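
For anyone who wants to try the mutual information check, here is a minimal sketch, assuming the sensor readings have already been binned into discrete levels. The function name `mutual_information_bits` and the toy data are made up for illustration; this is not the analysis that was run on the SCIO. Note that the plug-in estimator below is biased upward when samples are scarce, so a real measurement needs many repeats per class.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Plug-in estimate of I(label; reading) in bits from (label, reading) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    p_label = Counter(label for label, _ in pairs)
    p_reading = Counter(reading for _, reading in pairs)
    mi = 0.0
    for (label, reading), count in joint.items():
        # p(l, r) * log2(p(l, r) / (p(l) * p(r))), written with raw counts
        mi += (count / n) * math.log2(count * n / (p_label[label] * p_reading[reading]))
    return mi

# Hypothetical data: 4 food classes, but the sensor only reports 2 distinct levels.
samples = [(label, label // 2) for label in range(4)] * 100
bits = mutual_information_bits(samples)
distinguishable = 2 ** bits  # upper bound on reliably separable classes
print(bits, distinguishable)
```

In this toy case the estimate works out to 1 bit, so two of the four classes are the most any classifier could separate, no matter how it is trained.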

5 comments

True, but there are things where AI can help. For example, in the domain of electronic gas sensors, AI can be used to disentangle confounding variables like gas, humidity and temperature. All three affect the sensor output in a nonlinear fashion, and an ANN can learn the transfer function that extracts the (almost) pure gas response.
Yes, combining relatively independent sensors will increase the MI.
The sensors are not independent.

Gas sensing is really tricky. Metal oxide gas sensors respond nonlinearly to all three of gas, temperature, and humidity. Plus they drift. AI can help with the nonlinear response. Drift hasn't been solved yet, as far as I know.

Understood, the point was that if the sensors are identical, they don't give any more information; some independence is needed.
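
The identical-versus-independent point can be shown with a toy calculation. This is a sketch under idealized assumptions: a hidden state with four equally likely values and two hypothetical noiseless one-bit sensors (`sensor_a` reads the high bit, `sensor_b` the low bit). Real gas sensors are noisy and correlated, so the gain from a second sensor would fall somewhere in between.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy in bits."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mi_bits(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy_bits(xs) + entropy_bits(ys) - entropy_bits(list(zip(xs, ys)))

states = [0, 1, 2, 3] * 50               # hidden quantity, 2 bits of entropy
sensor_a = [s >> 1 for s in states]      # hypothetical sensor: high bit only
sensor_b = [s & 1 for s in states]       # hypothetical sensor: low bit only

duplicate = mi_bits(states, list(zip(sensor_a, sensor_a)))      # two copies of A
complementary = mi_bits(states, list(zip(sensor_a, sensor_b)))  # A plus B
print(duplicate, complementary)
```

Two copies of the same sensor still give 1 bit about the state, while the complementary pair gives the full 2 bits.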
Is the limit: A) sensor resolution, B) NN architecture and/or algorithm, C) training sample size, D) training data (labeling, segmentation) quality, or E) it doesn't sufficiently predict the variance with low enough error?

New NN models are able to do more with the exact same sensor data.

You cannot conjure information out of thin air. Even with infinite data and a hypothetical wormhole CPU that runs everything in O(1) and solves the halting problem, you still couldn't do this. So to answer your question, the reason is effectively (A). Sensor resolution might be the wrong term but it's the general idea.
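
The formal version of this is the data processing inequality: if the algorithm only sees the reading Y, then I(X; f(Y)) <= I(X; Y) for any function f. A small numerical sketch, with a made-up noisy sensor and arbitrary post-processing functions (all names and parameters here are illustrative assumptions):

```python
import math
import random
from collections import Counter

def mi_bits(xs, ys):
    """Plug-in mutual information estimate in bits between two sample lists."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

rng = random.Random(0)
labels = [rng.randrange(8) for _ in range(5000)]          # 8 hypothetical classes
readings = [x + rng.choice([-1, 0, 1]) for x in labels]   # hypothetical noisy sensor

raw = mi_bits(labels, readings)
post_processing = {
    "halve": lambda y: y // 2,
    "mod3": lambda y: y % 3,
    "scramble": lambda y: (y * y + 7) % 5,
    "relabel": lambda y: 3 * y + 1,   # invertible, so it loses nothing
}
processed = {name: mi_bits(labels, [f(y) for y in readings])
             for name, f in post_processing.items()}
print(raw, processed)
```

None of the processed values exceeds `raw`; the invertible relabeling merely matches it. A neural network is just a more elaborate f.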
How much information content is there in DNA (and RNA)? How do creatures know or learn what not to eat given limited available sensor data?
How much information content is there in DNA? 2 bits per base, before compression. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/
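
The 2 bits per base figure is just log2 of the four-letter alphabet. As a back-of-envelope sketch, assuming a genome length of roughly 3.1 billion bases (an approximate figure for a haploid human genome, not a number from the linked article):

```python
import math

bits_per_base = math.log2(4)      # four possible bases: A, C, G, T
genome_bases = 3_100_000_000      # assumed approximate haploid human genome length
total_bits = genome_bases * bits_per_base
total_megabytes = total_bits / 8 / 1_000_000
print(bits_per_base, total_megabytes)
```

That is an upper bound before compression; repeats and other redundancy mean the real information content is lower.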

How do creatures know what to eat? Evolution solved that for most creatures, so their sensors don't have to work as hard at runtime. And in other cases, some number of members of a population of creatures will die before the population learns the food is poisonous. Our sensors, and the information processing systems that manage their outputs, are remarkably efficient data processing engines that do the equivalent of approximating and predicting, often well beyond what the most advanced deep learning systems are capable of doing now.

So, sensor resolution is higher, there are multiple fields being integrated, in a massively-parallel spreading-activation Biological Neural Network, and that's how blank-slate creatures just know?

Is there enough information content - per the Shannon entropy definition or otherwise - in DNA and/or RNA to code for the survival-selected traits that

I'm not sure that the (Shannon entropy, MIC, Kolmogorov) information content of the samples is the limit of any given network trained therefrom? Is there anything to be gained from upsampling and adding e.g. Gaussian blur (noise)? Maybe it's feature engineering, maybe it's expert methods bias, maybe it's just sensor fusion; that's the magic noise.

Perhaps this is moving the goalposts a bit, but e.g. depixelation does appear to defy such a presumed limit due to apparent information content? Perhaps it is that the network reading the sensor carries additional information associated with the lower-resolution or additional-fields' sensor data?

https://github.com/krantirk/Self-Supervised-photo :

> Given a low-resolution input image, PULSE searches the outputs of a generative model (here, StyleGAN) for high-resolution images that are perceptually realistic and downscale correctly.

Maybe no amount of feature engineering can actually add information?
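
One way to see why "downscale correctly" does not pin down the original: downscaling is many-to-one. A toy sketch with a hypothetical 2x box-filter downscaler on 1-D signals (this is not PULSE's code, only an illustration of the constraint it searches under):

```python
def downscale2(signal):
    """Average adjacent pairs: a hypothetical 2x box-filter downscaler."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

smooth = [10, 10, 20, 20]   # one candidate high-resolution signal
edgy = [0, 20, 5, 35]       # a very different candidate
low_a = downscale2(smooth)
low_b = downscale2(edgy)

# Count how many 8-bit pixel pairs average to the same low-res value of 10.
candidates = sum(1 for a in range(256) for b in range(256) if (a + b) / 2 == 10)
print(low_a, low_b, candidates)
```

Both candidates produce the same low-resolution signal, so the choice between them has to come from the generative model's training data rather than from the measurement.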

Because they receive additional information from the environment through highly sensitive sensors producing massive amounts of information. Whereas the information you get from a cheap sensor effectively discretizes to a few bits.
You can, but it's called making stuff up
Having designed sensor systems, I've lost more than a few hours of my life having to explain "why do we need that big expensive sensor when you can do everything with machine learning?"

The idea that a magic math technique can replace expensive sensors predates NNs by a few decades. Dozens of start-ups have gone bankrupt trying to do non-invasive blood glucose with portable sensors.

This is a very crude but at least conceptually useful rule of thumb: it's all of the above, but ultimately the analysis result is a mathematical function of an array of values produced by the sensor. Very few mathematical functions lack the property that variation in the output increases with the level of variation in the input.
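
That rule of thumb is essentially first-order error propagation: for a smooth function f, the output spread is roughly |f'(x0)| times the input spread. A quick Monte Carlo sketch, where the function (squaring), the operating point, and the noise levels are arbitrary choices for illustration:

```python
import random
import statistics

def output_spread(f, x0, sigma_in, n=20000, seed=0):
    """Standard deviation of f(x) when x is x0 plus Gaussian sensor noise."""
    rng = random.Random(seed)
    outputs = [f(rng.gauss(x0, sigma_in)) for _ in range(n)]
    return statistics.pstdev(outputs)

def square(x):
    return x * x

quiet = output_spread(square, 3.0, 0.01)   # low-noise (expensive) sensor
noisy = output_spread(square, 3.0, 0.1)    # 10x noisier (cheap) sensor
predicted = abs(2 * 3.0) * 0.1             # |f'(x0)| * sigma_in for the noisy case
print(quiet, noisy, predicted)
```

Ten times more input noise gives roughly ten times more output noise here, and the analysis function cannot undo that.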

AI can extract information from a sensor that is 'obvious' when you look at it by eye, yet no easy combination of frequency filters and a carefully tuned threshold can extract reliably.
AI can detect more information in the whole dataset because, for example, it has the whole "breathe in, breathe out" cycle in view. Fungi residing in the mouth would be present as background noise during both breathing in and breathing out. But fungal products present at the end of a breathe-out cycle are most likely to originate from the lungs, because the mouth contamination has been "flushed" out by the breath itself.
Priors can make sensor information more useful maybe, but that is just knowledge that helps first limit possibilities before taking a measurement. Priors also work against you when you are trying to sense something novel that might indicate a thing you don't expect.

An aside on sparsity priors (which that article uses): reality is actually a lot less sparse than the researchers' models would have you believe. If most dimensions are not truly zero (e.g., have some small noise present), these sparsity methods fall apart. That's why you (never?) see the methods deployed in actual products.

Specifically, the support determination step usually breaks down when the signal is only epsilon-sparse (approximately rather than exactly sparse), and you also get "noise folding".