by somebodynew 1588 days ago
It looks like the principle is that a machine learning model trained on the combined output of four different kinds of gas sensors can discover correlations between unintentional characteristics of the sensors. For example, the manufacturer of an ethanol or nitrogen dioxide sensor is not going to specify anything about how it responds to vanillin, but it seems plausible to me that the relationship between their responses contains some hidden information that could help to discriminate between vanillin and eugenol. With enough different sensors, there's quite a bit of information to be found in mining their undefined behavior.

That is to say, you can treat the sensor reading as being completely meaningless and skip interpreting it as indicating VOC levels. You're just using the sensors as black boxes that produce arbitrary values with the property that exposure to organic vapor changes the output "somehow", and letting model training find some meaning in it.
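
To make the black-box idea concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the four-sensor response values are made up, the noise level is arbitrary, and the nearest-centroid classifier is just the simplest model I could pick, not what the project actually uses. The point is only that the code never interprets a reading as a gas concentration.

```python
import random

# Hypothetical, made-up responses of four sensors to each compound.
# No datasheet would specify these; they stand in for "undefined behavior".
HIDDEN_RESPONSE = {
    "vanillin": [0.8, 0.2, 0.5, 0.1],
    "eugenol": [0.6, 0.4, 0.3, 0.3],
}

def simulate_reading(compound, rng):
    # One noisy four-channel reading; the units are deliberately meaningless.
    return [b + rng.gauss(0, 0.05) for b in HIDDEN_RESPONSE[compound]]

def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit(samples):
    # Group readings by label and store the mean reading for each label.
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_label.items()}

def predict(model, x):
    # Pick the label whose mean reading is closest in squared distance.
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))

rng = random.Random(0)
labels = ["vanillin", "eugenol"]
train = [(simulate_reading(c, rng), c) for c in labels for _ in range(50)]
test = [(simulate_reading(c, rng), c) for c in labels for _ in range(50)]

model = fit(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The model only sees four arbitrary numbers per sample, and the separation comes entirely from how the channels relate to each other.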


> With enough different sensors, there's quite a bit of information to be found in mining their undefined behavior.

It sounds like you would need to be exceptionally careful that your meta-process didn't "find" some signal in pure noise (via re-using test sets and so on).
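
A small sketch of how that failure mode shows up, assuming a deliberately silly setup: features and labels are independent coin flips, and the "meta-process" is just picking whichever of 200 one-feature rules scores best on the same small test set. The names and sizes here are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)
n_features, n_test = 200, 40

def noise_dataset(n):
    # Features and labels are independent random draws: there is nothing to learn.
    X = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    y = [rng.randrange(2) for _ in range(n)]
    return X, y

def accuracy(feature, X, y):
    # "Model": predict class 1 when the chosen feature exceeds 0.5.
    hits = sum((row[feature] > 0.5) == bool(label) for row, label in zip(X, y))
    return hits / len(y)

X_test, y_test = noise_dataset(n_test)
X_fresh, y_fresh = noise_dataset(2000)

# Selecting the model on the test set is the mistake being demonstrated.
best = max(range(n_features), key=lambda f: accuracy(f, X_test, y_test))
reused_score = accuracy(best, X_test, y_test)
fresh_score = accuracy(best, X_fresh, y_fresh)

print(f"score on the reused test set: {reused_score:.2f}")
print(f"score on fresh data:          {fresh_score:.2f}")
```

The winning rule looks clearly better than chance on the set it was selected on and falls back to roughly a coin flip on data it has never touched, which is why the final evaluation set has to stay out of the selection loop.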

> It sounds like you would need to be exceptionally careful that your meta-process didn't "find" some signal in pure noise (via re-using test sets and so on).

It sounds like you’re actually talking about ordinary levels of carefulness in this (ML) context.

That would be great. I'm no ML expert, but my impression was that standards varied widely from team to team.

Does this mean that each sensor cluster has to be trained independently?

When this technique is performing at its best, I would expect so. The old story of the evolved FPGA comes to mind: https://www.damninteresting.com/on-the-origin-of-circuits/

You're intentionally depending on the "personality" of each gas sensor to get data measuring unknown features, so you can't expect consistency from sample to sample. Anything that was completely portable between different sensors would inherently be less powerful.

Most high-accuracy systems incorporate an onboard calibration target of some kind. It could be a gas cell (either sealed or consumable), a special lamp, etc. Or you buy an instrument that comes with calibration coefficients from the manufacturer. For example, if you sell spectrometers, you put in the grating and manually adjust it for the desired range. This is the case for cheaper instruments (e.g. Ocean Optics) as well as expensive bespoke systems, which are all hand built. Even if the grating and mirror mounts are fixed, manufacturing tolerances are rarely good enough that calibration isn't required. It's way cheaper to do some relatively low-accuracy machining and then just epoxy all the screws down.

In this case you'd probably calibrate each sensor against a standard chemical sample and then use the calibrated output. You could train on that rather than on the raw samples, and then you'd have a model that works on all devices.
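
Roughly what I mean, as a sketch. It assumes each device's raw output is an affine function of the true signal (a per-device gain and offset), which is my simplification and probably too optimistic for real gas sensors. The `Device` class and the two-point calibration against clean air plus one reference sample are hypothetical.

```python
import random

class Device:
    """Hypothetical sensor whose raw output is gain * signal + offset."""

    def __init__(self, rng):
        self.gain = rng.uniform(0.5, 2.0)
        self.offset = rng.uniform(-1.0, 1.0)

    def read(self, signal):
        return self.gain * signal + self.offset

def make_calibration(device, reference_signal=1.0):
    # Two-point calibration: one reading with no analyte, one with the standard sample.
    zero = device.read(0.0)
    span = device.read(reference_signal) - zero
    # Map a raw reading onto a scale where 0 is clean air and 1 is the reference.
    return lambda raw: (raw - zero) / span

rng = random.Random(0)
devices = [Device(rng) for _ in range(3)]
unknown = 0.37  # the same sample presented to every device

raw = [d.read(unknown) for d in devices]
calibrated = [make_calibration(d)(d.read(unknown)) for d in devices]

print("raw:       ", [round(r, 3) for r in raw])
print("calibrated:", [round(c, 3) for c in calibrated])
```

The raw readings disagree from device to device while the calibrated ones line up, so a model trained on calibrated values has a chance of transferring. The catch is that calibration only removes the differences your calibration model can describe, so anything nonlinear or cross-sensitive stays device-specific.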