Sorry if this is a stupid question, but can't you do this without converting the image into an RGB 3D space? (i.e. iterate through the pixels and count the ones within a certain range of what you want)
You don't have to actually reify the conversion (that is, create a data structure that holds the new representation in memory). When they say they converted the image into RGB 3D space, they probably meant it conceptually, to give meaning to "nearness": treat each pixel as the point (R, G, B), and two colors are near when those points are close together. I'd guess the only data they kept track of as the image streamed off the camera were two integer counts.
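To make that concrete, here's a rough sketch in Python of what the streaming version might look like. Everything in it is my assumption, not their actual code: the function name `count_near`, the choice of Euclidean distance, and the guess that the two counts are "pixels near the target" and "pixels seen in total."

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def count_near(pixels: Iterable[Pixel], target: Pixel, radius: float) -> Tuple[int, int]:
    """Count pixels within `radius` of `target` in RGB space.

    Hypothetical helper: no 3D structure is ever built. Each pixel is
    treated as a point only for the duration of one distance check.
    """
    radius_sq = radius * radius  # compare squared distances to avoid sqrt
    matching = 0
    total = 0
    for r, g, b in pixels:
        total += 1
        dist_sq = (r - target[0]) ** 2 + (g - target[1]) ** 2 + (b - target[2]) ** 2
        if dist_sq <= radius_sq:
            matching += 1
    return matching, total


# Works on any iterable, including one that yields pixels as they arrive.
stream = iter([(255, 0, 0), (250, 5, 5), (0, 0, 255)])
print(count_near(stream, target=(255, 0, 0), radius=10))
```

The "RGB space" only shows up in the distance formula, which is the same thing as the "within a certain range" check from the question.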
I wish I had more examples of algorithms that are best explained as acting on a data structure that isn't actually present, but all I can think of right now is using generators to represent a list.
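A tiny illustration of the generator case, in Python (the `squares` function is just a made-up example): the code reads as if it sums a list of squares, but that list never exists in memory.

```python
from typing import Iterator


def squares(n: int) -> Iterator[int]:
    """Yield 0*0, 1*1, ..., (n-1)*(n-1) one value at a time."""
    for i in range(n):
        yield i * i


# Conceptually "sum the list [0, 1, 4, 9, 16]", but only one value
# is alive at any moment; nothing is stored.
total = sum(squares(5))
print(total)
```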