Well, in general, you'd want to draw a causal link between real physical measurements and network traffic. So yes, if you own the client (and can accurately determine whether it's running in a VM and/or being driven by a robot, which is tricky), you can filter out the data flak. If I worked for a data-collection org, I'd probably ignore (or blacklist, if I could get away with it) a known source of noise.
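
For what it's worth, the filtering step itself is the easy part. Here is a rough sketch, assuming the client already reports flags for "running in a VM" and "driven by automation" (getting those flags right is the hard bit, and this code doesn't attempt it). The `Sample` record, its field names, and `filter_noise` are all made up for illustration, not taken from any real collection pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical measurement reported by a client."""
    source_id: str    # identifier of the reporting client (assumed)
    in_vm: bool       # client believes it runs in a VM (assumed flag)
    automated: bool   # client believes a robot is driving it (assumed flag)
    value: float      # the measurement itself


def filter_noise(samples, blacklist):
    """Keep samples that are not blacklisted and not flagged as VM/automated."""
    return [
        s for s in samples
        if s.source_id not in blacklist and not s.in_vm and not s.automated
    ]


samples = [
    Sample("alice", in_vm=False, automated=False, value=1.0),
    Sample("bot-7", in_vm=False, automated=True, value=2.0),
    Sample("vm-3", in_vm=True, automated=False, value=3.0),
    Sample("noisy", in_vm=False, automated=False, value=4.0),
]
kept = filter_noise(samples, blacklist={"noisy"})
```

Only the sample from `alice` survives here: the others are dropped for being automated, in a VM, or from a blacklisted source.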