Humans did not accumulate that intuition from images alone. In the example you gave, you subconsciously augment the image information with a lifetime of interacting with the world through all your other senses.
Yes, without extra information, manipulating everyday objects is probably as intuitive to robots as manipulating quantum-scale molecules is to humans.