That's because they used satellite imagery to create the embeddings. The map is just for visualization. They probably used 5 or more bands of the satellite data, which means each pixel is going to be slightly different due to things like depth, the amount of silt in the water, the amount of plankton, and so on.
Having worked on these types of problems before, I think the model is doing a pretty great job matching pixels.
Thanks! And you are giving it too much credit here - it's just trained on one-hot encoded land cover (24 classes) from Copernicus. Using imagery directly would be #2 on my list of to-dos, after including elevation in the input data.
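For anyone unfamiliar with the term, here is a minimal sketch of what one-hot encoding a 24-class land cover label looks like. The function name and the example class index are hypothetical, and this is not the actual preprocessing code, just an illustration of the idea.

```python
NUM_CLASSES = 24  # number of land cover classes mentioned above


def one_hot(class_index, num_classes=NUM_CLASSES):
    """Return a vector with 1.0 at class_index and 0.0 everywhere else."""
    if not 0 <= class_index < num_classes:
        raise ValueError(f"class_index must be in [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[class_index] = 1.0
    return vec


# Hypothetical example: a site whose land cover falls in class 3.
encoded = one_hot(3)
```

Each site ends up as a 24-element vector with a single 1.0, so the model only sees which class is present, not any spectral detail.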
I intentionally avoided using lots of ocean areas - this way I cut down the number of required sites for inference from ~100 million (at resolution 7 in the H3 system) to around 25 million.
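To put numbers on that, here is a rough sketch of where the ~100 million figure comes from. It assumes the standard H3 cell-count formula (2 + 120 * 7^r cells at resolution r, pentagons included); the function name is made up for illustration, and the 25 million is simply the figure quoted above.

```python
def h3_cell_count(resolution):
    """Total number of H3 cells covering the globe at a given resolution."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** resolution


total_sites = h3_cell_count(7)      # all cells at resolution 7
kept_sites = 25_000_000             # approximate figure quoted above
fraction = kept_sites / total_sites # share of the globe actually processed

print(f"{total_sites:,} cells at resolution 7")
print(f"about {fraction:.0%} kept for inference")
```

So skipping most ocean cells leaves roughly a quarter of the global grid to run inference on.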