Visual Foundational Model Embedding = spatially aligned local image embeddings
RGB image data is very sensitive to lighting/angle/noise and does not directly contain "semantic meaning"
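
To make "spatially aligned local embeddings" concrete, here is a minimal sketch of the data layout only. It assumes a hypothetical stand-in encoder (`patch_embeddings`, which just averages each patch per channel); a real visual foundation model would replace that with a learned network producing much higher-dimensional features. The grid structure is the point: cell (i, j) of the output describes one specific pixel region of the input.

```python
import random


def patch_embeddings(image, patch):
    """Return a grid of per-patch feature vectors for an H x W x C image.

    Stand-in encoder (an assumption for illustration): each patch is
    summarized by its per-channel mean. A learned model would go here.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    grid = []
    for top in range(0, height - patch + 1, patch):
        row = []
        for left in range(0, width - patch + 1, patch):
            sums = [0.0] * channels
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    for c in range(channels):
                        sums[c] += image[y][x][c]
            row.append([s / (patch * patch) for s in sums])
        grid.append(row)
    return grid


random.seed(0)
H, W, PATCH = 8, 12, 4  # toy sizes, chosen only for the example
image = [[[random.random() for _ in range(3)] for _ in range(W)] for _ in range(H)]

emb = patch_embeddings(image, PATCH)
grid_h, grid_w, dim = len(emb), len(emb[0]), len(emb[0][0])

# Spatial alignment: cell (i, j) covers pixel rows i*PATCH .. (i+1)*PATCH - 1
# and pixel columns j*PATCH .. (j+1)*PATCH - 1 of the original image.
```

With these toy sizes the 8 x 12 image yields a 2 x 3 grid of embeddings, one per 4 x 4 patch, so a location in the embedding grid can be mapped back to a location in the image.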