S2Vec partitions the Earth's surface into hierarchical cells and rasterizes the features within each cell (e.g., coffee shops, parks) into a multi-layer image representation. It then applies masked autoencoding (MAE), learning to predict the characteristics of masked regions from the surrounding context. S2Vec outperforms prior models such as SatCLIP and GeoCLIP on socioeconomic prediction tasks (e.g., population density and median income) and shows particularly strong zero-shot geographic adaptation to regions it was not trained on.
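The rasterize-then-mask idea can be sketched in a few lines of plain Python. This is only an illustrative toy, not the S2Vec implementation: the function names (`rasterize`, `mask_patches`, `apply_mask`), the rectangular `bounds` standing in for a cell, the per-pixel count encoding, and the 8x8 grid are all assumptions made for this example.

```python
import random


def rasterize(features, bounds, grid=8, layers=("cafe", "park")):
    """Count features per pixel, producing one grid x grid layer per feature type.

    `features` is a list of (kind, x, y) tuples; `bounds` is a hypothetical
    rectangular stand-in for one cell: (min_x, min_y, max_x, max_y).
    """
    min_x, min_y, max_x, max_y = bounds
    image = {name: [[0] * grid for _ in range(grid)] for name in layers}
    for kind, x, y in features:
        # Skip feature types without a layer and points outside the cell.
        if kind not in image or not (min_x <= x < max_x and min_y <= y < max_y):
            continue
        col = min(grid - 1, int((x - min_x) / (max_x - min_x) * grid))
        row = min(grid - 1, int((y - min_y) / (max_y - min_y) * grid))
        image[kind][row][col] += 1
    return image


def mask_patches(grid, patch, ratio, seed=0):
    """Randomly choose which square patches to hide (MAE-style masking)."""
    per_side = grid // patch
    total = per_side * per_side
    rng = random.Random(seed)
    return set(rng.sample(range(total), int(total * ratio)))


def apply_mask(image, masked, patch):
    """Return a copy of the image with every pixel in a masked patch zeroed."""
    visible = {}
    for name, layer in image.items():
        per_side = len(layer) // patch
        visible[name] = [
            [0 if (r // patch) * per_side + (c // patch) in masked else value
             for c, value in enumerate(row)]
            for r, row in enumerate(layer)
        ]
    return visible
```

In a real masked autoencoder, an encoder would see only the visible patches and a decoder would be trained to reconstruct the hidden ones; here the original `image` plays the role of the reconstruction target and `apply_mask(image, masked, patch)` the role of the model input.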
Mapping the modern world: How S2Vec learns the language of our cities
We introduce S2Vec, a self-supervised framework that transforms complex geospatial data into general-purpose embeddings to predict socioeconomic and environmental patterns across the globe.
https://research.google/blog/mapping-the-modern-world-how-s2vec-learns-the-language-of-our-cities/


Seonglae Cho