Papers with Code - WebLI Dataset
WebLI (Web Language Image) is a web-scale multilingual image-text dataset, designed to support Google’s vision-language research, such as the large-scale pre-training for image understanding, image captioning, visual question answering, object detection etc. The dataset is built from the public web, including image bytes, image-associated texts (alt-text, OCR, page title), 109 languages and many other features. The dataset is deduplicated on 68 common vision/vision-language tasks, and has no user or personally identifiable data with careful RAI considerations.
https://paperswithcode.com/dataset/webli