MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

Training large text-to-image models requires high-quality, curated datasets with diverse content and detailed captions. Yet the cost and complexity of collecting, filtering, deduplicating, and re-captioning such corpora at scale hinders open and reproducible research in the field. We introduce MONET...

Read Original Article →

Source

http://arxiv.org/abs/2605.21272v1