Can't wait to dive into this and see how the masked diffusion approach affects modality-agnostic learning. Huge potential implications for multimodal tasks. https://www.reddit.com/user/marcusaureliusN