Lumina-DiMOO: An open-source discrete multimodal diffusion model

https://news.ycombinator.com/rss Hits: 3
Summary

Abstract

We introduce Lumina-DiMOO, an open-source foundational model for seamless multimodal generation and understanding. Lumina-DiMOO sets itself apart from prior unified models by using fully discrete diffusion modeling to handle inputs and outputs across modalities. This approach gives Lumina-DiMOO higher sampling efficiency than previous autoregressive (AR) or hybrid AR-diffusion paradigms, and it supports a broad spectrum of multimodal tasks, including text-to-image generation, image-to-image generation (e.g., image editing, subject-driven generation, and image inpainting), and image understanding. Lumina-DiMOO achieves state-of-the-art performance on multiple benchmarks, surpassing existing open-source unified multimodal models. To foster further advances in multimodal and discrete diffusion model research, we release our code and checkpoints.

Citation

@article{Lumina-DiMOO,
  title={Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding},
  author={Yi Xin and Qi Qin and Siqi Luo and Kaiwen Zhu and Juncheng Yan and Yan Tai and Jiayi Lei and Yuewen Cao and Yuandong Pu and Le Zhuo and Shenglong Ye and Ming Hu and Junjun He and Bo Zhang and Dengyang Jiang and Gen Luo and Chang Xu and Wenhai Wang and Hongsheng Li and Guangtao Zhai and Tianfan Xue and Xiaohong Liu and Bin Fu and Yu Qiao and Yihao Liu},
  year={2025}
}
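To make the "discrete diffusion" idea concrete: in a common masked-token formulation, generation starts from a fully masked token sequence and a model iteratively fills in tokens over a fixed number of denoising steps, committing several positions in parallel at each step (which is where the sampling-efficiency advantage over token-by-token autoregressive decoding comes from). The sketch below is a toy illustration of that sampling loop only; it is not Lumina-DiMOO's actual algorithm, and `toy_denoiser`, `MASK`, and the linear unmasking schedule are all hypothetical stand-ins.

```python
import random

MASK = -1  # hypothetical mask-token id (placeholder, not the model's real vocabulary)


def toy_denoiser(tokens, vocab_size):
    # Stand-in for the trained network: proposes a token for every masked
    # position. A real model would predict a distribution per position.
    return [random.randrange(vocab_size) if t == MASK else t for t in tokens]


def masked_diffusion_sample(length, vocab_size, steps, seed=0):
    """Toy masked-diffusion sampler: start fully masked, then over `steps`
    denoising steps commit a growing fraction of positions in parallel."""
    random.seed(seed)
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_denoiser(tokens, vocab_size)
        remaining = [i for i, t in enumerate(tokens) if t == MASK]
        if not remaining:
            break
        # Linear schedule: commit enough positions that all are filled by the
        # final step (steps - step == 1 on the last iteration).
        k = max(1, len(remaining) // (steps - step))
        for i in random.sample(remaining, min(k, len(remaining))):
            tokens[i] = preds[i]
    return tokens


print(masked_diffusion_sample(8, 100, 4))
```

Note that the whole sequence is produced in `steps` forward passes regardless of `length`, whereas an autoregressive decoder would need one pass per token; that contrast is the efficiency argument the abstract makes.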

First seen: 2025-09-12 14:48

Last seen: 2025-09-12 16:55