2506.15564
Show-o2: Improved Native Unified Multimodal Models
18 June 2025
Jinheng Xie
Zhenheng Yang
Mike Zheng Shou
VGen
HuggingFace (29 upvotes)
Github (1709★)
Papers citing "Show-o2: Improved Native Unified Multimodal Models"
43 / 43 papers shown
Title
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Z. Liang
D. Zhang
Huichi Zhou
Rui Huang
Bobo Li
...
Shengqiong Wu
X. Wang
Jiebo Luo
Lizi Liao
Hao Fei
VGen
36
0
0
11 Nov 2025
Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
Jiakui Hu
Shanshan Zhao
Qing-Guo Chen
Xuerui Qiu
Jialun Liu
Zhao Xu
Weihua Luo
Kaifu Zhang
Yanye Lu
VGen
41
0
0
10 Nov 2025
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Yongyuan Liang
Wei Chow
Feng Li
Ziqiao Ma
Xiyao Wang
Jiageng Mao
Jiuhai Chen
Jiatao Gu
Y. Wang
Furong Huang
LRM
112
0
0
03 Nov 2025
LightFusion: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
Zeyu Wang
Z. Chen
Chenhui Gou
Feng Li
Chaorui Deng
...
Kunchang Li
Weihao Yu
Haoqin Tu
Haoqi Fan
Cihang Xie
92
0
0
27 Oct 2025
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
Yibin Wang
Zhimin Li
Yuhang Zang
Jiazi Bu
Yujie Zhou
...
Junjun He
Chunyu Wang
Qinglin Lu
Cheng Jin
J. Wang
EGVM
VLM
137
1
0
21 Oct 2025
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
Yongshun Zhang
Zhongyi Fan
Yonghang Zhang
Zhangzikang Li
Weifeng Chen
Zhongwei Feng
Chaoyue Wang
Peng Hou
Anxiang Zeng
VGen
146
0
0
20 Oct 2025
Generative Universal Verifier as Multimodal Meta-Reasoner
Xinchen Zhang
X. Zhang
Youbin Wu
Yanbin Cao
Renrui Zhang
Ruihang Chu
Ling Yang
Yujiu Yang
LRM
37
0
0
15 Oct 2025
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
Weiyang Jin
Yuwei Niu
Jiaqi Liao
Chengqi Duan
Aoxue Li
Shenghua Gao
Xihui Liu
LRM
65
1
0
14 Oct 2025
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
Hongxiang Li
Yaowei Li
Bin Lin
Yuwei Niu
Yuhang Yang
Xiaoshuang Huang
Jiayin Cai
Xiaolong Jiang
Yao Hu
Long Chen
EGVM
LRM
52
0
0
13 Oct 2025
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue
H. Zhang
Xiangyu Zeng
Boyu Chen
Chenting Wang
...
Lu Dong
Kunpeng Du
Yi Wang
Limin Wang
Yali Wang
52
0
0
12 Oct 2025
UniVideo: Unified Understanding, Generation, and Editing for Videos
Cong Wei
Quande Liu
Zixuan Ye
Qiulin Wang
Xintao Wang
Pengfei Wan
Kun Gai
Wenhu Chen
VGen
72
0
0
09 Oct 2025
Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing
Zhentao Zou
Zhengrong Yue
Kunpeng Du
Binlei Bao
Hanting Li
...
Yue Zhou
Yali Wang
Jie Hu
Xue Jiang
X. Chen
LRM
56
0
0
09 Oct 2025
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Kang Liao
Size Wu
Zhonghua Wu
Linyi Jin
Chao Wang
Y. Wang
Fei Wang
Wei Li
Chen Change Loy
MLLM
VGen
84
0
0
09 Oct 2025
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
Ziyuan Huang
Dandan Zheng
Cheng Zou
Rui Liu
Xiaolong Wang
...
Jiajia Liu
Qingpei Guo
Ming-Hsuan Yang
Jingdong Chen
Jun Zhou
56
4
0
08 Oct 2025
Growing Visual Generative Capacity for Pre-Trained MLLMs
Hanyu Wang
Jiaming Han
Ziyan Yang
Qi Zhao
Shanchuan Lin
Xiangyu Yue
Abhinav Shrivastava
Zhenheng Yang
Hao Chen
VLM
114
0
0
02 Oct 2025
STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models
Shaoxiong Guo
Tianyi Du
Lijun Li
Y. Wu
Jie Li
Jing Shao
AAML
64
0
0
30 Sep 2025
Query-Kontext: An Unified Multimodal Model for Image Generation and Editing
Yuxin Song
Wenkai Dong
Shizun Wang
Qi Zhang
Song Xue
...
H. Yang
Haocheng Feng
Hang Zhou
Xinyan Xiao
Jingdong Wang
DiffM
MLLM
65
0
0
30 Sep 2025
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
Yang Shi
Yuhao Dong
Yue Ding
Yuran Wang
Xuanyu Zhu
...
Wenjing Yang
Yuanxing Zhang
Pengfei Wan
Yi Zhang
Ziwei Liu
ELM
64
1
0
29 Sep 2025
Latent Visual Reasoning
Bangzheng Li
Ximeng Sun
Jiang-Long Liu
Ze Wang
Jialian Wu
Xiaodong Yu
Hao Chen
Emad Barsoum
Muhao Chen
Zicheng Liu
LRM
VLM
116
2
0
29 Sep 2025
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Jitai Hao
Hao Liu
Xinyan Xiao
Qiang Huang
Jun Yu
52
0
0
29 Sep 2025
Planning with Unified Multimodal Models
Yihao Sun
Zhilong Zhang
Yang Yu
Pierre-Luc Bacon
LRM
28
0
0
27 Sep 2025
GenExam: A Multidisciplinary Text-to-Image Exam
Zhaokai Wang
Penghao Yin
Xiangyu Zhao
Changyao Tian
Yu Qiao
Wenhai Wang
Jifeng Dai
Gen Luo
ELM
139
0
0
17 Sep 2025
AToken: A Unified Tokenizer for Vision
Jiasen Lu
Liangchen Song
Mingze Xu
Byeongjoo Ahn
Yanjun Wang
Chen Chen
Afshin Dehghan
Yinfei Yang
ViT
141
3
0
17 Sep 2025
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Rongyao Fang
Aldrich Yu
Chengqi Duan
Linjiang Huang
S. Bai
Yuxuan Cai
Kun Wang
Si Liu
Xihui Liu
Xue Yang
EGVM
VGen
ReLM
LRM
150
4
0
11 Sep 2025
Reconstruction Alignment Improves Unified Multimodal Models
Ji Xie
Trevor Darrell
Luke Zettlemoyer
Xudong Wang
94
6
0
08 Sep 2025
Interleaving Reasoning for Better Text-to-Image Generation
Wenxuan Huang
Shuang Chen
Zheyong Xie
Shaosheng Cao
Shixiang Tang
...
Z. Yin
Juil Sock
Yu Cheng
Wanli Ouyang
Shaohui Lin
103
5
0
08 Sep 2025
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
Ouxiang Li
Yuan Wang
Xinting Hu
Huijuan Huang
Rui Chen
Jiarong Ou
Xin Tao
Pengfei Wan
Xiaojuan Qi
Fuli Feng
EGVM
CoGe
LRM
165
2
0
03 Sep 2025
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Han Li
Xinyu Peng
Y. Wang
Zelin Peng
Xin Chen
Rongxiang Weng
Jingang Wang
Xunliang Cai
Wenrui Dai
Hongkai Xiong
MLLM
OffRL
188
4
0
03 Sep 2025
EO-1: Interleaved Vision-Text-Action Pretraining for General Robot Control
Delin Qu
Haoming Song
Qizhi Chen
Zhaoqing Chen
Xianqiang Gao
...
Maoqing Yao
Haoran Yang
Jiacheng Bao
Jiangwei Zhong
Dong Wang
LM&Ro
196
5
0
28 Aug 2025
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Y. Wang
Zhimin Li
Yuhang Zang
Yujie Zhou
Jiazi Bu
Chunyu Wang
Qinglin Lu
Cheng Jin
Jiaqi Wang
EGVM
84
11
0
28 Aug 2025
MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
Qian Liang
Yujia Wu
Kuncheng Li
Jiwei Wei
Shiyuan He
Jinyu Guo
Ning Xie
LRM
68
1
0
15 Aug 2025
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
NextStep Team
Chunrui Han
Guopeng Li
J. Wu
Quan Sun
...
Ziyang Meng
Binxing Jiao
Daxin Jiang
X. Zhang
Yibo Zhu
DiffM
132
12
0
14 Aug 2025
TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning
Junzhe Xu
Yuyang Yin
Xi Chen
119
1
0
11 Aug 2025
Geoint-R1: Formalizing Multimodal Geometric Reasoning with Dynamic Auxiliary Constructions
Jingxuan Wei
Caijun Jia
Qi Chen
Honghao He
Linzhuang Sun
Conghui He
Lijun Wu
Bihui Yu
Cheng Tan
LRM
78
2
0
05 Aug 2025
Qwen-Image Technical Report
Chenfei Wu
Jiahao Nick Li
Jingren Zhou
Junyang Lin
Kaiyuan Gao
...
Yichang Zhang
Yongqiang Zhu
Y. Wu
Yuxuan Cai
Zenan Liu
DiffM
VLM
116
92
0
04 Aug 2025
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
Hao Tang
Chenwei Xie
Xiaoyi Bao
Tingyu Weng
P. Li
Yun Zheng
Liwei Wang
83
4
0
31 Jul 2025
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng
Y. Wang
Yeyao Ma
Chen Li
Yongming Rao
...
Han Hu
Xiaosong Zhang
Linus
Di Wang
Jie Jiang
78
14
0
29 Jul 2025
Pixels, Patterns, but No Poetry: To See The World like Humans
Hongcheng Gao
Longxiang Zhang
Lin Xu
Jingyi Tang
X. Li
...
Xinlong Yang
Ge Wu
Balong Bi
Hongyu Chen
Wentao Zhang
MLLM
LRM
VLM
82
2
0
21 Jul 2025
Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR
Peirong Zhang
Haowei Xu
Jiaxin Zhang
Guitao Xu
Xuhan Zheng
Zhenhua Yang
Junle Liu
Yuyi Zhang
Lianwen Jin
EGVM
202
2
0
20 Jul 2025
Omni-Video: Democratizing Unified Video Understanding and Generation
Zhiyu Tan
Hao Yang
Luozheng Qin
Jia Gong
Mengping Yang
Hao Li
VGen
VLM
177
3
0
08 Jul 2025
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang
Yixiao Fang
Peng Xing
Shuhan Wu
Wei Cheng
Rui Wang
Xianfang Zeng
Gang Yu
H. Chen
EGVM
VLM
285
13
0
09 Jun 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Gang Qu
Zhenan Sun
Mingyu Ding
MLLM
VLM
322
27
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
779
17
0
05 May 2025