OmniGen2: Exploration to Advanced Multimodal Generation
Versions: v1, v2, v3 (latest)

23 June 2025
Chenyuan Wu
PengFei Zheng
Ruiran Yan
Shitao Xiao
Xin Luo
Yueze Wang
W. Li
Xiyan Jiang
Y. Liu
Junjie Zhou
Ze Liu
Ziyi Xia
Chaofan Li
Haoge Deng
Jiahao Wang
Kun Luo
Bo Zhang
Defu Lian
X. Wang
Zhongyuan Wang
Tiejun Huang
Zheng Liu
    MLLM, SyDa, VLM
ArXiv (abs) | PDF | HTML | HuggingFace (71 upvotes) | Github (3874★)

Papers citing "OmniGen2: Exploration to Advanced Multimodal Generation"

50 / 104 papers shown
LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer
Yuzhuo Chen
Zehua Ma
Jianhua Wang
Kai Kang
Shunyu Yao
Weiming Zhang
VLM
194
2
0
24 Dec 2025
I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
Juntong Wang
Jiarui Wang
Huiyu Duan
Jiaxiang Kang
Guangtao Zhai
Xiongkuo Min
VLM
178
0
0
04 Dec 2025
WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens
Jian Yang
Dacheng Yin
Xiaoxuan He
Y. Li
Fengyun Rao
Jing Lyu
Wei-dong Zhai
Yang Cao
Zheng-Jun Zha
VLM
243
0
0
02 Dec 2025
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Zhiheng Liu
Weiming Ren
Haozhe Liu
Zijian Zhou
S. Chen
...
Ping Luo
Wei Liu
Tao Xiang
Jonas Schult
Yuren Cong
166
2
0
01 Dec 2025
DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models
Patrick Kwon
Chen Chen
DiffM, AI4TS, VGen
160
0
0
01 Dec 2025
AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation
Yexin Liu
Wen-Jie Shu
Zile Huang
Haoze Zheng
Yueze Wang
Manyuan Zhang
Ser-Nam Lim
Harry Yang
DiffM, VGen
90
0
0
01 Dec 2025
Reversible Inversion for Training-Free Exemplar-guided Image Editing
Yuke Li
Lianli Gao
Ji Zhang
Pengpeng Zeng
Lichuan Xiang
Hongkai Wen
Heng Tao Shen
Jingkuan Song
DiffM
138
0
0
01 Dec 2025
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
Junyan Ye
Leiqi Zhu
Yuncheng Guo
Dongzhi Jiang
Zilong Huang
Yifan Zhang
Zhiyuan Yan
Haohuan Fu
Conghui He
Weijia Li
EGVM
137
0
0
29 Nov 2025
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Sinan Du
Jiahao Guo
Bo Li
Shuhao Cui
Zhengzhuo Xu
...
Yongxian Wei
Kun Gai
X. Wang
Kai Wu
C. Yuan
227
1
0
28 Nov 2025
Ovis-Image Technical Report
Guo-Hua Wang
Liangfu Cao
Tianyu Cui
Minghao Fu
Xiaohao Chen
...
Jianshan Zhao
Lan Li
Bowen Fu
Jiaqi Liu
Qing-Guo Chen
VLM
561
0
0
28 Nov 2025
MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
Yuta Oshima
Daiki Miyake
Kohsei Matsutani
Yusuke Iwasawa
Masahiro Suzuki
Yutaka Matsuo
Hiroki Furuta
67
0
0
28 Nov 2025
ReasonEdit: Towards Reasoning-Enhanced Image Editing Models
Fukun Yin
Shiyu Liu
Yucheng Han
Zhibo Wang
Peng Xing
...
Pengtao Chen
Xiangyu Zhang
Daxin Jiang
Xianfang Zeng
Gang Yu
DiffM, KELM, LRM
252
0
0
27 Nov 2025
Ar2Can: An Architect and an Artist Leveraging a Canvas for Multi-Human Generation
Shubhankar Borse
Phuc Pham
Farzad Farhadzadeh
Seokeon Choi
P. Nguyen
Anh Tran
Sungrack Yun
Munawar Hayat
Fatih Porikli
97
0
0
27 Nov 2025
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
Ziyun Zeng
Hang Hua
Jiebo Luo
KELM, LM&Ro, LRM
364
0
0
26 Nov 2025
CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion
Dianbing Xi
Jiepeng Wang
Yuanzhi Liang
Xi Qiu
Jialun Liu
...
Yuchi Huo
Rui Wang
H. Huang
Chi Zhang
Xuelong Li
DiffM, VGen
213
0
0
26 Nov 2025
iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
Zhoujie Fu
Xianfang Zeng
Jinghong Lan
Xinyao Liao
Cheng Chen
...
Wei Cheng
Shiyu Liu
Y. Chen
Gang Yu
Guosheng Lin
DiffM, VGen
361
1
0
25 Nov 2025
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
Yuwei Niu
Weiyang Jin
Jiaqi Liao
Chaoran Feng
Peng Jin
Bin Lin
Zongjian Li
Bin Zhu
Weihao Yu
Li Yuan
SyDa, LRM
471
1
0
25 Nov 2025
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
Xiang Wang
Zhifei Zhang
Chentao Song
Zhe Lin
Yuqian Zhou
...
Haitian Zheng
Jason Kuen
Yuehuan Wang
Changxin Gao
Nong Sang
MoE
175
1
0
25 Nov 2025
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
Ziheng Ouyang
Yiren Song
Y. Liu
Shihao Zhu
Qibin Hou
Ming-Ming Cheng
Mike Zheng Shou
150
0
0
25 Nov 2025
Are Image-to-Video Models Good Zero-Shot Image Editors?
Zechuan Zhang
Zhenyuan Chen
Zongxin Yang
Yi Yang
DiffM, VGen
567
0
0
24 Nov 2025
MagicWand: A Universal Agent for Generation and Evaluation Aligned with User Preference
Zitong Xu
Dake Shen
Yaosong Du
Kexiang Hao
Jinghan Huang
Xiande Huang
82
0
0
23 Nov 2025
MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation
Tao Shen
Xin Wan
Taicai Chen
Rui Zhang
Junwen Pan
...
Y. Yang
Chen Cheng
Qi She
Chang Liu
Zhenbang Sun
DiffM
108
1
0
23 Nov 2025
SPIDER: Spatial Image CorresponDence Estimator for Robust Calibration
Zhimin Shao
Abhay Kumar Yadav
Rama Chellappa
Cheng-Fang Peng
98
0
0
21 Nov 2025
EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards
Omkar Thawakar
Shravan Venkatraman
Ritesh Thawkar
Abdelrahman M. Shaker
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Fahad A Khan
SyDa, LRM, VLM
336
4
0
20 Nov 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM, MoE, OSLM, VLM
625
1
0
16 Nov 2025
Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
Haozhe Liu
Ding Liu
Mingchen Zhuge
Zijian Zhou
Tian Xie
...
Juan-Manuel Perez-Rua
Tao Xiang
Wei Liu
Shikun Liu
Jürgen Schmidhuber
105
0
0
15 Nov 2025
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Yiyang Zhou
Haoqin Tu
Z. Wang
Zeyu Wang
Niklas Muennighoff
...
Shen Yan
Haoqi Fan
Cihang Xie
Huaxiu Yao
Qinghao Ye
LRM
257
3
0
04 Nov 2025
UniREditBench: A Unified Reasoning-based Image Editing Benchmark
Feng Han
Y. Wang
Chenglin Li
Zheming Liang
Dianyi Wang
...
Zhipeng Wei
Chao Gong
Cheng Jin
Yue Yu
J. Wang
197
2
0
03 Nov 2025
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Yongyuan Liang
Wei Chow
Feng Li
Ziqiao Ma
Xiyao Wang
Jiageng Mao
Jiuhai Chen
Jiatao Gu
Y. Wang
Furong Huang
LRM
248
3
0
03 Nov 2025
Emu3.5: Native Multimodal Models are World Learners
Yufeng Cui
Honghao Chen
Haoge Deng
X. Y. Huang
Xinghang Li
...
Zhuo Chen
Yulong Ao
Tiejun Huang
Zhongyuan Wang
Xinlong Wang
MLLM, VGen
471
21
0
30 Oct 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
228
5
0
28 Oct 2025
Uniform Discrete Diffusion with Metric Path for Video Generation
Haoge Deng
Ting Pan
Fan Zhang
Y. Liu
Zhuoyan Luo
...
Wenxuan Wang
Chunhua Shen
Shiguang Shan
Zhaoxiang Zhang
Xinlong Wang
VGen
170
2
0
28 Oct 2025
Revisiting Multimodal Positional Encoding in Vision-Language Models
Jie Huang
Xuejing Liu
Sibo Song
Ruibing Hou
Hong Chang
Junyang Lin
S. Bai
162
2
0
27 Oct 2025
LightFusion: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
Zeyu Wang
Z. Chen
Chenhui Gou
Feng Li
Chaorui Deng
...
Kunchang Li
Weihao Yu
Haoqin Tu
Haoqi Fan
Cihang Xie
367
0
0
27 Oct 2025
FARMER: Flow AutoRegressive Transformer over Pixels
Guangting Zheng
Qinyu Zhao
Tao Yang
Fei Xiao
Zhijie Lin
Jie Wu
Jiajun Deng
Y. Zhang
Rui Zhu
VGen
261
4
0
27 Oct 2025
UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization
Huixuan Zhang
Xiaojun Wan
EGVM
174
0
0
27 Oct 2025
LayerComposer: Multi-Human Personalized Generation via Layered Canvas
Guocheng Qian
Ruihang Zhang
Tsai-Shien Chen
Yusuf Dalva
Anujraaj Goyal
...
Arpit Sahni
Daniil Ostashev
Ju Hu
Sergey Tulyakov
Kuan-Chieh Wang
DiffM
221
1
0
23 Oct 2025
EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization
Yixiong Yang
Tao Wu
Senmao Li
Shiqi Yang
Yaxing Wang
Joost van de Weijer
Kai Wang
DiffM
168
0
0
23 Oct 2025
GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models
Muhammad Atif Butt
Alexandra Gomez-Villa
Tao Wu
Javier Vázquez-Corral
Joost van de Weijer
Kai Wang
EGVM, VLM
185
0
0
23 Oct 2025
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
Yibin Wang
Zhimin Li
Yuhang Zang
Jiazi Bu
Yujie Zhou
...
Junjun He
Chunyu Wang
Qinglin Lu
Cheng Jin
J. Wang
EGVM, VLM
251
4
0
21 Oct 2025
PICABench: How Far Are We from Physically Realistic Image Editing?
Yuandong Pu
Le Zhuo
Songhao Han
Jinbo Xing
Kaiwen Zhu
...
Hongsheng Li
Yu Qiao
W. Zhang
Xi Chen
Yihao Liu
275
1
0
20 Oct 2025
Chimera: Compositional Image Generation using Part-based Concepting
Shivam Singh
Yiming Chen
Agneet Chatterjee
Amit Raj
James Hays
Yezhou Yang
Chitta Baral
DiffM
299
0
0
20 Oct 2025
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
Zongjian Li
Zheyuan Liu
Qihui Zhang
Bin Lin
Feize Wu
...
Wangbo Yu
Yuwei Niu
Shaodong Wang
Xinhua Cheng
Li Yuan
407
14
0
19 Oct 2025
When Seeing Is not Enough: Revealing the Limits of Active Reasoning in MLLMs
Hongcheng Liu
Pingjie Wang
Yuhao Wang
Siqu Ou
Yanfeng Wang
Y Samuel Wang
LRM
157
0
0
17 Oct 2025
BLIP3o-NEXT: Next Frontier of Native Image Generation
Jiuhai Chen
Le Xue
Zhiyang Xu
Xichen Pan
Shusheng Yang
...
Tianyi Zhou
Junnan Li
Silvio Savarese
Caiming Xiong
Ran Xu
121
16
0
17 Oct 2025
WithAnyone: Towards Controllable and ID Consistent Image Generation
H. Xu
Wei Cheng
Peng Xing
Yixiao Fang
Shuhan Wu
...
Xianfang Zeng
Daxin Jiang
Gang Yu
Xingjun Ma
Yu-Gang Jiang
DiffM
240
5
0
16 Oct 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
159
1
0
16 Oct 2025
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
Kai Zou
Longxiang Zhang
Yuhao Dong
Shulin Tian
Dian Zheng
Hongbo Liu
Jingwen He
Bin Liu
Yu Qiao
Ziwei Liu
129
6
0
15 Oct 2025
Generative Universal Verifier as Multimodal Meta-Reasoner
Xinchen Zhang
X. Zhang
Youbin Wu
Yanbin Cao
Renrui Zhang
Ruihang Chu
Ling Yang
Yujiu Yang
LRM
188
4
0
15 Oct 2025
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
Ruihang Xu
Dewei Zhou
Fan Ma
Yi Yang
DiffM
201
2
0
13 Oct 2025
Page 1 of 3