2506.07986
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
9 June 2025
Zhengyao Lv
Tianlin Pan
Chenyang Si
Zhaoxi Chen
W. Zuo
Yu Qiao
Kwan-Yee K. Wong
Papers citing "Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers"
4 / 4 papers shown
Group Relative Attention Guidance for Image Editing
Xuanpu Zhang
Xuesong Niu
Ruidong Chen
Dan Song
Jianhao Zeng
Penghui Du
Haoxiang Cao
Kai Wu
An-an Liu
DiffM
210
0
0
28 Oct 2025
Towards Relaxed Multimodal Inputs for Gait-based Parkinson's Disease Assessment
Minlin Zeng
Z. Zhou
Yang Qiu
Martin J. McKeown
Zhiqi Shen
168
0
0
17 Oct 2025
JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation
Siheng Wan
Zhengtao Yao
Zhengdao Li
Junhao Dong
Yanshu Li
...
Haoyan Xu
Yijiang Li
Zhikang Dong
Huacan Wang
Jifeng Shen
DiffM
111
0
0
01 Oct 2025
UniVid: The Open-Source Unified Video Model
Jiabin Luo
Junhui Lin
Zeyu Zhang
Biao Wu
Meng Fang
Ling-Hao Chen
Hao Tang
VGen
283
8
0
29 Sep 2025