Scalable Diffusion Models with Transformers

IEEE International Conference on Computer Vision (ICCV), 2023
19 December 2022
William S. Peebles
Saining Xie
    GNN

Papers citing "Scalable Diffusion Models with Transformers"

50 / 2,712 papers shown
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
IEEE Access, 2025
Kento Kawaharazuka
Jihoon Oh
Jun Yamada
Ingmar Posner
Yuke Zhu
LM&Ro
277
27
0
08 Oct 2025
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Masih Aminbeidokhti
H. R. Medeiros
Eric Granger
M. Pedersoli
UQCV
245
0
0
08 Oct 2025
DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis
Nithin C. Babu
Aniruddha Mahapatra
Harsh Rangwani
Rajiv Soundararajan
Kuldeep Kulkarni
EGVM, VGen
200
0
0
08 Oct 2025
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
Siyoon Jin
S. Kim
Dahyun Chung
J. Lee
Hyunwook Choi
Jisu Nam
J. Kim
S. Kim
VGen
106
2
0
08 Oct 2025
scPPDM: A Diffusion Model for Single-Cell Drug-Response Prediction
Zhaokang Liang
Shuyang Zhuang
Xiaoran Jiao
Weian Mao
Hao Chen
Chunhua Shen
74
0
0
08 Oct 2025
Heptapod: Language Modeling on Visual Signals
Yongxin Zhu
J. Chen
Yuanzhe Chen
Zhuo Chen
Dongya Jia
Jian Cong
Xiaobin Zhuang
Yuping Wang
Yuping Wang
VLM
162
0
0
08 Oct 2025
DreamOmni2: Multimodal Instruction-based Editing and Generation
Bin Xia
Bohao Peng
Yuechen Zhang
Junjia Huang
Jiyang Liu
...
Chengyao Wang
Yitong Wang
Xinglong Wu
Bei Yu
Jiaya Jia
118
9
0
08 Oct 2025
Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report
Riccardo Mereu
Aidan Scannell
Yuxin Hou
Yi Zhao
Aditya Jitta
Antonio Dominguez
Luigi Acerbi
Amos Storkey
Paul E. Chang
VGen, VLM
143
1
0
08 Oct 2025
WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation
Zezhong Qian
Xiaowei Chi
Yuming Li
Shizun Wang
Zhiyuan Qin
Xiaozhu Ju
Sirui Han
Shanghang Zhang
VGen
135
3
0
08 Oct 2025
GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations
Fabian Paischer
Gianluca Galletti
William Hornsby
Paul Setinek
L. Zanisi
Naomi Carey
Stanislas Pamela
Johannes Brandstetter
213
3
0
08 Oct 2025
$\bf{D^3}$QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
Yanran Zhang
Bingyao Yu
Yu Zheng
Wenzhao Zheng
Yueqi Duan
Lei Chen
Jie Zhou
Jiwen Lu
MQ
193
1
0
07 Oct 2025
Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling
Young D. Kwon
Abhinav Mehrotra
Malcolm Chadwick
Alberto Gil C. P. Ramos
S. Bhattacharya
DiffM
168
0
0
07 Oct 2025
Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model
Danush Kumar Venkatesh
Adam Schmidt
Muhammad Abdullah Jamal
Omid Mohareri
VGen, MedIm
144
0
0
07 Oct 2025
VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation
Haoran Zhang
Shuanghao Bai
Wanqi Zhou
Yuedi Zhang
Qi Zhang
Pengxiang Ding
Cheng Chi
Donglin Wang
Badong Chen
LRM
200
1
0
07 Oct 2025
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
Zanyi Wang
Dengyang Jiang
Liuzhuozheng Li
Sizhe Dang
Chengzu Li
H. Yang
Guang Dai
Mengmeng Wang
Jingdong Wang
VOS, VGen
229
0
0
07 Oct 2025
SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation
Oindrila Saha
Vojtech Krs
R. Měch
Subhransu Maji
Kevin Blackburn-Matzen
Matheus Gadelha
127
1
0
07 Oct 2025
Parallel Tokenizers: Rethinking Vocabulary Design for Cross-Lingual Transfer
Muhammad Dehan Al Kautsar
Fajri Koto
199
1
0
07 Oct 2025
Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models
Jiahao Wang
Zhenpei Yang
Yijing Bai
Yingwei Li
Yuliang Zou
...
Zehao Zhu
Jyh-Jing Hwang
Dragomir Anguelov
Mingxing Tan
C. Jiang
VGen
101
0
0
07 Oct 2025
Riddled basin geometry sets fundamental limits to predictability and reproducibility in deep learning
Andrew Ly
Pulin Gong
AI4CE
187
0
0
07 Oct 2025
Luth: Efficient French Specialization for Small Language Models and Cross-Lingual Transfer
Maxence Lasbordes
Sinoué Gad
134
0
0
07 Oct 2025
LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
Yang Xiao
Gen Li
Kaiyuan Deng
Yushu Wu
Zheng Zhan
Yanzhi Wang
Xiaolong Ma
Bo Hui
VGen
148
1
0
06 Oct 2025
StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation
Mingyu Liu
Jiuhe Shu
Hui Chen
Zeju Li
Canyu Zhao
J. Yang
Shenyuan Gao
Hao Chen
Chunhua Shen
117
1
0
06 Oct 2025
REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization
Qiyuan He
Y. Li
Haotian Ye
Jinghao Wang
Xinyao Liao
Pheng-Ann Heng
Stefano Ermon
James Zou
Angela Yao
DiffM, VGen
232
2
0
06 Oct 2025
Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation
Zijing Hu
Yunze Tong
Fengda Zhang
Junkun Yuan
Jun Xiao
Kun Kuang
DiffM
193
1
0
06 Oct 2025
TBStar-Edit: From Image Editing Pattern Shifting to Consistency Enhancement
Hao Fang
Zechao Zhan
Weixin Feng
Ziwei Huang
Xubin Li
Tiezheng Ge
DiffM
341
0
0
06 Oct 2025
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
Théophane Vallaeys
Jakob Verbeek
Matthieu Cord
DiffM
236
3
0
06 Oct 2025
Pulp Motion: Framing-aware multimodal camera and human motion generation
Robin Courant
Xi Wang
David Loiseaux
Marc Christie
Vicky Kalogeiton
VGen
196
1
0
06 Oct 2025
Bidirectional Mammogram View Translation with Column-Aware and Implicit 3D Conditional Diffusion
Xin Li
Kaixiang Yang
Qiang Li
Zhiwei Wang
DiffM, MedIm
191
0
0
06 Oct 2025
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
Le Zhuo
Songhao Han
Yuandong Pu
Boxiang Qiu
Sayak Paul
...
Yihao Liu
Jie Shao
Xi Chen
Si Liu
Hongsheng Li
EGVM
245
3
0
06 Oct 2025
Scaling Sequence-to-Sequence Generative Neural Rendering
Shikun Liu
Kam Woh Ng
Wonbong Jang
Jiadong Guo
Junlin Han
...
Juan C. Pérez
Zijian Zhou
Chi Phung
Tao Xiang
Juan-Manuel Perez-Rua
VGen
129
1
0
05 Oct 2025
MASC: Boosting Autoregressive Image Generation with a Manifold-Aligned Semantic Clustering
Lixuan He
Shikang Zheng
Linfeng Zhang
159
0
0
05 Oct 2025
Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
Shikang Zheng
Guantao Chen
Qinming Zhou
Yuqi Lin
Lixuan He
Chang Zou
Peiliang Cai
Jiacheng Liu
Linfeng Zhang
153
2
0
05 Oct 2025
Principled and Tractable RL for Reasoning with Diffusion Language Models
Anthony Zhan
DiffM, AI4CE
114
2
0
05 Oct 2025
ContextVLA: Vision-Language-Action Model with Amortized Multi-Frame Context
Huiwon Jang
Sihyun Yu
Heeseung Kwon
Hojin Jeon
Younggyo Seo
Jinwoo Shin
137
1
0
05 Oct 2025
FoilDiff: A Hybrid Transformer Backbone for Diffusion-based Modelling of 2D Airfoil Flow Fields
Kenechukwu Ogbuagu
S. Maleki
G. Bruni
S. Krishnababu
DiffM, AI4CE
539
0
0
05 Oct 2025
Drax: Speech Recognition with Discrete Flow Matching
Aviv Navon
Aviv Shamsian
Neta Glazer
Yael Segal-Feldman
Gill Hetz
Joseph Keshet
Ethan Fetaya
130
1
0
05 Oct 2025
MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator
Xuehai He
Shijie Zhou
Thivyanth Venkateswaran
Kaizhi Zheng
Ziyu Wan
A. Kadambi
Xin Eric Wang
VGen, SyDa, AI4CE
163
0
0
05 Oct 2025
Proximal Diffusion Neural Sampler
Wei Guo
Jaemoo Choi
Y. Zhu
Molei Tao
Yongxin Chen
DiffM
167
5
0
04 Oct 2025
Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
Bumjun Kim
Dongjae Jeon
Dueun Kim
Wonje Jeung
Albert No
145
0
0
04 Oct 2025
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Sina Alemohammad
Zinan Lin
Richard G. Baraniuk
SyDa
308
1
0
04 Oct 2025
Generating Human Motion Videos using a Cascaded Text-to-Video Framework
Hyelin Nam
Hyojun Go
Byeongjun Park
Byung-Hoon Kim
Hyungjin Chung
DiffM, VGen
127
3
0
04 Oct 2025
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
Junbao Zhou
Yuan Zhou
Kesen Zhao
Qingshan Xu
B. Zhu
Richang Hong
Hanwang Zhang
DiffM, VGen
235
4
0
03 Oct 2025
What Drives Compositional Generalization in Visual Generative Models?
Karim Farid
Rajat Sahay
Yumna Ali Alnaggar
Simon Schrodi
Volker Fischer
Cordelia Schmid
Thomas Brox
CoGe
325
0
0
03 Oct 2025
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan
Xihua Wang
Zhengfeng Lai
Xin Cheng
Peng Zhang
Xiaojiang Liu
Ruihua Song
Meng Cao
DiffM
267
4
0
03 Oct 2025
Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft
Junchao Huang
Xinting Hu
Boyao Han
Shaoshuai Shi
Zhuotao Tian
Tianyu He
Li Jiang
193
7
0
03 Oct 2025
SALSA-V: Shortcut-Augmented Long-form Synchronized Audio from Videos
Amir Dellali
Luca A. Lanzendörfer
Florian Grötschla
Roger Wattenhofer
VGen
121
0
0
03 Oct 2025
FlexiQ: Adaptive Mixed-Precision Quantization for Latency/Accuracy Trade-Offs in Deep Neural Networks
Jaemin Kim
Hongjun Um
Sungkyun Kim
Yongjun Park
Jiwon Seo
MQ
146
0
0
03 Oct 2025
When and Where do Events Switch in Multi-Event Video Generation?
Ruotong Liao
Guowen Huang
Qing Cheng
Thomas Seidl
Daniel Cremers
Volker Tresp
DiffM, VGen
213
0
0
03 Oct 2025
Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling
Qiwei Di
Kaixuan Ji
Xuheng Li
Heyang Zhao
Quanquan Gu
113
1
0
03 Oct 2025
Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
Cai Zhou
Chenxiao Yang
Yi Hu
Chenyu Wang
Chubin Zhang
Muhan Zhang
Lester Mackey
Tommi Jaakkola
Stephen Bates
Dinghuai Zhang
169
5
0
03 Oct 2025