ResearchTrend.AI
Taming Transformers for High-Resolution Image Synthesis

17 December 2020
Patrick Esser
Robin Rombach
Björn Ommer
    ViT
arXiv: 2012.09841

Papers citing "Taming Transformers for High-Resolution Image Synthesis"

50 / 476 papers shown
Focused ReAct: Improving ReAct through Reiterate and Early Stop
Shuoqiu Li
Han Xu
Haipeng Chen
ReLM
LRM
28
4
0
14 Oct 2024
SceneCraft: Layout-Guided 3D Scene Generation
Xiuyu Yang
Yunze Man
Jun-Kun Chen
Yu-Xiong Wang
3DV
82
8
0
11 Oct 2024
Distillation of Discrete Diffusion through Dimensional Correlations
Satoshi Hayakawa
Yuhta Takida
Masaaki Imaizumi
Hiromi Wakaki
Yuki Mitsufuji
DiffM
56
0
0
11 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu
Yuyang Wang
Yizhe Zhang
Qihang Zhang
Dinghuai Zhang
Navdeep Jaitly
Josh Susskind
Shuangfei Zhai
DiffM
31
12
0
10 Oct 2024
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Jiahao Cui
Hui Li
Yao Yao
Hao Zhu
Hanlin Shang
Kaihui Cheng
Hang Zhou
Siyu Zhu
Jingdong Wang
DiffM
VGen
36
22
0
10 Oct 2024
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Jinbin Bai
Tian-Chun Ye
Wei Chow
Enxin Song
Qing-Guo Chen
Xiangtai Li
Zhen Dong
Lei Zhu
50
13
0
10 Oct 2024
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
Xiaoxiao He
Ligong Han
Quan Dao
Song Wen
Minhao Bai
...
Hongdong Li
Junzhou Huang
Faez Ahmed
Akash Srivastava
Dimitris Metaxas
DiffM
SyDa
38
4
0
10 Oct 2024
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Onkar Susladkar
Jishu Sen Gupta
Chirag Sehgal
Sparsh Mittal
Rekha Singhal
DiffM
VGen
33
0
0
10 Oct 2024
Think While You Generate: Discrete Diffusion with Planned Denoising
Sulin Liu
Juno Nam
Andrew Campbell
Hannes Stärk
Yilun Xu
Tommi Jaakkola
Rafael Gómez-Bombarelli
DiffM
33
6
0
08 Oct 2024
Restructuring Vector Quantization with the Rotation Trick
Christopher Fifty
Ronald G. Junkins
Dennis Duan
Aniketh Iyer
Jerry W. Liu
Ehsan Amid
Sebastian Thrun
Christopher Ré
LLMSV
43
11
0
08 Oct 2024
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon
Jong Chul Ye
DiffM
56
3
0
08 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
66
64
0
08 Oct 2024
Towards Unsupervised Blind Face Restoration using Diffusion Prior
Tianshu Kuai
Sina Honari
Igor Gilitschenski
Alex Levinshtein
DiffM
31
0
0
06 Oct 2024
IceCloudNet: 3D reconstruction of cloud ice from Meteosat SEVIRI
K. Jeggle
Mikolaj Czerkawski
F. Serva
B. L. Saux
D. Neubauer
Ulrike Lohmann
25
1
0
05 Oct 2024
Zebra: In-Context and Generative Pretraining for Solving Parametric PDEs
Louis Serrano
Armand K. Koupai
Thomas X. Wang
Pierre Erbacher
Patrick Gallinari
AI4CE
26
3
0
04 Oct 2024
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
Doohyuk Jang
Sihwan Park
J. Yang
Yeonsung Jung
Jihun Yun
Souvik Kundu
Sung-Yub Kim
Eunho Yang
45
7
0
04 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
132
14
0
03 Oct 2024
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Yuqing Wang
Tianwei Xiong
Daquan Zhou
Zhijie Lin
Yang Zhao
Bingyi Kang
Jiashi Feng
Xihui Liu
VGen
46
23
0
03 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
20
0
0
03 Oct 2024
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Anthony Y. Zhou
Zijie Li
Michael Schneier
John R Buchanan Jr
Amir Barati Farimani
AI4CE
DiffM
52
5
0
02 Oct 2024
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Jie Cheng
Ruixi Qiao
Gang Xiong
Yingwei Ma
Binhua Li
Yongbin Li
Yisheng Lv
OffRL
OnRL
LM&Ro
35
3
0
01 Oct 2024
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details
Jian Yang
Xukun Wang
Wentao Wang
Guoming Li
Qihang Fang
Ruihong Yuan
Tianyang Wang
Jason Zhaoxin Fan
Yeying Jin
Zhaoxin Fan
VGen
41
1
0
01 Oct 2024
Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation
Yuran Wang
Zhijing Wan
Yansheng Qiu
Zheng Wang
DiffM
MedIm
29
0
0
30 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
51
11
0
26 Sep 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
63
4
0
26 Sep 2024
Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification
Xinrui Zhou
Yuhao Huang
Haoran Dou
Shijing Chen
Ao Chang
...
Jie Jessie Ren
Ruobing Huang
Jun Cheng
Wufeng Xue
Dong Ni
MedIm
87
0
0
25 Sep 2024
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
Alireza Ganjdanesh
Yan Kang
Yuchen Liu
Richard Y. Zhang
Zhe Lin
Heng Huang
DiffM
24
2
0
23 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
52
10
0
23 Sep 2024
Dormant: Defending against Pose-driven Human Image Animation
Jiachen Zhou
Mingsi Wang
Tianlin Li
Guozhu Meng
Kai Chen
47
3
0
22 Sep 2024
Deep Learning based Optical Image Super-Resolution via Generative Diffusion Models for Layerwise in-situ LPBF Monitoring
Francis Ogoke
Sumesh Kalambettu Suresh
Jesse Adamczyk
D. Bolintineanu
Anthony Garland
Michael J. Heiden
A. Farimani
30
0
0
20 Sep 2024
Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records
Daeun Kyung
J. Kim
Tackeun Kim
E. Choi
MedIm
DiffM
41
1
0
11 Sep 2024
G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer
Jinzhi Zhang
Feng Xiong
Mu Xu
32
5
0
10 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
43
50
0
06 Sep 2024
OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving
Julong Wei
Shanshuai Yuan
Pengfei Li
Qingda Hu
Zhongxue Gan
Wenchao Ding
VLM
27
17
0
05 Sep 2024
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Haiyu Wu
Jaskirat Singh
Sicong Tian
Liang Zheng
Kevin W. Bowyer
CVBM
42
3
0
04 Sep 2024
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Junjie Li
Yang Liu
Weiqing Liu
Shikai Fang
Lewen Wang
Chang Xu
Jiang Bian
VGen
38
4
0
04 Sep 2024
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni
Yulin Wang
Renping Zhou
Rui Lu
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Yuan Yao
Gao Huang
25
7
0
31 Aug 2024
Scalable Autoregressive Image Generation with Mamba
Haopeng Li
Jinyue Yang
Kexin Wang
Xuerui Qiu
Yuhong Chou
Xin Li
Guoqi Li
Mamba
53
12
0
22 Aug 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
M. Zhang
DiffM
47
1
0
16 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
72
389
0
12 Aug 2024
D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods
Onkar Susladkar
Gayatri S Deshmukh
Sparsh Mittal
Parth Shastri
DiffM
34
3
0
07 Aug 2024
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
V. T. Truong
Luan Ba Dang
Long Bao Le
DiffM
MedIm
38
16
0
06 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
62
48
0
05 Aug 2024
Informed Correctors for Discrete Diffusion Models
Yixiu Zhao
Jiaxin Shi
Lester W. Mackey
Scott W. Linderman
37
9
0
30 Jul 2024
QueST: Self-Supervised Skill Abstractions for Learning Continuous Control
Atharva Mete
Haotian Xue
Albert Wilcox
Yongxin Chen
Animesh Garg
SSL
21
16
0
22 Jul 2024
HazeCLIP: Towards Language Guided Real-World Image Dehazing
Ruiyi Wang
Wenhao Li
Xiaohong Liu
Chunyi Li
Zicheng Zhang
Xiongkuo Min
Guangtao Zhai
CLIP
VLM
59
4
0
18 Jul 2024
Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Zhexiao Xiong
Xin Xing
Scott Workman
Subash Khanal
Nathan Jacobs
DiffM
MDE
52
1
0
12 Jul 2024
On the Role of Discrete Tokenization in Visual Representation Learning
Tianqi Du
Yifei Wang
Yisen Wang
35
7
0
12 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffM
VGen
43
11
0
10 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
34
3
0
10 Jul 2024