2305.13050
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
22 May 2023
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
Papers citing
"AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation"
20 / 20 papers shown
Title
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang
Martim Brandão
56
0
0
25 Apr 2025
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
Hao Zhou
Xiaobao Guo
Yuzhe Zhu
A. Kong
DiffM
46
1
0
13 Mar 2025
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
65
0
0
09 Dec 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
13
2
0
05 Sep 2024
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
22
8
0
22 Jul 2024
Network Bending of Diffusion Models for Audio-Visual Generation
Luke Dzwonczyk
Carmine-Emanuele Cella
David Ban
VGen
14
0
0
28 Jun 2024
AudioScenic: Audio-Driven Video Scene Editing
Kaixin Shen
Ruijie Quan
Linchao Zhu
Jun Xiao
Yi Yang
VGen
DiffM
16
1
0
25 Apr 2024
Diffusion-based Data Augmentation for Object Counting Problems
Zhen Wang
Yuelei Li
Jia Wan
Nuno Vasconcelos
14
4
0
25 Jan 2024
Unsupervised Multi-modal Feature Alignment for Time Series Representation Learning
Cheng Liang
Donghua Yang
Zhiyu Liang
Hongzhi Wang
Zheng Liang
Xiyang Zhang
Jianfeng Huang
AI4TS
43
1
0
09 Dec 2023
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Moayed Haji-Ali
Guha Balakrishnan
Vicente Ordonez
27
23
0
30 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
41
20
0
28 Nov 2023
Can CLIP Help Sound Source Localization?
Sooyoung Park
Arda Senocak
Joon Son Chung
8
6
0
07 Nov 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
37
56
0
30 Sep 2023
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Guy Yariv
Itai Gat
Sagie Benaim
Lior Wolf
Idan Schwartz
Yossi Adi
DiffM
VGen
23
36
0
28 Sep 2023
Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation
Zhuqiang Lu
Kun Hu
Chaoyue Wang
Lei Bai
Zhiyong Wang
20
8
0
07 Sep 2023
MDSC: Towards Evaluating the Style Consistency Between Music and Dance
Zixiang Zhou
Weiyuan Li
Baoyuan Wang
10
1
0
04 Sep 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
515
0
02 Jan 2023
Audio-to-Image Cross-Modal Generation
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
46
12
0
27 Sep 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
172
336
0
01 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
174
196
0
08 Jan 2021