Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.04628
Cited By
Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning
8 September 2023
Saurabhchand Bhati
Jesús Villalba
Laureano Moro Velázquez
Thomas Thebaud
Najim Dehak
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning"
3 / 3 papers shown
Title
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David F. Harwath
VLM
CLIP
38
32
0
03 Oct 2022
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro Velázquez
Najim Dehak
SSL
53
22
0
05 Oct 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
1