arXiv:2211.01180
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
2 November 2022
Layne Berry
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Hung-yi Lee
David F. Harwath
VLM
Papers citing "M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval" (9 papers)
Interface Design for Self-Supervised Speech Models
Yi-Jen Shih
David Harwath
54
1
0
18 Jun 2024
Translating speech with just images
Dan Oneaţă
Herman Kamper
VLM
23
1
0
11 Jun 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
39
0
0
20 Mar 2024
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Hsuan-Fu Wang
Yi-Jen Shih
Heng-Jui Chang
Layne Berry
Puyuan Peng
Hung-yi Lee
Hsin-Min Wang
David F. Harwath
VLM
38
2
0
10 Feb 2024
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
Hung-Chieh Fang
Nai-Xuan Ye
Yi-Jen Shih
Puyuan Peng
Hsuan-Fu Wang
Layne Berry
Hung-yi Lee
David F. Harwath
VLM
29
1
0
08 Feb 2024
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
Iván Vallés-Pérez
Grzegorz Beringer
Piotr Bilinski
G. Cook
Roberto Barra-Chicote
13
1
0
23 Jul 2023
Visually grounded few-shot word learning in low-resource settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
15
4
0
20 Jun 2023
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David F. Harwath
VLM
CLIP
46
32
0
03 Oct 2022
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Samuel Thomas
Hilde Kuehne
...
Rameswar Panda
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
68
8
0
08 Nov 2021