M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

2 November 2022
Layne Berry
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Hung-yi Lee
David F. Harwath
    VLM
arXiv: 2211.01180

Papers citing "M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval"

9 / 9 papers shown
Interface Design for Self-Supervised Speech Models
Yi-Jen Shih
David Harwath
54
1
0
18 Jun 2024
Translating speech with just images
Dan Oneaţă
Herman Kamper
VLM
23
1
0
11 Jun 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
39
0
0
20 Mar 2024
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Hsuan-Fu Wang
Yi-Jen Shih
Heng-Jui Chang
Layne Berry
Puyuan Peng
Hung-yi Lee
Hsin-Min Wang
David F. Harwath
VLM
38
2
0
10 Feb 2024
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model
Hung-Chieh Fang
Nai-Xuan Ye
Yi-Jen Shih
Puyuan Peng
Hsuan-Fu Wang
Layne Berry
Hung-yi Lee
David F. Harwath
VLM
29
1
0
08 Feb 2024
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
Iván Vallés-Pérez
Grzegorz Beringer
Piotr Bilinski
G. Cook
Roberto Barra-Chicote
13
1
0
23 Jul 2023
Visually grounded few-shot word learning in low-resource settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
15
4
0
20 Jun 2023
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David F. Harwath
VLM
CLIP
46
32
0
03 Oct 2022
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Samuel Thomas
Hilde Kuehne
...
Rameswar Panda
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
68
8
0
08 Nov 2021