1804.03052
Cited By
Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech
9 April 2018
David Harwath
Galen Chuang
James R. Glass
Papers citing
"Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech"
14 / 14 papers shown
Title
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
30
3
0
21 Dec 2023
Visually grounded few-shot word acquisition with fewer shots
Leanne Nortje
Benjamin van Niekerk
Herman Kamper
30
1
0
25 May 2023
Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
H. Ryu
Arda Senocak
In So Kweon
Joon Son Chung
VLM
30
8
0
30 Mar 2023
Towards visually prompted keyword localisation for zero-resource spoken languages
Leanne Nortje
Herman Kamper
29
6
0
12 Oct 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
137
354
0
21 May 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng
David Harwath
SSL
43
26
0
07 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
32
7
0
02 Feb 2022
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
36
48
0
27 Dec 2021
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
...
Yikang Shen
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
137
8
0
08 Nov 2021
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Ian Palmer
Andrew Rouditchenko
Andrei Barbu
Boris Katz
James R. Glass
11
4
0
14 Oct 2021
Direct Speech-to-image Translation
Jiguo Li
Xinfeng Zhang
Chuanmin Jia
Jizheng Xu
Li Zhang
Y. Wang
Siwei Ma
Wen Gao
36
29
0
07 Apr 2020
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
Marcely Zanon Boito
William N. Havard
Mahault Garnerin
Éric Le Ferrand
Laurent Besacier
32
47
0
30 Jul 2019
Symbolic inductive bias for visually grounded learning of spoken language
Grzegorz Chrupała
27
28
0
21 Dec 2018
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
29
53
0
05 Oct 2017