Cited By: arXiv 1902.03052

Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese
8 February 2019. William N. Havard, Jean-Pierre Chevrot, Laurent Besacier
Papers citing "Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese" (13 of 13 papers shown)
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
Interspeech, 2023. Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdel-rahman Mohamed, David Harwath. [SSL, VLM] 335 · 11 · 0 · 19 May 2023
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022. Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath. [VLM] 256 · 11 · 0 · 02 Nov 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022. Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe. [SSL, AI4TS] 796 · 475 · 0 · 21 May 2022
Learning English with Peppa Pig
Transactions of the Association for Computational Linguistics (TACL), 2022. Mitja Nikolaus, Afra Alishahi, Grzegorz Chrupała. 262 · 16 · 0 · 25 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022. Kayode Olaleye, Dan Oneaţă, Herman Kamper. 260 · 7 · 0 · 02 Feb 2022
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, ..., Yikang Shen, Rogerio Feris, Brian Kingsbury, M. Picheny, James R. Glass. 618 · 8 · 0 · 08 Nov 2021
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James R. Glass. 179 · 4 · 0 · 14 Oct 2021
Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
Khazar Khorrami, Okko Räsänen. 298 · 24 · 0 · 29 Sep 2021
ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language Modelling track, 2021 edition
Afra Alishahi, Grzegorz Chrupała, Alejandrina Cristià, Emmanuel Dupoux, Bertrand Higy, Marvin Lavechin, Okko Räsänen, Chen Yu. 222 · 7 · 0 · 14 Jul 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Annual Meeting of the Association for Computational Linguistics (ACL), 2020. Wei-Ning Hsu, David Harwath, Christopher Song, James R. Glass. [CLIP] 227 · 74 · 0 · 31 Dec 2020
Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech
William N. Havard, Jean-Pierre Chevrot, Laurent Besacier. 242 · 11 · 0 · 15 Jun 2020
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
International Conference on Learning Representations (ICLR), 2019. David Harwath, Wei-Ning Hsu, James R. Glass. 284 · 88 · 0 · 21 Nov 2019
Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech
Conference on Computational Natural Language Learning (CoNLL), 2019. William N. Havard, Jean-Pierre Chevrot, Laurent Besacier. 153 · 23 · 0 · 18 Sep 2019