1702.01991
Cited By
Representations of language in a model of visually grounded speech signal
7 February 2017
Grzegorz Chrupała
Lieke Gelderloos
A. Alishahi
Papers citing
"Representations of language in a model of visually grounded speech signal"
50 / 84 papers shown
Vision-Speech Models: Teaching Speech Models to Converse about Images
Amélie Royer
Moritz Böhle
Gabriel de Marmiesse
Laurent Mazaré
Neil Zeghidour
Alexandre Défossez
P. Pérez
AuLLM
VLM
84
0
0
19 Mar 2025
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
Leanne Nortje
Dan Oneaţă
Herman Kamper
VLM
16
0
0
09 Sep 2024
A model of early word acquisition based on realistic-scale audiovisual naming events
Khazar Khorrami
Okko Rasanen
NAI
35
0
0
07 Jun 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
36
0
0
20 Mar 2024
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
21
6
0
15 Sep 2023
Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning
Saurabhchand Bhati
Jesús Villalba
Laureano Moro Velázquez
Thomas Thebaud
Najim Dehak
CLIP
25
3
0
08 Sep 2023
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
Khazar Khorrami
María Andrea Cruz Blandón
Tuomas Virtanen
Okko Rasanen
SSL
20
1
0
05 Jun 2023
Exploring How Generative Adversarial Networks Learn Phonological Representations
Jing Chen
Micha Elsner
GAN
11
3
0
21 May 2023
Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
H. Ryu
Arda Senocak
In So Kweon
Joon Son Chung
VLM
19
8
0
30 Mar 2023
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
24
30
0
27 Oct 2022
YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding
Kayode Olaleye
Dan Oneaţă
Herman Kamper
ObjD
29
6
0
10 Oct 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
124
348
0
21 May 2022
A Computational Acquisition Model for Multimodal Word Categorization
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
11
9
0
12 May 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
G. Beguš
Alan Zhou
SSL
13
4
0
22 Mar 2022
Modelling word learning and recognition using visually grounded speech
Danny Merkx
Sebastiaan Scholten
S. Frank
M. Ernestus
O. Scharenborg
SSL
29
0
0
14 Mar 2022
Learning English with Peppa Pig
Mitja Nikolaus
A. Alishahi
Grzegorz Chrupała
16
13
0
25 Feb 2022
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge
Danny Merkx
S. Frank
M. Ernestus
9
4
0
21 Feb 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng
David F. Harwath
SSL
28
26
0
07 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
19
7
0
02 Feb 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues
Akira Taniguchi
Hiroaki Murakami
Ryo Ozaki
T. Taniguchi
16
2
0
18 Jan 2022
Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech
Gaoussou Youssouf Kebe
Luke E. Richards
Edward Raff
Francis Ferraro
Cynthia Matuszek
SSL
14
5
0
27 Dec 2021
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Samuel Thomas
Hilde Kuehne
...
Rameswar Panda
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
65
8
0
08 Nov 2021
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Ian Palmer
Andrew Rouditchenko
Andrei Barbu
Boris Katz
James R. Glass
11
4
0
14 Oct 2021
Voice-assisted Image Labelling for Endoscopic Ultrasound Classification using Neural Networks
E. Bonmati
Yipeng Hu
A. Grimwood
G. Johnson
G. Goodchild
...
K. Gurusamy
Brian P. Davidson
Matthew J. Clarkson
Stephen P. Pereira
D. Barratt
19
15
0
12 Oct 2021
Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
Khazar Khorrami
Okko Rasanen
34
20
0
29 Sep 2021
Fast-Slow Transformer for Visually Grounding Speech
Puyuan Peng
David F. Harwath
18
30
0
16 Sep 2021
ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language Modelling track, 2021 edition
Afra Alishahi
Grzegorz Chrupała
Alejandrina Cristià
Emmanuel Dupoux
Bertrand Higy
Marvin Lavechin
Okko Rasanen
Chen Yu
32
7
0
14 Jul 2021
Layer-wise Analysis of a Self-supervised Speech Representation Model
Ankita Pasad
Ju-Chieh Chou
Karen Livescu
SSL
26
287
0
10 Jul 2021
Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models
Khazar Khorrami
Okko Rasanen
18
9
0
05 Jul 2021
What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis
Shammur A. Chowdhury
Nadir Durrani
Ahmed M. Ali
25
12
0
01 Jul 2021
Attention-Based Keyword Localisation in Speech using Visual Grounding
Kayode Olaleye
Herman Kamper
6
13
0
16 Jun 2021
Unsupervised Automatic Speech Recognition: A Review
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM
SSL
31
56
0
09 Jun 2021
Grounding 'Grounding' in NLP
Khyathi Raghavi Chandu
Yonatan Bisk
A. Black
22
51
0
04 Jun 2021
Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval
Ramon Sanabria
Austin Waters
Jason Baldridge
3DV
6
25
0
05 Apr 2021
Double Articulation Analyzer with Prosody for Unsupervised Word and Phoneme Discovery
Yasuaki Okuda
Ryo Ozaki
T. Taniguchi
18
5
0
15 Mar 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David F. Harwath
Christopher Song
James R. Glass
CLIP
27
66
0
31 Dec 2020
Towards localisation of keywords in speech using weak supervision
Kayode Olaleye
Benjamin van Niekerk
Herman Kamper
11
5
0
14 Dec 2020
Direct multimodal few-shot learning of speech and images
Leanne Nortje
Herman Kamper
SSL
6
10
0
10 Dec 2020
Probing Multilingual BERT for Genetic and Typological Signals
Taraka Rama
Lisa Beinborn
Steffen Eger
6
24
0
04 Nov 2020
Similarity Analysis of Self-Supervised Speech Representations
Yu-An Chung
Yonatan Belinkov
James R. Glass
SSL
28
36
0
22 Oct 2020
Textual Supervision for Visually Grounded Spoken Language Understanding
Bertrand Higy
Desmond Elliott
Grzegorz Chrupała
10
10
0
06 Oct 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech
William N. Havard
Jean-Pierre Chevrot
Laurent Besacier
15
10
0
15 Jun 2020
Learning to Recognise Words using Visually Grounded Speech
Sebastiaan Scholten
Danny Merkx
O. Scharenborg
17
13
0
31 May 2020
Learning to Understand Child-directed and Adult-directed Speech
Lieke Gelderloos
Grzegorz Chrupała
A. Alishahi
9
4
0
06 May 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language
Grzegorz Chrupała
Bertrand Higy
A. Alishahi
8
20
0
15 Apr 2020
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
David F. Harwath
Wei-Ning Hsu
James R. Glass
15
84
0
21 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition
Alexei Baevski
Michael Auli
Abdel-rahman Mohamed
SSL
16
147
0
10 Nov 2019
Training ASR models by Generation of Contextual Information
Kritika Singh
Dmytro Okhonko
Jun Liu
Yongqiang Wang
Frank Zhang
...
Sergey Edunov
Fuchun Peng
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
20
7
0
27 Oct 2019
Large-scale representation learning from visually grounded untranscribed speech
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
6
60
0
19 Sep 2019