Representations of language in a model of visually grounded speech signal

7 February 2017

Papers citing "Representations of language in a model of visually grounded speech signal"

50 / 84 papers shown

Vision-Speech Models: Teaching Speech Models to Converse about Images Amélie Royer Moritz Böhle Gabriel de Marmiesse Laurent Mazaré Neil Zeghidour Alexandre Défossez P. Pérez AuLLM VLM 84 0 0 19 Mar 2025
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings Leanne Nortje Dan Oneaţă Herman Kamper VLM 16 0 0 09 Sep 2024
A model of early word acquisition based on realistic-scale audiovisual naming events Khazar Khorrami Okko Rasanen NAI 35 0 0 07 Jun 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias Leanne Nortje Dan Oneaţă Yevgen Matusevych Herman Kamper SSL 36 0 0 20 Mar 2024
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens Minsu Kim J. Choi Soumi Maiti Jeong Hun Yeo Shinji Watanabe Y. Ro VLM 21 6 0 15 Sep 2023
Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning Saurabhchand Bhati Jesús Villalba Laureano Moro Velázquez Thomas Thebaud Najim Dehak CLIP 25 3 0 08 Sep 2023
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System Khazar Khorrami María Andrea Cruz Blandón Tuomas Virtanen Okko Rasanen SSL 20 1 0 05 Jun 2023
Exploring How Generative Adversarial Networks Learn Phonological Representations Jing Chen Micha Elsner GAN 11 3 0 21 May 2023
Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples H. Ryu Arda Senocak In So Kweon Joon Son Chung VLM 19 8 0 30 Mar 2023
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge Ewan Dunbar Nicolas Hamilakis Emmanuel Dupoux SSL 24 30 0 27 Oct 2022
YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding Kayode Olaleye Dan Oneaţă Herman Kamper ObjD 29 6 0 10 Oct 2022
Self-Supervised Speech Representation Learning: A Review Abdel-rahman Mohamed Hung-yi Lee Lasse Borgholt Jakob Drachmann Havtorn Joakim Edin ... Shang-Wen Li Karen Livescu Lars Maaløe Tara N. Sainath Shinji Watanabe SSL AI4TS 124 348 0 21 May 2022
A Computational Acquisition Model for Multimodal Word Categorization Uri Berger Gabriel Stanovsky Omri Abend Lea Frermann 11 9 0 12 May 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data G. Beguš Alan Zhou SSL 13 4 0 22 Mar 2022
Modelling word learning and recognition using visually grounded speech Danny Merkx Sebastiaan Scholten S. Frank M. Ernestus O. Scharenborg SSL 29 0 0 14 Mar 2022
Learning English with Peppa Pig Mitja Nikolaus A. Alishahi Grzegorz Chrupała 16 13 0 25 Feb 2022
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge Danny Merkx S. Frank M. Ernestus 9 4 0 21 Feb 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling Puyuan Peng David F. Harwath SSL 28 26 0 07 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models Kayode Olaleye Dan Oneaţă Herman Kamper 19 7 0 02 Feb 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues Akira Taniguchi Hiroaki Murakami Ryo Ozaki T. Taniguchi 16 2 0 18 Jan 2022
Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech Gaoussou Youssouf Kebe Luke E. Richards Edward Raff Francis Ferraro Cynthia Matuszek SSL 14 5 0 27 Dec 2021
Cascaded Multilingual Audio-Visual Learning from Videos Andrew Rouditchenko Angie Boggust David F. Harwath Samuel Thomas Hilde Kuehne ... Rameswar Panda Rogerio Feris Brian Kingsbury M. Picheny James R. Glass 65 8 0 08 Nov 2021
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset Ian Palmer Andrew Rouditchenko Andrei Barbu Boris Katz James R. Glass 11 4 0 14 Oct 2021
Voice-assisted Image Labelling for Endoscopic Ultrasound Classification using Neural Networks E. Bonmati Yipeng Hu A. Grimwood G. Johnson G. Goodchild ... K. Gurusamy Brian P. Davidson Matthew J. Clarkson Stephen P. Pereira D. Barratt 19 15 0 12 Oct 2021
Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation Khazar Khorrami Okko Rasanen 34 20 0 29 Sep 2021
Fast-Slow Transformer for Visually Grounding Speech Puyuan Peng David F. Harwath 18 30 0 16 Sep 2021
ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language Modelling track, 2021 edition Afra Alishahi Grzegorz Chrupała Alejandrina Cristià Emmanuel Dupoux Bertrand Higy Marvin Lavechin Okko Rasanen Chen Yu 32 7 0 14 Jul 2021
Layer-wise Analysis of a Self-supervised Speech Representation Model Ankita Pasad Ju-Chieh Chou Karen Livescu SSL 26 287 0 10 Jul 2021
Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models Khazar Khorrami Okko Rasanen 18 9 0 05 Jul 2021
What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis Shammur A. Chowdhury Nadir Durrani Ahmed M. Ali 25 12 0 01 Jul 2021
Attention-Based Keyword Localisation in Speech using Visual Grounding Kayode Olaleye Herman Kamper 6 13 0 16 Jun 2021
Unsupervised Automatic Speech Recognition: A Review Hanan Aldarmaki Asad Ullah Nazar Zaki VLM SSL 31 56 0 09 Jun 2021
Grounding 'Grounding' in NLP Khyathi Raghavi Chandu Yonatan Bisk A. Black 22 51 0 04 Jun 2021
Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval Ramon Sanabria Austin Waters Jason Baldridge 3DV 6 25 0 05 Apr 2021
Double Articulation Analyzer with Prosody for Unsupervised Word and Phoneme Discovery Yasuaki Okuda Ryo Ozaki T. Taniguchi 18 5 0 15 Mar 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Wei-Ning Hsu David F. Harwath Christopher Song James R. Glass CLIP 27 66 0 31 Dec 2020
Towards localisation of keywords in speech using weak supervision Kayode Olaleye Benjamin van Niekerk Herman Kamper 11 5 0 14 Dec 2020
Direct multimodal few-shot learning of speech and images Leanne Nortje Herman Kamper SSL 6 10 0 10 Dec 2020
Probing Multilingual BERT for Genetic and Typological Signals Taraka Rama Lisa Beinborn Steffen Eger 6 24 0 04 Nov 2020
Similarity Analysis of Self-Supervised Speech Representations Yu-An Chung Yonatan Belinkov James R. Glass SSL 28 36 0 22 Oct 2020
Textual Supervision for Visually Grounded Spoken Language Understanding Bertrand Higy Desmond Elliott Grzegorz Chrupała 10 10 0 06 Oct 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko Angie Boggust David F. Harwath Brian Chen D. Joshi ... Rogerio Feris Brian Kingsbury M. Picheny Antonio Torralba James R. Glass SSL 22 141 0 16 Jun 2020
Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech William N. Havard Jean-Pierre Chevrot Laurent Besacier 15 10 0 15 Jun 2020
Learning to Recognise Words using Visually Grounded Speech Sebastiaan Scholten Danny Merkx O. Scharenborg 17 13 0 31 May 2020
Learning to Understand Child-directed and Adult-directed Speech Lieke Gelderloos Grzegorz Chrupała A. Alishahi 9 4 0 06 May 2020
Analyzing analytical methods: The case of phonology in neural models of spoken language Grzegorz Chrupała Bertrand Higy A. Alishahi 8 20 0 15 Apr 2020
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech David F. Harwath Wei-Ning Hsu James R. Glass 15 84 0 21 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition Alexei Baevski Michael Auli Abdel-rahman Mohamed SSL 16 147 0 10 Nov 2019
Training ASR models by Generation of Contextual Information Kritika Singh Dmytro Okhonko Jun Liu Yongqiang Wang Frank Zhang ... Sergey Edunov Fuchun Peng Yatharth Saraf Geoffrey Zweig Abdel-rahman Mohamed 20 7 0 27 Oct 2019
Large-scale representation learning from visually grounded untranscribed speech Gabriel Ilharco Yuan Zhang Jason Baldridge SSL 6 60 0 19 Sep 2019