Textual Supervision for Visually Grounded Spoken Language Understanding

6 October 2020

Grzegorz Chrupała

Papers citing "Textual Supervision for Visually Grounded Spoken Language Understanding"

BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval; Zhenyu Lu, Lakshay Sethi; 136, 0, 0; 19 Aug 2024
Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023; H. Ryu, Arda Senocak, In So Kweon, Joon Son Chung; VLM; 156, 10, 0; 30 Mar 2023
Evaluating context-invariance in unsupervised speech representations; Interspeech (Interspeech), 2022; Mark Hallap, Emmanuel Dupoux, Ewan Dunbar; SSL; 166, 13, 0; 27 Oct 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling; Puyuan Peng, David Harwath; SSL; 143, 28, 0; 07 Feb 2022
Fast-Slow Transformer for Visually Grounding Speech; Puyuan Peng, David Harwath; 206, 34, 0; 16 Sep 2021
ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language Modelling track, 2021 edition; Afra Alishahi, Grzegorz Chrupała, Alejandrina Cristià, Emmanuel Dupoux, Bertrand Higy, Marvin Lavechin, Okko Räsänen, Chen Yu; 101, 7, 0; 14 Jul 2021
Discrete representations in neural models of spoken language; BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2021; Bertrand Higy, Lieke Gelderloos, Afra Alishahi, Grzegorz Chrupała; 184, 6, 0; 12 May 2021
Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval; Interspeech (Interspeech), 2021; Ramon Sanabria, Austin Waters, Jason Baldridge; 3DV; 140, 27, 0; 05 Apr 2021