Syllable Discovery and Cross-Lingual Generalization in a Visually
Grounded, Self-Supervised Speech Model

19 May 2023

Abdel-rahman Mohamed

David F. Harwath

Papers citing "Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model"

Title | Authors | Tags | Counts | Date
Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Cheol Jun Cho, Nicholas Lee, Akshat Gupta, Dhruv Agarwal, Ethan Chen, Alan W Black, Gopala K. Anumanchipalli | - | 32 0 0 | 09 Oct 2024
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT | Ryota Komatsu, Takahiro Shinozaki | SSL | 32 1 0 | 16 Sep 2024
A model of early word acquisition based on realistic-scale audiovisual naming events | Khazar Khorrami, Okko Rasanen | NAI | 35 0 0 | 07 Jun 2024
Visually Grounded Speech Models have a Mutual Exclusivity Bias | Leanne Nortje, Dan Oneaţă, Yevgen Matusevych, Herman Kamper | SSL | 36 0 0 | 20 Mar 2024
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words | Robin Algayres, Pablo Diego-Simon, Benoît Sagot, Emmanuel Dupoux | - | 28 1 0 | 08 Oct 2023
Visually grounded few-shot word learning in low-resource settings | Leanne Nortje, Dan Oneaţă, Herman Kamper | VLM | 15 4 0 | 20 Jun 2023
Putting Natural in Natural Language Processing | Grzegorz Chrupała | - | 25 9 0 | 08 May 2023
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model | Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David F. Harwath | VLM, CLIP | 43 32 0 | 03 Oct 2022
Self-Supervised Speech Representation Learning: A Review | Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe | SSL, AI4TS | 124 348 0 | 21 May 2022
Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation | Khazar Khorrami, Okko Rasanen | - | 34 20 0 | 29 Sep 2021
Generative Spoken Language Modeling from Raw Audio | Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, ..., Tu Nguyen, Jade Copet, Alexei Baevski, A. Mohamed, Emmanuel Dupoux | AuLLM | 174 336 0 | 01 Feb 2021
ImageNet Large Scale Visual Recognition Challenge | Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei | VLM, ObjD | 282 39,190 0 | 01 Sep 2014