Multimodal One-Shot Learning of Speech and Images

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018

9 November 2018

arXiv: 1811.03875

Papers citing "Multimodal One-Shot Learning of Speech and Images"

13 papers

Towards visually prompted keyword localisation for zero-resource spoken languages
Spoken Language Technology Workshop (SLT), 2022
12 Oct 2022

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding
Spoken Language Technology Workshop (SLT), 2022
10 Oct 2022

Meta Learning for Natural Language Processing: A Survey
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
03 May 2022

Keyword localisation in untranscribed speech using visually grounded speech models
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
02 Feb 2022

Multimodality in Meta-Learning: A Comprehensive Survey
Irwin King
28 Sep 2021

HetMAML: Task-Heterogeneous Model-Agnostic Meta-Learning for Few-Shot Learning Across Modalities
International Conference on Information and Knowledge Management (CIKM), 2021
Aidong Zhang
17 May 2021

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Christopher Song
31 Dec 2020

Direct multimodal few-shot learning of speech and images
Interspeech (Interspeech), 2020
10 Dec 2020

A Survey on Machine Learning from Few Samples
Pattern Recognition (Pattern Recognit.), 2020
06 Sep 2020

Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images
14 Aug 2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko, ..., Brian Kingsbury, Antonio Torralba
16 Jun 2020

Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data
EURASIP Journal on Audio, Speech, and Music Processing (JEASMP), 2020
Jérôme Farinas
09 Mar 2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
International Conference on Learning Representations (ICLR), 2019
21 Nov 2019