Multimodal Speech Recognition with Unstructured Audio Masking

16 October 2020
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
    CVBM

Papers citing "Multimodal Speech Recognition with Unstructured Audio Masking"

8 / 8 papers shown
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jiliang Hu
Zuchao Li
Ping Wang
Haojun Ai
Lefei Zhang
Hai Zhao
01 Oct 2024
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziyi Ni
Minglun Han
Feilong Chen
Linghui Meng
Jing Shi
Shuang Xu
Bo Xu
31 May 2023
Multimodal Speech Recognition for Language-Guided Embodied Agents
Interspeech, 2023
Allen Chang
Xiaoyuan Zhu
Aarav Monga
Seoho Ahn
Tejas Srinivasan
Jesse Thomason
AuLLM
27 Feb 2023
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
27 Apr 2022
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Imran Razzak
Kevin Lee
Chetan Arora
Ali Hassani
A. Zaslavsky
AAML
22 Feb 2022
MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Georgios Paraskevopoulos
Efthymios Georgiou
Alexandros Potamianos
24 Jan 2022
Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2021
Xiao Xu
Libo Qin
Kaiji Chen
Guoxing Wu
Linlin Li
Wanxiang Che
22 Dec 2021
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
Riza Velioglu
J. Rose
VLM
23 Dec 2020