Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

27 April 2022
Dan Oneaţă
H. Cucu

Papers citing "Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations"

14 / 14 papers shown
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
66
1
0
03 Apr 2025
SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering
Bingxin Li
30
0
0
01 Apr 2025
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
Jiliang Hu
Zuchao Li
Ping Wang
Haojun Ai
Lefei Zhang
Hai Zhao
16
1
0
01 Oct 2024
I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition
Yannis Vasilakis
Rachel M. Bittner
Johan Pauwels
40
0
0
25 Jul 2024
Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR
Minghan Wang
Yuxia Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
29
0
0
16 Jun 2024
A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Jianyuan Ni
Hao Tang
Syed Tousiful Haque
Yan Yan
A. Ngu
71
6
0
14 Apr 2024
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
25
17
0
27 Nov 2023
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition
Ziyi Ni
Minglun Han
Feilong Chen
Linghui Meng
Jing Shi
Shuang Xu
Bo Xu
37
0
0
31 May 2023
Visual Information Matters for ASR Error Correction
Bannihati Kumar Vanya
Shanbo Cheng
Ningxin Peng
Yuchen Zhang
24
3
0
16 Mar 2023
A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit
Mina Huh
Ruchira Ray
Corey Karnei
19
3
0
27 Feb 2023
PMR: Prototypical Modal Rebalance for Multimodal Learning
Yunfeng Fan
Wenchao Xu
Haozhao Wang
Junxiao Wang
Song Guo
23
60
0
14 Nov 2022
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
224
0
12 Feb 2021
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,827
0
18 Aug 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,194
0
01 Sep 2014