Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

27 April 2022

Papers citing "Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations"

14 papers

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision. Xiaofeng Han, Shunpeng Chen, Zenghuang Fu, Zhe Feng, Lue Fan, ..., Li Guo, Weiliang Meng, Xiaopeng Zhang, Rongtao Xu, Shibiao Xu. 03 Apr 2025.
SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering. Bingxin Li. 01 Apr 2025.
VHASR: A Multimodal Speech Recognition System With Vision Hotwords. Jiliang Hu, Zuchao Li, Ping Wang, Haojun Ai, Lefei Zhang, Hai Zhao. 01 Oct 2024.
I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition. Yannis Vasilakis, Rachel M. Bittner, Johan Pauwels. 25 Jul 2024.
Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR. Minghan Wang, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari. 16 Jun 2024.
A Survey on Multimodal Wearable Sensor-based Human Action Recognition. Jianyuan Ni, Hao Tang, Syed Tousiful Haque, Yan Yan, A. Ngu. 14 Apr 2024.
A-JEPA: Joint-Embedding Predictive Architecture Can Listen. Zhengcong Fei, Mingyuan Fan, Junshi Huang. 27 Nov 2023.
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition. Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Shuang Xu, Bo Xu. 31 May 2023.
Visual Information Matters for ASR Error Correction. Bannihati Kumar Vanya, Shanbo Cheng, Ningxin Peng, Yuchen Zhang. 16 Mar 2023.
A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit. Mina Huh, Ruchira Ray, Corey Karnei. 27 Feb 2023.
PMR: Prototypical Modal Rebalance for Multimodal Learning. Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junxiao Wang, Song Guo. 14 Nov 2022.
End-to-end Audio-visual Speech Recognition with Conformers. Pingchuan Ma, Stavros Petridis, M. Pantic. 12 Feb 2021.
Semantic Understanding of Scenes through the ADE20K Dataset. Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba. 18 Aug 2016.
ImageNet Large Scale Visual Recognition Challenge. Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei. 01 Sep 2014.