ResearchTrend.AI
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

8 February 2024
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang

Papers citing "It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition"

21 / 21 papers shown
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Rui Liu
Hongyu Yuan
H. Li
35
0
0
03 Jan 2025
Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
Yingyi Ma
Zhe Liu
Ozlem Kalinli
65
0
0
09 Dec 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
28
0
0
11 Nov 2024
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Nikolaos Flemotomos
Roger Hsiao
P. Swietojanski
Takaaki Hori
Dogan Can
Xiaodan Zhuang
37
0
0
01 Nov 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
18
9
0
18 Sep 2024
LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
Shaojun Li
Hengchao Shang
Daimeng Wei
Jiaxin Guo
Zongyao Li
Xianghui He
Min Zhang
Hao Yang
19
2
0
13 Sep 2024
SALSA: Speedy ASR-LLM Synchronous Aggregation
Ashish R. Mittal
Darshan Prabhu
Sunita Sarawagi
P. Jyothi
21
2
0
29 Aug 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
Peikun Chen
Sining Sun
Changhao Shan
Qing Yang
Lei Xie
29
2
0
27 Jun 2024
Large Language Models for Dysfluency Detection in Stuttered Speech
Dominik Wagner
Sebastian P. Bayerl
Ilja Baumann
K. Riedhammer
Elmar Nöth
Tobias Bocklet
30
3
0
16 Jun 2024
Soundscape Captioning using Sound Affective Quality Network and Large Language Model
Yuanbo Hou
Qiaoqiao Ren
A. Mitchell
Wenwu Wang
Jian Kang
Tony Belpaeme
Dick Botteldooren
26
3
0
09 Jun 2024
Crossmodal ASR Error Correction with Discrete Speech Units
Yuanchao Li
Pinzhen Chen
Peter Bell
Catherine Lai
21
6
0
26 May 2024
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Bingshen Mu
Yangze Li
Qijie Shao
Kun Wei
Xucheng Wan
Naijun Zheng
Huan Zhou
Lei Xie
27
5
0
06 May 2024
MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection
Taeheon Kim
Sangyun Chung
Damin Yeom
Youngjoon Yu
Hak Gu Kim
Y. Ro
25
2
0
22 Mar 2024
Multi-stage Large Language Model Correction for Speech Recognition
Jie Pu
Thai-Son Nguyen
Sebastian Stüker
LRM
17
6
0
17 Oct 2023
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
S. Radhakrishnan
Chao-Han Huck Yang
S. Khan
Rohit Kumar
N. Kiani
D. Gómez-Cabrero
Jesper N. Tegnér
35
47
0
10 Oct 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
96
81
0
04 May 2023
On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
Chenzhuang Du
Jiaye Teng
Tingle Li
Yichen Liu
Tianyuan Yuan
Yue Wang
Yang Yuan
Hang Zhao
33
38
0
02 May 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
E. Chng
10
15
0
11 Apr 2023
UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup
Zongbo Han
Zhipeng Liang
Fan Yang
Liu Liu
Lanqing Li
Yatao Bian
P. Zhao
Bing Wu
Changqing Zhang
Jianhua Yao
45
34
0
19 Sep 2022
RescoreBERT: Discriminative Speech Recognition Rescoring with BERT
Liyan Xu
Yile Gu
J. Kolehmainen
Haidar Khan
Ankur Gandhe
Ariya Rastrow
A. Stolcke
I. Bulyko
23
45
0
02 Feb 2022
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
245
9,042
0
06 Jun 2015