SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

13 October 2023
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
    LRM
arXiv: 2310.09424

Papers citing "SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation"

41 / 41 papers shown
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
36
1
0
11 Apr 2025
DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation
Xinglin Lyu
Wei Tang
Y. Li
X. Zhao
Ming Zhu
...
Y. Lu
Min Zhang
Daimeng Wei
Hao Yang
Min Zhang
67
0
0
07 Apr 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
J. Zhang
Lu Lu
Yu Tsao
Junichi Yamagishi
Y. Wang
Chao Zhang
AuLLM
74
0
0
26 Mar 2025
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou
S. Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
55
1
0
13 Mar 2025
Synthetic data enables context-aware bioacoustic sound event detection
Benjamin Hoffman
David Robinson
Marius Miron
V. Baglione
D. Canestrari
Damian Elias
Eva Trapote
Olivier Pietquin
32
0
0
01 Mar 2025
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
Weiqiao Shan
Y. Li
Yuhao Zhang
Yingfeng Luo
Chen Xu
...
Y. Lu
M. Zhang
Hao Yang
Tong Xiao
Jingbo Zhu
AuLLM
60
0
0
24 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
99
1
0
28 Jan 2025
Speech Translation Refinement using Large Language Models
Huaixia Dou
Xinyu Tian
Xinglin Lyu
Jie Zhu
Junhui Li
Lifan Guo
47
0
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
56
4
0
24 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
29
0
0
04 Jan 2025
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Nikolaos Flemotomos
Roger Hsiao
P. Swietojanski
Takaaki Hori
Dogan Can
Xiaodan Zhuang
37
0
0
01 Nov 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM
ELM
65
19
0
24 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
43
2
0
23 Oct 2024
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Yexing Du
Ziyang Ma
Yifan Yang
Keqi Deng
Xie Chen
Bo Yang
Yang Xiang
Ming Liu
Bing Qin
LRM
21
6
0
29 Sep 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
LM&MA
65
5
0
25 Sep 2024
EMMeTT: Efficient Multimodal Machine Translation Training
Piotr Żelasko
Zhehuai Chen
Mengru Wang
Daniel Galvez
Oleksii Hrinchuk
Shuoyang Ding
Ke Hu
Jagadeesh Balam
Vitaly Lavrukhin
Boris Ginsburg
20
1
0
20 Sep 2024
Chain-of-Thought Prompting for Speech Translation
Ke Hu
Zhehuai Chen
Chao-Han Huck Yang
Piotr Żelasko
Oleksii Hrinchuk
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
LRM
26
2
0
17 Sep 2024
Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach
Siqi Li
Danni Liu
Jan Niehues
21
0
0
13 Sep 2024
Contextualization of ASR with LLM using phonetic retrieval-based augmentation
Zhihong Lei
Xingyu Na
Mingbin Xu
Ernest Pusateri
Christophe Van Gysel
Yuanyuan Zhang
Shiyi Han
Zhen Huang
18
2
0
11 Sep 2024
Comparing Discrete and Continuous Space LLMs for Speech Recognition
Yaoxun Xu
Shi-Xiong Zhang
Jianwei Yu
Zhiyong Wu
Dong Yu
AuLLM
14
3
0
01 Sep 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Y. Wang
Xie Chen
AuLLM
29
23
0
05 Aug 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
37
1
0
01 Aug 2024
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
Ruizhe Huang
M. Yarmohammadi
Sanjeev Khudanpur
Dan Povey
23
2
0
14 Jul 2024
A Survey on LoRA of Large Language Models
Yuren Mao
Yuhang Ge
Yijiang Fan
Wenyi Xu
Yu Mi
Zhonghao Hu
Yunjun Gao
ALM
52
22
0
08 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
36
19
0
05 Jul 2024
Investigating Decoder-only Large Language Models for Speech-to-text Translation
Chao-Wei Huang
Hui Lu
Hongyu Gong
H. Inaguma
Ilia Kulikov
Ruslan Mavlyutov
Sravya Popuri
AuLLM
LRM
42
6
0
03 Jul 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna C. Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
34
10
0
28 Jun 2024
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Vahid Noroozi
Zhehuai Chen
Somshubra Majumdar
Steve Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
29
3
0
18 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
36
15
0
14 Jun 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon
Kwangyoun Kim
Yi-Te Hsu
Prashant Sridhar
Shinji Watanabe
Karen Livescu
AuLLM
39
2
0
13 Jun 2024
Can Large Language Models Understand Spatial Audio?
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
...
Jun Zhang
Lu Lu
Zejun Ma
Yuxuan Wang
Chao Zhang
44
4
0
12 Jun 2024
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
A. Andrusenko
A. Laptev
Vladimir Bataev
Vitaly Lavrukhin
Boris Ginsburg
27
0
0
11 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
33
2
0
09 Jun 2024
Audio Dialogues: Dialogues dataset for audio and music understanding
Arushi Goel
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
AuLLM
24
4
0
11 Apr 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
38
11
0
19 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Wei Ping
Rafael Valle
Bryan Catanzaro
AuLLM
LM&MA
MLLM
59
73
0
02 Feb 2024
Using Large Language Model for End-to-End Chinese ASR and NER
Yuang Li
Jiawei Yu
Min Zhang
Mengxin Ren
Yanqing Zhao
Xiaofeng Zhao
Miaomiao Ma
Chang Su
Hao-Yu Yang
23
2
0
21 Jan 2024
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
Siddhant Arora
Hayato Futami
Jee-weon Jung
Yifan Peng
Roshan S. Sharma
Yosuke Kashiwagi
E. Tsunoo
Karen Livescu
Shinji Watanabe
ELM
14
7
0
04 Oct 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
77
249
0
02 Mar 2023
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
NeMo: a toolkit for building AI applications using Neural Modules
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
...
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen
174
287
0
14 Sep 2019