arXiv:2305.05084
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
8 May 2023
Dima Rekesh
Nithin Rao Koluguri
Samuel Kriman
Somshubra Majumdar
Vahid Noroozi
He Huang
Oleksii Hrinchuk
Krishna Puvvada
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
Papers citing
"Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition"
50 / 52 papers shown
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Haoshuai Zhou
Boxuan Cao
Changgeng Mo
Linkai Li
Shan Xiang Wang
AI4CE
19
0
0
13 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
43
0
0
05 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
19
0
0
30 Apr 2025
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Vladimir Bataev
23
0
0
09 Apr 2025
Training and Inference Efficiency of Encoder-Decoder Speech Models
Piotr Żelasko
Kunal Dhawan
Daniel Galvez
Krishna C. Puvvada
Ankita Pasad
Nithin Rao Koluguri
Ke Hu
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
41
0
0
07 Mar 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Keisuke Kamahori
Jungo Kasai
Noriyuki Kojima
Baris Kasikci
32
0
0
27 Feb 2025
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
Anurag Kumar
Rohit Paturi
Amber Afshan
S. Srinivasan
41
0
0
14 Jan 2025
High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
Sourav Banerjee
Ayushi Agarwal
Promila Ghosh
76
2
0
24 Nov 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
16
0
0
18 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
80
0
0
09 Oct 2024
The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge
Ya Jiang
Hongbo Lan
Jun Du
Qing Wang
Shutong Niu
32
1
0
08 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
81
0
0
03 Oct 2024
Efficient Long-Form Speech Recognition for General Speech In-Context Learning
Hao Yen
Shaoshi Ling
Guoli Ye
21
0
0
29 Sep 2024
Target word activity detector: An approach to obtain ASR word boundaries without lexicon
S. Sivasankaran
Eric Sun
Jinyu Li
Yan-ping Huang
Jing Pan
18
0
0
20 Sep 2024
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
Iuliia Thorbecke
Juan Zuluaga-Gomez
Esaú Villatoro-Tello
Shashi Kumar
Pradeep Rangappa
Sergio Burdisso
P. Motlícek
Karthik Pandia
A. Ganapathiraju
21
0
0
20 Sep 2024
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
Jinhan Wang
Weiqing Wang
Kunal Dhawan
Taejin Park
Myungjong Kim
Ivan Medennikov
He Huang
Nithin Koluguri
Jagadeesh Balam
Boris Ginsburg
33
0
0
18 Sep 2024
ASR Benchmarking: Need for a More Representative Conversational Dataset
Gaurav Maheshwari
Dmitry Ivanov
Théo Johannet
Kevin El Haddad
11
0
0
18 Sep 2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang
Nithin Rao Koluguri
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
30
2
0
10 Sep 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
27
2
0
09 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna C. Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
28
1
0
02 Sep 2024
Speaker Tagging Correction With Non-Autoregressive Language Models
Grigor Kirakosyan
Davit Karamyan
3DV
21
0
0
30 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna C. Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
23
1
0
23 Aug 2024
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
28
1
0
18 Jul 2024
Romanization Encoding For Multilingual ASR
Wen Ding
Fei Jia
Hainan Xu
Yu Xi
Junjie Lai
Boris Ginsburg
21
0
0
05 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
39
1
0
03 Jul 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna C. Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
34
10
0
28 Jun 2024
AG-LSEC: Audio Grounded Lexical Speaker Error Correction
Rohit Paturi
Xiang Li
S. Srinivasan
21
1
0
25 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
34
5
0
21 Jun 2024
Self-Train Before You Transcribe
Robert Flynn
Anton Ragni
16
0
0
17 Jun 2024
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
Ruchao Fan
Natarajan Balaji Shankar
Abeer Alwan
31
7
0
15 Jun 2024
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
A. Andrusenko
A. Laptev
Vladimir Bataev
Vitaly Lavrukhin
Boris Ginsburg
35
0
0
11 Jun 2024
Label-Looping: Highly Efficient Decoding for Transducers
Vladimir Bataev
Hainan Xu
Daniel Galvez
Vitaly Lavrukhin
Boris Ginsburg
25
4
0
10 Jun 2024
To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation
Abdul Waheed
Karima Kadaoui
Muhammad Abdul-Mageed
VLM
33
3
0
06 Jun 2024
Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU
Daniel Galvez
Vladimir Bataev
Hainan Xu
Tim Kaldewey
23
5
0
06 Jun 2024
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
Hainan Xu
Zhehuai Chen
Fei Jia
Boris Ginsburg
30
0
0
04 Apr 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
33
7
0
14 Mar 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
35
17
0
20 Feb 2024
Self-consistent context aware conformer transducer for speech recognition
Konstantin Kolokolov
Pavel Pekichev
Karthik Raghunathan
11
0
0
09 Feb 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min-Bin Lin
MLLM
22
14
0
22 Jan 2024
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
8
10
0
27 Dec 2023
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
R. N. Nandi
Mehadi Hasan Menon
Tareq Al Muntasir
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Tariqul Islam
Shammur A. Chowdhury
Firoj Alam
11
3
0
06 Nov 2023
How Much Context Does My Attention-Based ASR System Need?
Robert Flynn
Anton Ragni
17
1
0
24 Oct 2023
Long-form Simultaneous Speech Translation: Thesis Proposal
Peter Polák
3DV
27
3
0
17 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
21
48
0
13 Oct 2023
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Alexandra Antonova
28
0
0
29 Sep 2023
Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
Hugo Malard
Salah Zaiem
Robin Algayres
23
2
0
22 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
19
12
0
19 Sep 2023
Investigating End-to-End ASR Architectures for Long Form Audio Transcription
Nithin Rao Koluguri
Samuel Kriman
Georgy Zelenfroind
Somshubra Majumdar
Dima Rekesh
Vahid Noroozi
Jagadeesh Balam
Boris Ginsburg
AuLLM
13
9
0
18 Sep 2023
Earnings-21: A Practical Benchmark for ASR in the Wild
Miguel Rio
Natalie Delworth
Ryan Westerman
Michelle Huang
Nishchal Bhandari
Joseph Palakapilly
Quinten McNamara
Joshua Dong
Piotr Żelasko
Miguel Jetté
58
47
0
22 Apr 2021
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
James Qin
Daniel S. Park
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Quoc V. Le
Yonghui Wu
VLM
SSL
136
307
0
20 Oct 2020