Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

21 February 2019

Colin Cherry

Papers citing "Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling"

50 / 162 papers shown

SequenceLayers: Sequence Processing and Streaming Neural Networks Made Easy

...

31 Jul 2025

LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting

111

29 May 2025

GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting

Spoken Language Technology Workshop (SLT), 2024

186

22 Oct 2024

Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments

194

23 Jul 2024

SimulTron: On-Device Simultaneous Speech to Speech Translation

Ye Jia

Michelle Tadmor Ramanovich

173

04 Jun 2024

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

...

Tsendsuren Munkhdalai

Angad Chandorkar

Rohit Prabhavalkar

302

15 Apr 2024

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

196

27 Feb 2024

Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

337

17 Jan 2024

Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

Guru Prakash Arumugam

209

18 Dec 2023

Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments

Shanqing Cai

Subhashini Venugopalan

...

244

03 Dec 2023

Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics

Valerio Francesco Puglisi

O. Giudice

Sebastiano Battiato

193

29 Oct 2023

Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey

241

26 Sep 2023

MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods

International Conference on Learning Representations (ICLR), 2023

405

19 Sep 2023

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

Interspeech (Interspeech), 2023

114

09 Jun 2023

Edit Distance based RL for RNNT decoding

DongSeon Hwang

Changwan Ryu

K. Sim

165

31 May 2023

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

Interspeech (Interspeech), 2023

212

28 May 2023

Modular Domain Adaptation for Conformer-Based Streaming ASR

Interspeech (Interspeech), 2023

190

22 May 2023

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Neural Information Processing Systems (NeurIPS), 2023

Joshua Ainslie

...

217

11 Apr 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Jiatong Shi

...

222

10 Apr 2023

A Deliberation-based Joint Acoustic and Text Decoder

Interspeech (Interspeech), 2021

133

23 Mar 2023

End-to-End Speech Recognition: A Survey

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

282

243

03 Mar 2023

Defending against Adversarial Audio via Diffusion Model

International Conference on Learning Representations (ICLR), 2023

214

02 Mar 2023

Locale Encoding For Scalable Multilingual Keyword Spotting Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Pai Zhu

Hyun Jin Park

Alex Park

Angelo Scorza Scarpati

Ignacio López Moreno

171

25 Feb 2023

PyGlove: Efficiently Exchanging ML Ideas as Code

113

03 Feb 2023

Efficient Domain Adaptation for Speech Foundation Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

...

260

03 Feb 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

178

19 Jan 2023

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

Journal of Biomedical Informatics (JBI), 2022

141

20 Dec 2022

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

251

29 Nov 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up

...

Jascha Narain Sohl-Dickstein

310

17 Nov 2022

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems

Spoken Language Technology Workshop (SLT), 2022

125

01 Nov 2022

Textless Direct Speech-to-Speech Translation with Discrete Speech Representation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Xinjian Li

Ye Jia

Chung-Cheng Chiu

269

31 Oct 2022

Streaming Parrotron for on-device speech-to-speech conversion

Interspeech (Interspeech), 2022

257

25 Oct 2022

ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition

Sanchit Gandhi

Patrick von Platen

Alexander M. Rush

141

24 Oct 2022

Scaling Up Deliberation for Multilingual ASR

Spoken Language Technology Workshop (SLT), 2022

304

11 Oct 2022

A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation

Interspeech (Interspeech), 2022

Tom O'Malley

A. Narayanan

Quan Wang

155

14 Sep 2022

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

Interspeech (Interspeech), 2022

Kartik Audhkhasi

Yinghui Huang

Bhuvana Ramabhadran

Pedro J. Moreno

128

13 Sep 2022

A Language Agnostic Multilingual Streaming On-Device ASR System

Interspeech (Interspeech), 2022

...

172

29 Aug 2022

RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks

IEEE Signal Processing Letters (SPL), 2022

192

14 Jul 2022

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

...

641

1,359

22 Jun 2022

When does dough become a bagel? Analyzing the remaining mistakes on ImageNet

Neural Information Processing Systems (NeurIPS), 2022

Vijay Vasudevan

Benjamin Caine

Raphael Gontijo-Lopes

Sara Fridovich-Keil

Rebecca Roelofs

197

09 May 2022

Building Machine Translation Systems for the Next Thousand Languages

...

299

108

09 May 2022

Online Model Compression for Federated Learning with Large Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

179

06 May 2022

A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

Interspeech (Interspeech), 2022

155

06 May 2022

CoCa: Contrastive Captioners are Image-Text Foundation Models

Mojtaba Seyedhosseini

Yonghui Wu

661

1,596

04 May 2022

The Implicit Length Bias of Label Smoothing on Beam Search Decoding

Bowen Liang

Pidong Wang

Yuan Cao

208

02 May 2022

Mask scalar prediction for improving robust automatic speech recognition

184

26 Apr 2022

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

Interspeech (Interspeech), 2022

187

22 Apr 2022

Scaling Up Models and Data with t5x and seqio

Journal of Machine Learning Research (JMLR), 2022

...

289

213

31 Mar 2022

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Interspeech (Interspeech), 2022

371

29 Mar 2022

Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation

Interspeech (Interspeech), 2022

Colin Cherry

222

24 Mar 2022