Robust Speech Recognition via Large-Scale Weak Supervision

6 December 2022

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 454 papers shown

Title
SAGE: Smart home Agent with Grounded Execution D. Rivkin F. Hogan Amal Feriani Abhisek Konar Adam Sigal Steve Liu Gregory Dudek LM&Ro LLMAG ELM LRM 18 3 0 01 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation Juan Pablo Zuluaga Zhaocheng Huang Xing Niu Rohit Paturi S. Srinivasan Prashant Mathur Brian Thompson Marcello Federico BDL 25 2 0 01 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision) Kevin Qinghong Lin Faisal Ahmed Linjie Li Chung-Ching Lin E. Azarnasab ... Lin Liang Zicheng Liu Yumao Lu Ce Liu Lijuan Wang MLLM 26 63 0 30 Oct 2023
Can Language Models Laugh at YouTube Short-form Videos? Dayoon Ko Sangho Lee Gunhee Kim 27 6 0 22 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models Changli Tang Wenyi Yu Guangzhi Sun Xianzhao Chen Tian Tan Wei Li Lu Lu Zejun Ma Chao Zhang LM&MA AuLLM 35 199 0 20 Oct 2023
ViPE: Visualise Pretty-much Everything Hassan Shahmohammadi Adhiraj Ghosh Hendrik P. A. Lensch DiffM 23 1 0 16 Oct 2023
Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference Dejan Porjazovski Yaroslav Getman Tamás Grósz M. Kurimo 28 3 0 16 Oct 2023
Farzi Data: Autoregressive Data Distillation Noveen Sachdeva Zexue He Wang-Cheng Kang Jianmo Ni D. Cheng Julian McAuley DD 19 3 0 15 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers Hosein Mohebbi Grzegorz Chrupała Willem H. Zuidema A. Alishahi 28 12 0 15 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text Chanho Park Chengsong Lu Mingjie Chen Thomas Hain 28 3 0 12 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description Tengda Han Max Bain Arsha Nagrani Gül Varol Weidi Xie Andrew Zisserman VGen DiffM 21 36 0 10 Oct 2023
S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models Tiezhi Wang Nils Strodthoff 47 5 0 10 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models Guangzhi Sun Wenyi Yu Changli Tang Xianzhao Chen Tian Tan Wei Li Lu Lu Zejun Ma Chao Zhang 28 12 0 09 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives Yuchen Yang VGen 19 7 0 09 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers Anna Langedijk Hosein Mohebbi Gabriele Sarti Willem H. Zuidema Jaap Jumelet 21 10 0 05 Oct 2023
uTalk: Bridging the Gap Between Humans and AI Hussam Azzuni Sharim Jamal Abdulmotaleb Elsaddik 14 6 0 04 Oct 2023
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization Thilo von Neumann Christoph Boeddeker Tobias Cord-Landwehr Marc Delcroix Reinhold Haeb-Umbach 21 7 0 28 Sep 2023
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization A. Hussein Brian Yan Antonios Anastasopoulos Shinji Watanabe Sanjeev Khudanpur 29 3 0 27 Sep 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition Jennifer Drexler Fox Desh Raj Natalie Delworth Quinn Mcnamara Corey Miller Miguel Jetté AuLLM 15 7 0 26 Sep 2023
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss R. S. Srinivasa Jaejin Cho Chouchang Yang Yashas Malur Saidutta Ching Hua Lee Yilin Shen Hongxia Jin VLM 29 8 0 26 Sep 2023
Joint Audio and Speech Understanding Yuan Gong Alexander H. Liu Hongyin Luo Leonid Karlinsky James R. Glass AuLLM 28 66 0 25 Sep 2023
Massive End-to-end Models for Short Search Queries Weiran Wang Rohit Prabhavalkar Dongseong Hwang Qiujia Li K. Sim ... Zhong Meng CJ Zheng Yanzhang He Tara N. Sainath P. M. Mengibar 22 2 0 22 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition Krishna C. Puvvada Nithin Rao Koluguri Kunal Dhawan Jagadeesh Balam Boris Ginsburg 24 12 0 19 Sep 2023
Using fine-tuning and min lookahead beam search to improve Whisper Andrea Do Oscar Brown Zhengjie Wang Nikhil Mathew Zixin Liu Jawwad Ahmed Cheng Yu 27 1 0 19 Sep 2023
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata Ryandhimas E. Zezario Fei Chen C. Fuh H. Wang Yu Tsao 37 1 0 18 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching Shivam Mehta Ruibo Tu Jonas Beskow Éva Székely G. Henter 16 69 0 06 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic Pretraining Chunyu Qiang Hao Li Yixin Tian Ruibo Fu Tao Wang Longbiao Wang J. Dang 21 5 0 01 Sep 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition Samuel Horváth Stefanos Laskaridis Shashank Rajput Hongyi Wang BDL 32 4 0 28 Aug 2023
Mobile Foundation Model as Firmware Jinliang Yuan Chenchen Yang Dongqi Cai Shihe Wang Xin Yuan ... Di Zhang Hanzi Mei Xianqing Jia Shangguang Wang Mengwei Xu 37 19 0 28 Aug 2023
Spoken Language Intelligence of Large Language Models for Language Learning Linkai Peng Baorian Nuchged Yingming Gao ELM 57 4 0 28 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification Harunori Kawano Sota Shimizu 30 1 0 22 Aug 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding Chunyu Qiang Hao Li Hao Ni He Qu Ruibo Fu Tao Wang Longbiao Wang J. Dang DiffM 30 8 0 28 Jul 2023
Cascaded Cross-Modal Transformer for Request and Complaint Detection Nicolae-Cătălin Ristea Radu Tudor Ionescu 20 3 0 27 Jul 2023
Turning Whisper into Real-Time Transcription System Dominik Machávcek Raj Dabre Ondrej Bojar 16 22 0 27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures Kun Yuan V. Srivastav Tong Yu Joël L. Lavanchy Pietro Mascagni Pietro Mascagni N. Padoy Nicolas Padoy 24 20 0 27 Jul 2023
Adaptation of Whisper models to child speech recognition Rishabh Jain Andrei Barcovschi Mariam Yiwere Peter Corcoran H. Cucu 11 30 0 24 Jul 2023
MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems Thilo von Neumann Christoph Boeddeker Marc Delcroix Reinhold Haeb-Umbach 11 15 0 21 Jul 2023
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition Theresa Pekarek-Rosin S. Wermter VLM CLL 19 2 0 14 Jul 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding Titouan Parcollet Rogier van Dalen Shucong Zhang S. Bhattacharya 16 6 0 12 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis Siyang Wang G. Henter Joakim Gustafson Éva Székely 42 5 0 11 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey S. Mohamadi G. Mujtaba Ngan Le Gianfranco Doretto Don Adjeroh LM&MA AI4MH 23 21 0 09 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text Kate Sanders David Etter Reno Kriz Benjamin Van Durme VGen 34 7 0 06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework Eliya Segev Maya Alroy Ronen Katsir Noam Wies Ayana Shenhav ... D. Zar Oren Tadmor Jacob Bitterman Amnon Shashua Tal Rosenwein 25 2 0 04 Jul 2023
Boosting Norwegian Automatic Speech Recognition Javier de la Rosa Rolv-Arild Braaten P. Kummervold Freddy Wetjen Svein Arne Brygfjeld 20 7 0 04 Jul 2023
Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos Ashwin Rao 11 6 0 04 Jul 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology Wisdom O. Ikezogwo M. S. Seyfioglu Fatemeh Ghezloo Dylan Stefan Chan Geva Fatwir Sheikh Mohammed Pavan Kumar Anand Ranjay Krishna Linda G. Shapiro CLIP VLM 136 112 0 20 Jun 2023
Towards End-to-end Speech-to-text Summarization Raul Monteiro Diogo Pernes 9 1 0 06 Jun 2023
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses Lucía Gómez Zaragozá Simone Wills Cristian Tejedor-García Javier Marín-Morales Mariano Alcañiz H. Strik 22 8 0 06 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset Lucas Maison Yannick Esteve 26 3 0 01 Jun 2023
Voice Conversion With Just Nearest Neighbors Matthew Baas Benjamin van Niekerk Herman Kamper SSL 32 48 0 30 May 2023