Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.04356
Cited By
Robust Speech Recognition via Large-Scale Weak Supervision
6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Robust Speech Recognition via Large-Scale Weak Supervision"
50 / 454 papers shown
Title
SAGE: Smart home Agent with Grounded Execution
D. Rivkin
F. Hogan
Amal Feriani
Abhisek Konar
Adam Sigal
Steve Liu
Gregory Dudek
LM&Ro
LLMAG
ELM
LRM
18
3
0
01 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
25
2
0
01 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
26
63
0
30 Oct 2023
Can Language Models Laugh at YouTube Short-form Videos?
Dayoon Ko
Sangho Lee
Gunhee Kim
27
6
0
22 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
35
199
0
20 Oct 2023
ViPE: Visualise Pretty-much Everything
Hassan Shahmohammadi
Adhiraj Ghosh
Hendrik P. A. Lensch
DiffM
23
1
0
16 Oct 2023
Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference
Dejan Porjazovski
Yaroslav Getman
Tamás Grósz
M. Kurimo
28
3
0
16 Oct 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
19
3
0
15 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers
Hosein Mohebbi
Grzegorz Chrupała
Willem H. Zuidema
A. Alishahi
28
12
0
15 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
28
3
0
12 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
21
36
0
10 Oct 2023
S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models
Tiezhi Wang
Nils Strodthoff
47
5
0
10 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
28
12
0
09 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
Yuchen Yang
VGen
19
7
0
09 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
21
10
0
05 Oct 2023
uTalk: Bridging the Gap Between Humans and AI
Hussam Azzuni
Sharim Jamal
Abdulmotaleb Elsaddik
14
6
0
04 Oct 2023
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
Thilo von Neumann
Christoph Boeddeker
Tobias Cord-Landwehr
Marc Delcroix
Reinhold Haeb-Umbach
21
7
0
28 Sep 2023
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
A. Hussein
Brian Yan
Antonios Anastasopoulos
Shinji Watanabe
Sanjeev Khudanpur
29
3
0
27 Sep 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
15
7
0
26 Sep 2023
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
R. S. Srinivasa
Jaejin Cho
Chouchang Yang
Yashas Malur Saidutta
Ching Hua Lee
Yilin Shen
Hongxia Jin
VLM
29
8
0
26 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
28
66
0
25 Sep 2023
Massive End-to-end Models for Short Search Queries
Weiran Wang
Rohit Prabhavalkar
Dongseong Hwang
Qiujia Li
K. Sim
...
Zhong Meng
CJ Zheng
Yanzhang He
Tara N. Sainath
P. M. Mengibar
22
2
0
22 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
24
12
0
19 Sep 2023
Using fine-tuning and min lookahead beam search to improve Whisper
Andrea Do
Oscar Brown
Zhengjie Wang
Nikhil Mathew
Zixin Liu
Jawwad Ahmed
Cheng Yu
27
1
0
19 Sep 2023
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
Ryandhimas E. Zezario
Fei Chen
C. Fuh
H. Wang
Yu Tsao
37
1
0
18 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
16
69
0
06 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic Pretraining
Chunyu Qiang
Hao Li
Yixin Tian
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
21
5
0
01 Sep 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
Samuel Horváth
Stefanos Laskaridis
Shashank Rajput
Hongyi Wang
BDL
32
4
0
28 Aug 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
37
19
0
28 Aug 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
57
4
0
28 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
Cascaded Cross-Modal Transformer for Request and Complaint Detection
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
20
3
0
27 Jul 2023
Turning Whisper into Real-Time Transcription System
Dominik Machávcek
Raj Dabre
Ondrej Bojar
16
22
0
27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Pietro Mascagni
N. Padoy
Nicolas Padoy
24
20
0
27 Jul 2023
Adaptation of Whisper models to child speech recognition
Rishabh Jain
Andrei Barcovschi
Mariam Yiwere
Peter Corcoran
H. Cucu
11
30
0
24 Jul 2023
MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems
Thilo von Neumann
Christoph Boeddeker
Marc Delcroix
Reinhold Haeb-Umbach
11
15
0
21 Jul 2023
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
Theresa Pekarek-Rosin
S. Wermter
VLM
CLL
19
2
0
14 Jul 2023
SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
Titouan Parcollet
Rogier van Dalen
Shucong Zhang
S. Bhattacharya
16
6
0
12 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
42
5
0
11 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
S. Mohamadi
G. Mujtaba
Ngan Le
Gianfranco Doretto
Don Adjeroh
LM&MA
AI4MH
23
21
0
09 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
34
7
0
06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
25
2
0
04 Jul 2023
Boosting Norwegian Automatic Speech Recognition
Javier de la Rosa
Rolv-Arild Braaten
P. Kummervold
Freddy Wetjen
Svein Arne Brygfjeld
20
7
0
04 Jul 2023
Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos
Ashwin Rao
11
6
0
04 Jul 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
136
112
0
20 Jun 2023
Towards End-to-end Speech-to-text Summarization
Raul Monteiro
Diogo Pernes
9
1
0
06 Jun 2023
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
Lucía Gómez Zaragozá
Simone Wills
Cristian Tejedor-García
Javier Marín-Morales
Mariano Alcañiz
H. Strik
22
8
0
06 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Esteve
26
3
0
01 Jun 2023
Voice Conversion With Just Nearest Neighbors
Matthew Baas
Benjamin van Niekerk
Herman Kamper
SSL
32
48
0
30 May 2023
Previous
1
2
3
...
10
7
8
9
Next