Robust Speech Recognition via Large-Scale Weak Supervision

6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 459 papers shown
Title
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
Martin Courtois
Malte Ostendorff
Leonhard Hennig
Georg Rehm
31
2
0
10 Jun 2024
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Chung-Ming Chien
Andros Tjandra
Apoorv Vyas
Matt Le
Bowen Shi
Wei-Ning Hsu
32
0
0
10 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
36
2
0
09 Jun 2024
Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment
Huma Ameer
Seemab Latif
Iram Tariq Bhatti
38
1
0
09 Jun 2024
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
Shivam Mehta
Harm Lameris
Rajiv Punmiya
Jonas Beskow
Éva Székely
G. Henter
23
1
0
08 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
29
9
0
07 Jun 2024
LLM-based speaker diarization correction: A generalizable approach
Georgios Efstathiadis
Vijay Yadav
Anzar Abbas
43
3
0
07 Jun 2024
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation
D. Doukhan
Christine Maertens
William Le Personnic
Ludovic Speroni
Reda Dehak
30
2
0
06 Jun 2024
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
Wangyou Zhang
Kohei Saijo
Jee-weon Jung
Chenda Li
Shinji Watanabe
Yanmin Qian
30
4
0
06 Jun 2024
Hypernetworks for Personalizing ASR to Atypical Speech
Max Müller-Eberstein
Dianna Yee
Karren D. Yang
G. Mantena
Colin S. Lea
33
0
0
06 Jun 2024
BIPED: Pedagogically Informed Tutoring System for ESL Education
Soonwoo Kwon
Sojung Kim
Minju Park
Seunghyun Lee
Kyuseok Kim
29
3
0
05 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
27
1
0
04 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
44
2
0
04 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
51
0
0
03 Jun 2024
Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction
Changyu Du
Stavros Nousias
André Borrmann
LLMAG
26
2
0
02 Jun 2024
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
Mingjie Chen
Hezhao Zhang
Yuanchao Li
Jiachen Luo
Wen Wu
...
Lin Wang
P. Woodland
Xie Chen
Huy P Phan
Thomas Hain
23
0
0
30 May 2024
Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data
Eloi Moliner
Sebastian Braun
H. Gamper
OT
44
2
0
29 May 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi
Hengyuan Zhang
Yatian Wang
J. Pan
Chen Liu
...
Qixun Zhang
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Qi-fei Liu
DiffM
SLR
110
5
0
27 May 2024
Crossmodal ASR Error Correction with Discrete Speech Units
Yuanchao Li
Pinzhen Chen
Peter Bell
Catherine Lai
34
6
0
26 May 2024
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Zijin Gu
Tatiana Likhomanenko
Richard He Bai
Erik McDermott
R. Collobert
Navdeep Jaitly
AuLLM
43
2
0
24 May 2024
A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation
Iveta Becková
Stefan Pócos
Giulia Belgiovine
Marco Matarese
A. Sciutti
Carlo Mazzola
42
0
0
20 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
39
28
0
18 May 2024
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
D. Bohus
Sean Andrist
Nick Saw
Ann Paradiso
Ishani Chakraborty
Mahdi Rad
38
9
0
16 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
E. Chng
Ruizhe Li
AuLLM
KELM
41
5
0
16 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
38
39
0
14 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
41
37
0
14 May 2024
AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly
Roshan Nayak
Rakshith Rao
Ujan Deb
AP Prathosh
24
1
0
11 May 2024
An Investigation of Incorporating Mamba for Speech Enhancement
Rong-Yu Chao
Wen-Huang Cheng
Moreno La Quatra
Sabato Marco Siniscalchi
Chao-Han Huck Yang
Szu-Wei Fu
Yu Tsao
Mamba
47
25
0
10 May 2024
LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing
Ana Ezquerro
David Vilares
34
1
0
10 May 2024
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
Dena F. Mujtaba
N. Mahapatra
Megan Arney
J Scott Yaruss
Hope Gerlach-Houck
Caryn Herring
Jia Bin
32
0
0
10 May 2024
Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models
Vyas Raina
Rao Ma
Charles G McGhee
Kate Knill
Mark J. F. Gales
AAML
29
4
0
09 May 2024
Mixat: A Data Set of Bilingual Emirati-English Speech
Maryam Al Ali
Hanan Aldarmaki
34
0
0
04 May 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
41
4
0
30 Apr 2024
Automatic Speech Recognition System-Independent Word Error Rate Estimation
Chanho Park
Mingjie Chen
Thomas Hain
21
0
0
25 Apr 2024
Crossing the principle-practice gap in AI ethics with ethical problem-solving
N. Corrêa
James William Santos
Camila Galvão
Marcelo Pasetti
Dieine Schiavon
Faizah Naqvi
Robayet Hossain
N. D. Oliveira
34
3
0
16 Apr 2024
Anatomy of Industrial Scale Multilingual ASR
Francis McCann Ramirez
Luka Chkhetiani
Andrew Ehrenberg
R. McHardy
Rami Botros
...
Ahmed Efty
Daniel McCrystal
Sam Flamini
Domenic Donato
Takuya Yoshioka
29
7
0
15 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
35
7
0
13 Apr 2024
Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task
Hassan Ali
Philipp Allgeuer
Stefan Wermter
44
1
0
12 Apr 2024
Behavior Trees Enable Structured Programming of Language Model Agents
Richard Kelley
AI4CE
LM&Ro
LLMAG
37
0
0
11 Apr 2024
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
Tien-Hong Lo
Fu-An Chao
Tzu-I Wu
Yao-Ting Sung
Berlin Chen
23
3
0
11 Apr 2024
Linguistic Changes in Spontaneous Speech for Detecting Parkinsons Disease Using Large Language Models
Jonathan Crawford
36
0
0
08 Apr 2024
Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning
Noah Golowich
Ankur Moitra
Dhruv Rohatgi
OffRL
32
4
0
04 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
36
21
0
03 Apr 2024
Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training
Donggang Jia
Yunhai Wang
Ivan Viola
29
1
0
01 Apr 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
21
43
0
31 Mar 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
40
2
0
28 Mar 2024
PhoWhisper: Automatic Speech Recognition for Vietnamese
Thanh-Thien Le
L. T. Nguyen
Dat Quoc Nguyen
24
3
0
27 Mar 2024
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention
Ethan N. Evans
Matthew G. Cook
Zachary P. Bradshaw
Margarite L. LaBorde
40
5
0
21 Mar 2024
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
Zunnan Xu
Yukang Lin
Haonan Han
Sicheng Yang
Ronghui Li
Yachao Zhang
Xiu Li
Mamba
46
25
0
14 Mar 2024