ResearchTrend.AI
Robust Speech Recognition via Large-Scale Weak Supervision

6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL
arXiv: 2212.04356

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 454 papers shown
Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Haomiao Yang
Jinming Zhao
Gholamreza Haffari
Ehsan Shareghi
19
6
0
28 May 2023
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Matías P. Pizarro
D. Kolossa
Asja Fischer
AAML
30
1
0
26 May 2023
NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery
Farhad Moghimifar
Shilin Qu
Tongtong Wu
Yuan-Fang Li
Gholamreza Haffari
29
4
0
26 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
L. Gris
R. Marcacini
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
S. Aluísio
13
7
0
23 May 2023
Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test
Eungbeom Kim
Yunkee Chae
Jaeheon Sim
Kyogu Lee
17
1
0
22 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
17
2
0
21 May 2023
Scaling laws for language encoding models in fMRI
Richard Antonello
Aditya R. Vaidya
Alexander G. Huth
MedIm
22
55
0
19 May 2023
Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach
Masahiro Kaneko
Graham Neubig
Naoaki Okazaki
33
6
0
19 May 2023
MD3: The Multi-Dialect Dataset of Dialogues
Jacob Eisenstein
Vinodkumar Prabhakaran
Clara E. Rivera
Dorottya Demszky
D. Sharma
16
7
0
19 May 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
16
7
0
18 May 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Siyuan Huang
Zhengkai Jiang
Hao Dong
Yu Qiao
Peng Gao
Hongsheng Li
LM&Ro
22
92
0
18 May 2023
An Android Robot Head as Embodied Conversational Agent
Marcel Heisler
C. Becker-Asano
LM&Ro
LLMAG
29
0
0
18 May 2023
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
Mutian He
Philip N. Garner
36
4
0
16 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang
Yang Feng
30
13
0
15 May 2023
Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit
Shreya Ghosh
Zhixi Cai
Parul Gupta
Garima Sharma
Abhinav Dhall
Munawar Hayat
Tom Gedeon
16
2
0
09 May 2023
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Dima Rekesh
Nithin Rao Koluguri
Samuel Kriman
Somshubra Majumdar
Vahid Noroozi
...
Oleksii Hrinchuk
Krishna Puvvada
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
37
80
0
08 May 2023
HeySQuAD: A Spoken Question Answering Dataset
Yijing Wu
Sai Krishna Rallabandi
R. Srinivasamurthy
Parag Dakle
Alolika Gon
Preethi Raghavan
24
4
0
26 Apr 2023
Spaiche: Extending State-of-the-Art ASR Models to Swiss German Dialects
Clément Sicard
Kajetan Pyszkowski
Victor Gillioz
19
7
0
20 Apr 2023
Prak: An automatic phonetic alignment tool for Czech
V. Hanzl
Adléta Hanzlová
22
0
0
17 Apr 2023
Computational modeling of semantic change
Nina Tahmasebi
Haim Dubossarsky
28
6
0
13 Apr 2023
AGI for Agriculture
Guoyu Lu
Sheng R. Li
Gengchen Mai
Jin Sun
Dajiang Zhu
...
R. Xu
Daniel Petti
Changying Li
Tianming Liu
Changying Li
AI4CE
45
17
0
12 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Oğuz
Yashar Mehdad
Mohit Bansal
3DV
18
51
0
29 Mar 2023
Hallucinations in Large Multilingual Translation Models
Nuno M. Guerreiro
Duarte M. Alves
Jonas Waldendorf
Barry Haddow
Alexandra Birch
Pierre Colombo
André F.T. Martins
VLM
HILM
LRM
20
140
0
28 Mar 2023
CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities
Konstantinos Kontras
Christos Chatzichristos
Huy P Phan
Johan A. K. Suykens
Marina De Vos
AI4TS
24
11
0
27 Mar 2023
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages
Chris C. Emezue
Sanchit Gandhi
Lewis Tunstall
Abubakar Abid
Josh Meyer
...
Douwe Kiela
Yacine Jernite
Julien Chaumond
Merve Noyan
Omar Sanseviero
25
2
0
22 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
ICASSP 2023 Deep Noise Suppression Challenge
Harishchandra Dubey
A. Aazami
Vishak Gopal
Sergiy Matusevych
Sebastian Braun
...
Sefik Emre Eskimez
Manthan Thakker
H. Gamper
Takuya Yoshioka
R. Aichner
26
82
0
21 Mar 2023
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
Eric Sun
Jinyu Li
Yuxuan Hu
Yilun Zhu
Long Zhou
...
Peidong Wang
Linquan Liu
Shujie Liu
Ed Lin
Yifan Gong
29
6
0
01 Mar 2023
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
31
182
0
20 Feb 2023
Transformadores: Fundamentos teoricos y Aplicaciones
J. D. L. Torre
72
0
0
18 Feb 2023
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization
Spandan Dey
Md. Sahidullah
G. Saha
13
11
0
10 Feb 2023
PSST! Prosodic Speech Segmentation with Transformers
Nathan Roll
C. Graham
Simon Todd
VLM
26
5
0
03 Feb 2023
Efficient Domain Adaptation for Speech Foundation Models
Bo-wen Li
DongSeon Hwang
Zhouyuan Huo
Junwen Bai
Guru Prakash
...
K. Sim
Yu Zhang
Wei Han
Trevor Strohman
F. Beaufays
AI4CE
28
23
0
03 Feb 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
32
17
0
30 Jan 2023
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng
Revant Teotia
Purva Tendulkar
Sachit Menon
Carl Vondrick
VGen
26
18
0
26 Jan 2023
Hopf Physical Reservoir Computer for Reconfigurable Sound Recognition
M. R. E. U. Shougat
Xiaofu Li
Siyao Shao
K. McGarvey
E. Perkins
14
11
0
20 Dec 2022
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
Gallil Maimon
Yossi Adi
21
13
0
19 Dec 2022
The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges
Genta Indra Winata
Alham Fikri Aji
Zheng-Xin Yong
Thamar Solorio
37
33
0
19 Dec 2022
ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format
Qi Zhu
Christian Geishauser
Hsien-Chin Lin
Carel van Niekerk
Baolin Peng
...
Dazhen Wan
Xiaochen Zhu
Jianfeng Gao
Milica Gašić
Minlie Huang
46
23
0
30 Nov 2022
Better Transcription of UK Supreme Court Hearings
Hadeel Saadany
C. Breslin
Constantin Orasan
Sophie Walker
AILaw
11
6
0
29 Nov 2022
BARTSmiles: Generative Masked Language Models for Molecular Representations
Gayane Chilingaryan
Hovhannes Tamoyan
Ani Tevosyan
N. Babayan
L. Khondkaryan
Karen Hambardzumyan
Zaven Navoyan
Hrant Khachatrian
Armen Aghajanyan
SSL
27
25
0
29 Nov 2022
Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection
Jianwei Zhang
J. Liss
Suren Jayasuriya
Visar Berisha
28
6
0
17 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
26
12
0
13 Nov 2022
On minimal variations for unsupervised representation learning
Vivien A. Cabannes
A. Bietti
Randall Balestriero
SSL
DRL
25
8
0
07 Nov 2022
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability
Jian Xue
Peidong Wang
Jinyu Li
Eric Sun
19
10
0
04 Nov 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
19
74
0
26 Oct 2022
A Textless Metric for Speech-to-Speech Comparison
Laurent Besacier
S. Ribeiro
Olivier Galibert
Ioan Calapodescu
33
5
0
21 Oct 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
16
13
0
07 Jun 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
78
282
0
25 May 2022