Robust Speech Recognition via Large-Scale Weak Supervision


6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL
arXiv: 2212.04356

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 459 papers shown
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
34
2
0
14 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
40
8
0
14 Mar 2024
A New Benchmark for Evaluating Automatic Speech Recognition in the Arabic Call Domain
Qusai Abo Obaidah
Muhy Eddin Za'ter
Adnan Jaljuli
Ali Mahboub
Asma Hakouz
Bashar Alfrou
Yazan Estaitia
21
1
0
07 Mar 2024
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo
T. Pires
Malik Boudiaf
Dominic Culver
Rui Melo
...
Andre F. T. Martins
Fabrizio Esposito
Vera Lúcia Raposo
Sofia Morgado
Michael Desa
ELM
AILaw
39
63
0
06 Mar 2024
Neural Additive Image Model: Interpretation through Interpolation
Arik Reuter
Anton Thielmann
Benjamin Saefken
DiffM
31
1
0
06 Mar 2024
Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors
Kalibinuer Tiliwalidi
AAML
46
0
0
06 Mar 2024
RADIA -- Radio Advertisement Detection with Intelligent Analytics
Jorge Álvarez
J. C. Armenteros
Camilo Torrón
Miguel Ortega-Martín
Alfonso Ardoiz
...
Íñigo Galdeano
Ignacio Garrido
Adrián Alonso
Fernando Bayón
Oleg Vorontsov
26
0
0
06 Mar 2024
Non-verbal information in spontaneous speech -- towards a new framework of analysis
Tirza Biron
Moshe Barboy
Eran Ben-Artzy
Alona Golubchik
Yanir Marmor
Smadar Szekely
Yaron Winter
David Harel
29
0
0
06 Mar 2024
Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Yue Li
Koen V. Hindriks
Florian A. Kunneman
27
2
0
05 Mar 2024
NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
Janaína Mendes-Laureano
Jorge A. Gómez-García
Alejandro Guerrero-López
Elisa Luque-Buzo
Julián D. Arias-Londoño
Francisco J. Grandas-Pérez
Juan Ignacio Godino-Llorente
11
4
0
04 Mar 2024
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Xi Liu
Ying Guo
Cheng Zhen
Tong Li
Yingying Ao
Pengfei Yan
DiffM
34
3
0
01 Mar 2024
Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
Jeehyun Lee
Yerin Choi
Tae-Jin Song
M. Koo
14
4
0
29 Feb 2024
High-Fidelity Neural Phonetic Posteriorgrams
Cameron Churchwell
Max Morrison
Bryan Pardo
32
4
0
27 Feb 2024
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
26
0
0
25 Feb 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
34
22
0
20 Feb 2024
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model
Xiangyu Zhang
Daijiao Liu
Hexin Liu
Qiquan Zhang
Hanyu Meng
Leibny Paola García
Chng Eng Siong
Lina Yao
DiffM
15
2
0
16 Feb 2024
GET-Tok: A GenAI-Enriched Multimodal TikTok Dataset Documenting the 2022 Attempted Coup in Peru
Gabriela Pinto
Keith Burghardt
Kristina Lerman
Emilio Ferrara
6
3
0
08 Feb 2024
Institutional Platform for Secure Self-Service Large Language Model Exploration
V. Bumgardner
Mitchell A. Klusty
W. V. Logan
Samuel E. Armstrong
Caylin D. Hickey
Jeff Talbert
48
1
0
01 Feb 2024
Comuniqa : Exploring Large Language Models for improving speaking skills
Manas Mhasakar
Shikhar Sharma
Apurv Mehra
Utkarsh Venaik
Ujjwal Singhal
Dhruv Kumar
Kashish Mittal
22
4
0
28 Jan 2024
Speech foundation models on intelligibility prediction for hearing-impaired listeners
Santiago Cuervo
R. Marxer
30
6
0
24 Jan 2024
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Ruizhe Li
Chao Zhang
Pin-Yu Chen
Ensiong Chng
25
20
0
19 Jan 2024
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
Yichao Du
Zhirui Zhang
Linan Yue
Xu Huang
Yuqing Zhang
Tong Bill Xu
Linli Xu
Enhong Chen
FedML
54
5
0
18 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
60
35
0
16 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Licai Sun
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
15
12
0
07 Jan 2024
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition
David M. Chan
Shalini Ghosh
Hitesh Tulsiani
Ariya Rastrow
Björn Hoffmeister
28
1
0
04 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
25
85
0
23 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
26
5
0
21 Dec 2023
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
27
1
0
18 Dec 2023
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih
Ho-Lam Chung
Yu-Chi Pai
Ming-Hao Hsu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
ELM
AuLLM
28
2
0
15 Dec 2023
Attention-Guided Adaptation for Code-Switching Speech Recognition
Bobbi Aditya
Mahdin Rohmatillah
Liang-Hsuan Tai
Jen-Tzung Chien
21
8
0
14 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
26
9
0
13 Dec 2023
Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency
Pavlos Constas
Vikram Rawal
Matthew Honorio Oliveira
Andreas Constas
Aditya Khan
...
Heraa Murqi
Asad Khan
Nimit Amikumar Bhanshali
Youssef Rachad
Michael Guerzhoy
OffRL
13
0
0
12 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
41
174
0
11 Dec 2023
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
15
3
0
06 Dec 2023
Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation
Linzi Xing
Quan Tran
Fabian Caba
Franck Dernoncourt
Seunghyun Yoon
Zhaowen Wang
Trung Bui
Giuseppe Carenini
41
1
0
30 Nov 2023
Decentralized Deepfake Detection Blockchain Network using Dynamic Algorithm management
Dipankar Sarkar
21
1
0
30 Nov 2023
FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition
Dongning Yang
Wei Wang
Yanmin Qian
13
3
0
29 Nov 2023
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi
Jiahao Pan
Peng Li
Ruibin Yuan
Xiaowei Chi
...
Wenhan Luo
Wei Xue
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
SLR
26
11
0
29 Nov 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
21
28
0
29 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
44
1
0
29 Nov 2023
The Claire French Dialogue Dataset
Julie Hunter
Jérôme Louradour
Virgile Rennard
Ismail Harrando
Guokan Shang
Jean-Pierre Lorré
23
1
0
28 Nov 2023
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
30
4
0
23 Nov 2023
A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs
J. Zhong
Ming Li
Yinliang Chen
Zihang Wei
Fan Yang
Haoran Shen
27
14
0
21 Nov 2023
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
Vikentii Pankov
Valeria Pronina
Alexander Kuzmin
Maksim Borisov
Nikita Usoltsev
Xingshan Zeng
Alexander Golubkov
Nikolai Ermolenko
Aleksandra Shirshova
Yulia Matveeva
21
2
0
16 Nov 2023
Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability
Wei-Rui Chen
Ife Adebara
Khai Duy Doan
Qisheng Liao
Muhammad Abdul-Mageed
17
5
0
16 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang
James R. Glass
33
3
0
15 Nov 2023
Fast Certification of Vision-Language Models Using Incremental Randomized Smoothing
Ashutosh Nirala
Ameya Joshi
Chinmay Hegde
S Sarkar
VLM
33
0
0
15 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
28
267
0
14 Nov 2023
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition
Xiaohan Shi
Jiajun He
Xingfeng Li
T. Toda
26
3
0
13 Nov 2023
Towards End-to-End Spoken Grammatical Error Correction
Stefano Bannò
Rao Ma
Mengjie Qian
Kate Knill
Mark J. F. Gales
19
2
0
09 Nov 2023