arXiv: 2212.04356
Robust Speech Recognition via Large-Scale Weak Supervision


6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 459 papers shown
Title
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
34
3
0
07 Aug 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Y. Wang
Xie Chen
AuLLM
34
23
0
05 Aug 2024
TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation
Xingpeng Sun
Yiran Zhang
Xindi Tang
Amrit Singh Bedi
Aniket Bera
44
4
0
03 Aug 2024
Neural Network Emulator for Atmospheric Chemical ODE
Zhi-Song Liu
Petri S. Clusius
Michael Boy
38
3
0
03 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
43
5
0
31 Jul 2024
Accelerating Large Language Model Inference with Self-Supervised Early Exits
Florian Valade
LRM
36
1
0
30 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
31
9
0
23 Jul 2024
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
Wataru Nakata
Kentaro Seki
Hitomi Yanaka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
AuLLM
43
0
0
22 Jul 2024
Empirical Capacity Model for Self-Attention Neural Networks
Aki Härmä
M. Pietrasik
Anna Wilbik
34
1
0
22 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
32
4
0
21 Jul 2024
Audio-visual training for improved grounding in video-text LLMs
Shivprasad Sagare
Hemachandran S
Kinshuk Sarabhai
Prashant Ullegaddi
SA Rajeshkumar
27
0
0
21 Jul 2024
Morphosyntactic Analysis for CHILDES
Houjun Liu
Brian MacWhinney
20
1
0
17 Jul 2024
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
J. Hauret
Malo Olivier
Thomas Joubaud
C. Langrenne
Sarah Poirée
V. Zimpfer
Éric Bavu
75
1
0
16 Jul 2024
Walk along: An Experiment on Controlling the Mobile Robot 'Spot' with Voice and Gestures
Renchi Zhang
Jesse van der Linden
Dimitra Dodou
H. Seyffert
Y. B. Eisma
J. D. Winter
32
0
0
15 Jul 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
86
2
0
09 Jul 2024
Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities
Avinash Anand
Chayan Tank
Sarthak Pol
Vinayak Katoch
Shaina Mehta
R. Shah
32
4
0
08 Jul 2024
MINDECHO: Role-Playing Language Agents for Key Opinion Leaders
Rui Xu
Dakuan Lu
Xiaoyu Tan
Xintao Wang
Siyu Yuan
Jiangjie Chen
Wei Chu
Xu Yinghui
LLMAG
34
3
0
07 Jul 2024
Prosody-Driven Privacy-Preserving Dementia Detection
Dominika Woszczyk
Ranya Aloufi
Soteris Demetriou
34
2
0
03 Jul 2024
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
Jianzhu Guo
Dingyun Zhang
Xiaoqiang Liu
Zhizhou Zhong
Yuan Zhang
Pengfei Wan
Di Zhang
VGen
61
53
0
03 Jul 2024
Multi-View Black-Box Physical Attacks on Infrared Pedestrian Detectors Using Adversarial Infrared Grid
Kalibinuer Tiliwalidi
Chengyin Hu
Weiwen Shi
AAML
20
1
0
01 Jul 2024
Cross-Lingual Transfer Learning for Speech Translation
Rao Ma
Yassir Fathullah
Mengjie Qian
Siyuan Tang
Mark J. F. Gales
Kate Knill
20
1
0
01 Jul 2024
Clustering in pure-attention hardmax transformers and its role in sentiment analysis
Albert Alcalde
Giovanni Fantuzzi
Enrique Zuazua
27
3
0
26 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
29
2
0
26 Jun 2024
FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data
Dancheng Liu
Jinjun Xiong
21
0
0
25 Jun 2024
Generative AI Systems: A Systems-based Perspective on Generative AI
Jakub M. Tomczak
48
1
0
25 Jun 2024
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation
Yingting Li
Ambuj Mehrish
Bryan Chew
Bo Cheng
Soujanya Poria
38
0
0
25 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
38
0
0
24 Jun 2024
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
Lu Zhang
Tiancheng Zhao
Heting Ying
Yibo Ma
Kyusong Lee
LLMAG
36
9
0
24 Jun 2024
Perception of Phonological Assimilation by Neural Speech Recognition Models
Charlotte Pouw
Marianne de Heer Kloots
A. Alishahi
Willem H. Zuidema
42
2
0
21 Jun 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
39
3
0
21 Jun 2024
On Newton's Method to Unlearn Neural Networks
Nhung Bui
Xinyang Lu
Rachael Hwee Ling Sim
See-Kiong Ng
Bryan Kian Hsiang Low
MU
39
2
0
20 Jun 2024
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
Yunxin Li
Xinyu Chen
Baotian Hu
Longyue Wang
Haoyuan Shi
Min-Ling Zhang
MLLM
LRM
44
25
0
17 Jun 2024
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang
Zhichao Wang
Xiaoying Tang
37
1
0
17 Jun 2024
Large Language Models for Dysfluency Detection in Stuttered Speech
Dominik Wagner
Sebastian P. Bayerl
Ilja Baumann
K. Riedhammer
Elmar Nöth
Tobias Bocklet
45
3
0
16 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
K. Riedhammer
Tobias Bocklet
MQ
30
1
0
16 Jun 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
P. Jyothi
Pushpak Bhattacharyya
40
1
0
16 Jun 2024
Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies
Chung-Wen Wu
Berlin Chen
40
0
0
16 Jun 2024
Large Language Models for Automatic Milestone Detection in Group Discussions
Zhuoxu Duan
Zhengye Yang
Samuel Westby
Christoph Riedl
B. F. Welles
Richard J. Radke
28
0
0
16 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
34
9
0
15 Jun 2024
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
Joseph Liu
Mahesh Kumar Nandwana
Janne Pylkkönen
Hannes Heikinheimo
Morgan McGuire
37
0
0
14 Jun 2024
Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation
Dena F. Mujtaba
N. Mahapatra
Megan Arney
J Scott Yaruss
Caryn Herring
Jia Bin
29
1
0
14 Jun 2024
Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content
Rémi Uro
Marie Tahon
D. Doukhan
Antoine Laurent
Albert Rilliard
33
0
0
14 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
84
9
0
14 Jun 2024
DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
Samuele Cornell
Janek Ebbers
Constance Douwes
Irene Martín-Morató
Manu Harju
A. Mesaros
Romain Serizel
34
13
0
12 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
34
0
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
43
15
0
11 Jun 2024
Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis
David Ortiz-Perez
José García Rodríguez
David Tomás
28
1
0
11 Jun 2024
Multimodal Belief Prediction
John Murzaku
Adil Soubki
Owen Rambow
16
0
0
11 Jun 2024
An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu
Wenyi Yu
Chao Zhang
Philip Woodland
27
3
0
10 Jun 2024
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
Andreas Triantafyllopoulos
A. Batliner
Simon Rampp
M. Milling
Björn Schuller
VLM
23
0
0
10 Jun 2024