Robust Speech Recognition via Large-Scale Weak Supervision


6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
    OffRL
arXiv: 2212.04356

Papers citing "Robust Speech Recognition via Large-Scale Weak Supervision"

50 / 454 papers shown
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
58
11
0
26 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM
MLLM
VLM
72
21
0
26 Sep 2024
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas Ueda
Leonardo B. de M. M. Marques
Flávio O. Simões
Mário Uliani Neto
Fernando Runstein
Bianca Dal Bó
Paula D. P. Costa
21
0
0
25 Sep 2024
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition
Andrés Piñeiro-Martín
C. García-Mateo
Laura Docío-Fernández
María del Carmen López-Pérez
Georg Rehm
32
3
0
25 Sep 2024
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features
Katharina Anderer
Andreas Reich
Matthias Wölfel
13
0
0
25 Sep 2024
The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles
Hanwen Zhang
Dusit Niyato
Wei Zhang
Changyuan Zhao
Hongyang Du
Abbas Jamalipour
Sumei Sun
Yiyang Pei
AI4CE
42
2
0
24 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
44
11
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
46
3
0
23 Sep 2024
SongTrans: An unified song transcription and alignment method for lyrics and notes
Siwei Wu
Jinzheng He
Ruibin Yuan
Haojie Wei
Xipin Wei
Chenghua Lin
Jin Xu
Junyang Lin
45
1
0
22 Sep 2024
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Mirco Ravanelli
AuLLM
44
0
0
22 Sep 2024
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
Khai Le-Duc
Phuc Phan
Tan-Hanh Pham
Bach Phan Tat
Minh-Huong Ngo
Chris Ngo
Thanh Nguyen-Tang
Truong Son-Hy
LM&MA
43
0
0
21 Sep 2024
On the Feasibility of Fully AI-automated Vishing Attacks
João Figueiredo
Afonso Carvalho
Daniel Castro
Daniel Gonçalves
Nuno Santos
27
2
0
20 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
34
2
0
19 Sep 2024
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
Ahmet Gündüz
Yunsu Kim
Kamer Ali Yuksel
Mohamed Al-Badrashiny
Thiago Castro Ferreira
Hassan Sawaf
33
0
0
19 Sep 2024
ASR Benchmarking: Need for a More Representative Conversational Dataset
Gaurav Maheshwari
Dmitry Ivanov
Théo Johannet
Kevin El Haddad
18
0
0
18 Sep 2024
LLMs in Education: Novel Perspectives, Challenges, and Opportunities
Bashar Alhafni
Sowmya Vajjala
Stefano Banno
Kaushal Kumar Maurya
Ekaterina Kochmar
AI4Ed
35
1
0
18 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
65
1
0
18 Sep 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
31
9
0
18 Sep 2024
WER We Stand: Benchmarking Urdu ASR Models
Samee Arif
Aamina Jamal Khan
Mustafa Abbas
Agha Ali Raza
Awais Athar
24
3
0
17 Sep 2024
Chain-of-Thought Prompting for Speech Translation
Ke Hu
Zhehuai Chen
Chao-Han Huck Yang
Piotr Żelasko
Oleksii Hrinchuk
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
LRM
37
2
0
17 Sep 2024
High-Resolution Speech Restoration with Latent Diffusion Model
Tushar Dhyani
Florian Lux
Michele Mancusi
Giorgio Fabbro
Fritz Hohl
Ngoc Thang Vu
DiffM
35
0
0
17 Sep 2024
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
Yi-Jen Shih
Zoi Gkalitsiou
A. Dimakis
David Harwath
39
1
0
16 Sep 2024
Voice control interface for surgical robot assistants
Ana Davila
Jacinto Colan
Yasuhisa Hasegawa
13
1
0
16 Sep 2024
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
Alexander Wuttke
Matthias Aßenmacher
Christopher Klamm
Max M. Lang
Quirin Würschinger
Frauke Kreuter
34
2
0
16 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Xuanru Zhou
Cheol Jun Cho
Ayati Sharma
Brittany Morin
D. Baquirin
...
Zachary Miller
B. Tee
M. G. Tempini
Jiachen Lian
Gopala Anumanchipalli
34
3
0
15 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark J. F. Gales
Kate Knill
KELM
46
1
0
14 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
34
1
0
14 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
34
1
0
13 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
68
1
0
13 Sep 2024
WhisperNER: Unified Open Named Entity and Speech Recognition
Gil Ayache
Menachem Pirchi
Aviv Navon
Aviv Shamsian
Gill Hetz
Joseph Keshet
30
0
0
12 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELM
AuLLM
49
5
0
11 Sep 2024
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Wen-Chin Huang
Szu-Wei Fu
Erica Cooper
Ryandhimas E. Zezario
T. Toda
Hsin-Min Wang
Junichi Yamagishi
Yu Tsao
32
5
0
11 Sep 2024
Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking
Jihyun Lee
Solee Im
Wonjun Lee
Gary Geunbae Lee
31
0
0
10 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
W. Zhang
Shuo Sun
Bin Wang
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
A. Aw
AuLLM
67
1
0
10 Sep 2024
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
Bram Willemsen
Gabriel Skantze
23
0
0
09 Sep 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
32
2
0
09 Sep 2024
Lightweight Transducer Based on Frame-Level Criterion
Genshun Wan
Mengzhi Wang
Tingzhi Mao
Hang Chen
Z. Ye
36
1
0
05 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
42
20
0
05 Sep 2024
What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations
Kavya Manohar
Leena G Pillai
29
3
0
04 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
34
4
0
03 Sep 2024
Advancing Multi-talker ASR Performance with Large Language Models
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
29
0
0
30 Aug 2024
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
38
7
0
26 Aug 2024
Sample-Independent Federated Learning Backdoor Attack in Speaker Recognition
Weida Xu
Yang Xu
Sicong Zhang
FedML
AAML
36
0
0
25 Aug 2024
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu
Minquan Wang
Ye Ma
Bo Wang
Aozhu Chen
Quan Chen
Peng Jiang
Xirong Li
40
1
0
23 Aug 2024
MuRAR: A Simple and Effective Multimodal Retrieval and Answer Refinement Framework for Multimodal Question Answering
Zhengyuan Zhu
Daniel Lee
Hong Zhang
Sai Sree Harsha
Loic Feujio
Akash Maharaj
Yunyao Li
17
2
0
16 Aug 2024
Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa
Takashi Masuko
Toru Taniguchi
43
1
0
15 Aug 2024
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin
Ying Zeng
Sijie Mai
Haifeng Hu
VGen
42
0
0
14 Aug 2024
An Investigation Into Explainable Audio Hate Speech Detection
Jinmyeong An
Wonjun Lee
Yejin Jeon
Jungseul Ok
Yunsu Kim
Gary Geunbae Lee
23
2
0
12 Aug 2024
Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate
Yiqun Zhang
Xiaocui Yang
Shi Feng
Daling Wang
Yifei Zhang
Kaisong Song
LLMAG
32
4
0
08 Aug 2024
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
34
3
0
07 Aug 2024