2212.04356
Cited By
Robust Speech Recognition via Large-Scale Weak Supervision
6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
Papers citing
"Robust Speech Recognition via Large-Scale Weak Supervision"
50 / 454 papers shown
Title
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fa-Ting Hong
Zunnan Xu
Zixiang Zhou
Jun Zhou
Xiu Li
Qin Lin
Qinglin Lu
D. Xu
DiffM
VGen
57
2
0
03 Apr 2025
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training
Yijie Zheng
Bangjun Xiao
Lei Shi
Xiaoyang Li
Faming Wu
Tianyu Li
Xuefeng Xiao
Y. Zhang
Y. Wang
Shouda Liu
MLLM
MoE
67
1
0
31 Mar 2025
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
Minghan Wang
Ye Bai
Y. Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
52
0
0
31 Mar 2025
Speculative End-Turn Detector for Efficient Speech Chatbot Assistant
Hyunjong Ok
Suho Yoo
Jaeho Lee
34
0
0
30 Mar 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
J. Zhang
Lu Lu
Yu Tsao
Junichi Yamagishi
Y. Wang
Chao Zhang
AuLLM
76
0
0
26 Mar 2025
Whispering in Amharic: Fine-tuning Whisper for Low-resource Language
Dawit Ketema Gete
Bedru Yimam Ahamed
Tadesse Destaw Belay
Yohannes Ayana Ejigu
Sukairaj Hafiz Imam
...
Umma Aliyu Musa
Martin Semmann
Shamsuddeen Hassan Muhammad
Henning Schreiber
Seid Muhie Yimam
43
0
0
24 Mar 2025
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Shriyank Somvanshi
Md Monzurul Islam
Mahmuda Sultana Mimi
Sazzad Bin Bashar Polock
Gaurab Chhetri
Subasish Das
Mamba
AI4TS
45
0
0
22 Mar 2025
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
H. Wang
Kai Hu
Liangcai Gao
144
0
0
20 Mar 2025
PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation
Baiqin Wang
Xiangyu Zhu
Fan Shen
Hao-Xuan Xu
Zhen Lei
55
0
0
18 Mar 2025
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou
S. Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
55
1
0
13 Mar 2025
NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model
Yuzhi Lai
Shenghai Yuan
Youssef Nassar
Mingyu Fan
T. Weber
Matthias Rätsch
LM&Ro
64
3
0
12 Mar 2025
ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis
Xukun Zhou
Fengxin Li
Ming Chen
Yan Zhou
Pengfei Wan
Di Zhang
Yeying Jin
Zhaoxin Fan
Hongyan Liu
Jun He
DiffM
VGen
46
0
0
09 Mar 2025
Training and Inference Efficiency of Encoder-Decoder Speech Models
Piotr Żelasko
Kunal Dhawan
Daniel Galvez
Krishna C. Puvvada
Ankita Pasad
Nithin Rao Koluguri
Ke Hu
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
41
0
0
07 Mar 2025
Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024
Nuria Alina Chandra
Ryan Murtfeldt
Lin Qiu
Arnab Karmakar
Hannah Lee
...
Sejin Paik
Changyeon Lee
Jongwook Choi
Aerin Kim
O. Etzioni
59
3
0
04 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
23
0
03 Mar 2025
Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems
Ajinkya Kulkarni
Atharva Kulkarni
Miguel Couceiro
Isabel Trancoso
50
0
0
02 Mar 2025
Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models
Ruta Binkyte
Ivaxi Sheth
Zhijing Jin
Mohammad Havaei
Bernhard Schölkopf
Mario Fritz
122
0
0
28 Feb 2025
DIN-CTS: Low-Complexity Depthwise-Inception Neural Network with Contrastive Training Strategy for Deepfake Speech Detection
L. D. Pham
Dat Tran
Florian Skopik
Alexander Schindler
Silvia Poletti
Fischinger David
Martin Boyer
46
1
0
27 Feb 2025
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition
Jiaming Zhou
Yujie Guo
S. Zhao
Haoqin Sun
Hui Wang
...
Shiyao Wang
Xi Yang
Y. Wang
Yonghua Lin
Yong Qin
46
0
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
76
0
0
26 Feb 2025
Steganography Beyond Space-Time with Chain of Multimodal AI
Ching-Chun Chang
Isao Echizen
71
0
0
25 Feb 2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang
Sukru Samet Dindar
Vishal B. Choudhari
Stephan Bickel
A. Mehta
Guy M McKhann
A. Flinker
D. Friedman
N. Mesgarani
32
2
0
24 Feb 2025
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
60
0
0
24 Feb 2025
VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
Wei Zhao
Pengxiang Ding
M. Zhang
Zhefei Gong
Shuanghao Bai
H. Zhao
Donglin Wang
85
6
0
24 Feb 2025
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoMe
VLM
143
0
0
24 Feb 2025
ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing
Suhyeon Yoo
Khai-Nghi Truong
Young-Ho Kim
53
0
0
24 Feb 2025
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
Hyunji Lee
Danni Liu
Supriti Sinhamahapatra
Jan Niehues
106
0
0
21 Feb 2025
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Pöppel
Johannes Brandstetter
G. Klambauer
Razvan Pascanu
Sepp Hochreiter
75
4
0
21 Feb 2025
A Dual-Stage Time-Context Network for Speech-Based Alzheimer's Disease Detection
Yifan Gao
Long Guo
Hong Liu
93
0
0
18 Feb 2025
SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion
Junxian Ma
Shiwen Wang
Jian Yang
Junyi Hu
Jian Liang
Guosheng Lin
Jingbo Chen
Kai Li
Yu Meng
DiffM
VGen
61
3
0
17 Feb 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Y. Li
W. Li
Z. Ma
Chao Zhang
LRM
MLLM
VLM
64
3
0
17 Feb 2025
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
Xiangyu Lu
Wang Xu
Haoyu Wang
Hongyun Zhou
Haiyan Zhao
Conghui Zhu
T. Zhao
M. Yang
Mamba
AuLLM
66
0
0
16 Feb 2025
Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues
David Sasu
Zehui Wu
Ziwei Gong
Run Chen
Pengyuan Shi
Lin Ai
Julia Hirschberg
Natalie Schluter
58
1
0
16 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
59
2
0
07 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
160
0
0
05 Feb 2025
Privacy-Preserving Edge Speech Understanding with Tiny Foundation Models
A. Benazir
Felix Xiaozhu Lin
41
0
0
29 Jan 2025
Experimenting with Affective Computing Models in Video Interviews with Spanish-speaking Older Adults
Josep Lopez Camunas
Cristina Bustos
Yanjun Zhu
Raquel Ros
Àgata Lapedriza
48
0
0
28 Jan 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
106
5
0
28 Jan 2025
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images
Sami Baral
L. Lucy
Ryan Knight
Alice Ng
Luca Soldaini
Neil T. Heffernan
Kyle Lo
44
3
0
28 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
72
10
0
28 Jan 2025
People are poorly equipped to detect AI-powered voice clones
Sarah Barrington
Emily A. Cooper
Hany Farid
56
6
0
28 Jan 2025
Speech Translation Refinement using Large Language Models
Huaixia Dou
Xinyu Tian
Xinglin Lyu
Jie Zhu
Junhui Li
Lifan Guo
137
0
0
28 Jan 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Y. Wang
Kai Chen
Pengyuan Zhang
Z. Wu
AuLLM
56
4
0
28 Jan 2025
Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer
Hu Hu
Sabato Marco Siniscalchi
Chao-Han Huck Yang
Chin-Hui Lee
65
0
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
69
4
0
24 Jan 2025
Collective Memory and Narrative Cohesion: A Computational Study of Palestinian Refugee Oral Histories in Lebanon
Ghadeer Awwad
Lavinia Dunagan
David Gamba
Tamara N. Rayan
36
0
0
23 Jan 2025
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
Jae-Sung Bae
Anastasia Kuznetsova
Dinesh Manocha
John Hershey
Trausti Kristjansson
Minje Kim
72
0
0
23 Jan 2025
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
Moreno La Quatra
Valerio Mario Salerno
Yu Tsao
Sabato Marco Siniscalchi
87
0
0
22 Jan 2025
A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data
Minh Tran
Yutong Pang
Debjyoti Paul
Laxmi Pandey
Kevin Jiang
Jinxi Guo
Ke Li
Shun Zhang
X. Zhang
Xin Lei
AI4CE
36
0
0
21 Jan 2025