Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.04356
Cited By
Robust Speech Recognition via Large-Scale Weak Supervision
6 December 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Robust Speech Recognition via Large-Scale Weak Supervision"
50 / 454 papers shown
Title
BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks
Zhuang Li
48
1
0
21 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
41
0
0
20 Jan 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Soumya Dutta
Sriram Ganapathy
33
2
0
20 Jan 2025
Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR
Karl El Hajal
Enno Hermann
Ajinkya Kulkarni
Mathew Magimai.-Doss
31
0
0
20 Jan 2025
Human-like Nonverbal Behavior with MetaHumans in Real-World Interaction Studies: An Architecture Using Generative Methods and Motion Capture
Oliver Chojnowski
Alexander Eberhard
Michael Schiffmann
Ana Müller
Anja Richert
AI4CE
34
0
0
18 Jan 2025
Target Speaker ASR with Whisper
Alexander Polok
Dominik Klement
Matthew Wiesner
Sanjeev Khudanpur
J. Černocký
L. Burget
99
1
0
17 Jan 2025
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
33
5
0
17 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
J. Zhang
Lu Lu
Y. Wang
Haizhou Li
Z. Wu
AuLLM
80
17
0
17 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman Hagai Aronowitz
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
61
0
0
15 Jan 2025
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Rajath Rao
Adithya V Ganesan
O. Kjell
Jonah Luby
Akshay Raghavan
...
B. Luft
Camilo Ruggero
Neville Ryant
R. Kotov
H. A. Schwartz
32
0
0
15 Jan 2025
Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
C. Jacobs
Annelien Smith
Daleen Klop
Ondřej Klejch
Febe de Wet
Herman Kamper
49
0
0
11 Jan 2025
A Survey on Spoken Italian Datasets and Corpora
Marco Giordano
Claudia Rinaldi
41
0
0
11 Jan 2025
Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI
Yuya Asano
Sabit Hassan
P. Sharma
Anthony Sicilia
Katherine Atwell
Diane Litman
Malihe Alikhani
39
0
0
10 Jan 2025
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
70
1
0
10 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
31
0
0
04 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
52
3
0
03 Jan 2025
Fotheidil: an Automatic Transcription System for the Irish Language
Liam Lonergan
Ibon Saratxaga
John Sloan
Oscar Maharog
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
24
0
0
03 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
82
3
0
03 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
49
0
0
31 Dec 2024
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
Ashishkumar Gudmalwar
Ishan D. Biyani
Nirmesh J. Shah
Pankaj Wasnik
R. Shah
DiffM
26
0
0
31 Dec 2024
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
Ruohong Yang
Peng Hu
Xi Peng
Xiting Liu
Yunfan Li
34
0
0
25 Dec 2024
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
85
0
0
16 Dec 2024
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
A. Hengel
Jian Yang
Qingming Huang
90
6
0
12 Dec 2024
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
Thai-Binh Nguyen
Alexander Waibel
74
1
0
27 Nov 2024
AMPS: ASR with Multimodal Paraphrase Supervision
Amruta Parulekar
Abhishek Gupta
Sameep Chattopadhyay
P. Jyothi
75
0
0
27 Nov 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
119
1
0
22 Nov 2024
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
Seokil Ham
H. Kim
Sangmin Woo
Changick Kim
Mamba
174
0
0
21 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
140
2
0
11 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
54
1
0
03 Nov 2024
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang
Yinheng Li
Charles Ding
Justin Lin
Paul Pu Liang
Dan Zhao
Rogerio Bonatti
K. Koishida
43
5
0
24 Oct 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa
BDL
AuLLM
VLM
56
11
0
23 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr .Zelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
54
2
0
23 Oct 2024
Continuous Speech Tokenizer in Text To Speech
Yixing Li
Ruobing Xie
X. Sun
Yu Cheng
Zhanhui Kang
AuLLM
CLL
55
2
0
22 Oct 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM
VLM
65
3
0
20 Oct 2024
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz
Kate Sanders
David Etter
Kenton W. Murray
Cameron Carpenter
...
Alexander Martin
Ronald Colaianni
Nolan King
Eugene Yang
Benjamin Van Durme
VGen
33
2
0
15 Oct 2024
A Framework for Adapting Human-Robot Interaction to Diverse User Groups
Theresa Pekarek-Rosin
Vanessa Hassouna
Xiaowen Sun
Luca Krohm
Henri-Leon Kordt
Michael Beetz
Stefan Wermter
28
0
0
15 Oct 2024
Characterizing the MrDeepFakes Sexual Deepfake Marketplace
Catherine Han
Anne Li
Deepak Kumar
Zakir Durumeric
24
1
0
14 Oct 2024
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Omer Moussa
Dietrich Klakow
Mariya Toneva
44
3
0
11 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
120
2
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
32
0
0
09 Oct 2024
Organizing Unstructured Image Collections using Natural Language
Mingxuan Liu
Zhun Zhong
Jun Li
Gianni Franchi
Subhankar Roy
Elisa Ricci
VLM
39
3
0
07 Oct 2024
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
Farhan Samir
Emily P. Ahn
Shreya Prakash
Márton Soskuthy
Vered Shwartz
Jian Zhu
26
0
0
05 Oct 2024
Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models
Pavel Stepachev
Pinzhen Chen
Barry Haddow
28
0
0
04 Oct 2024
Reverb: Open-Source ASR and Diarization from Rev
Nishchal Bhandari
Danny Chen
Miguel Ángel del Río Fernández
Natalie Delworth
Jennifer Drexler Fox
...
Ondrej Novotný
Jan Profant
Nan Qin
Martin Ratajczak
Jean-Philippe Robichaud
VLM
31
1
0
04 Oct 2024
Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models
Hongbin Liu
Lun Wang
Om Thakkar
Abhradeep Thakurta
Arun Narayanan
26
0
0
02 Oct 2024
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li
Wendi Sang
Chang Zeng
Runxuan Yang
Guo Chen
Xiaolin Hu
26
2
0
02 Oct 2024
The Conformer Encoder May Reverse the Time Dimension
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schluter
Hermann Ney
31
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
59
14
0
01 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
Probing mental health information in speech foundation models
Marc de Gennes
Adrien Lesage
Martin Denais
Xuan-Nga Cao
Simon Chang
Pierre Van Remoortere
Cyrille Dakhlia
Rachid Riad
21
0
0
27 Sep 2024
Previous
1
2
3
4
5
6
...
8
9
10
Next