2006.13979
Cited By
Unsupervised Cross-lingual Representation Learning for Speech Recognition
24 June 2020
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
Papers citing
"Unsupervised Cross-lingual Representation Learning for Speech Recognition"
50 / 402 papers shown
Title
Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach
Ruikun Hou
B. Bühler
Tim Fütterer
Efe Bozkir
Peter Gerjets
Ulrich Trautwein
Enkelejda Kasneci
26
0
0
12 May 2025
Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
Abdulhady Abas Abdullah
S. H. Karim
Sara Azad Ahmed
Kanar R. Tariq
Tarik Ahmed Rashid
147
0
0
23 Apr 2025
Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation
Tianshui Chen
Jianman Lin
Zhijing Yang
Chumei Qing
Yukai Shi
Liang Lin
39
2
0
08 Apr 2025
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
Xabier de Zuazo
Eva Navas
Ibon Saratxaga
Inma Hernáez Rioja
37
0
0
30 Mar 2025
Second language Korean Universal Dependency treebank v1.2: Focus on data augmentation and annotation scheme refinement
Hakyung Sung
Gyu-Ho Shin
48
0
0
18 Mar 2025
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Hejia Chen
Haoxian Zhang
Shoulong Zhang
Xiaoqiang Liu
Sisi Zhuang
Yuan Zhang
Pengfei Wan
Di Zhang
Shuai Li
54
1
0
14 Mar 2025
Linguistic Knowledge Transfer Learning for Speech Enhancement
Kuo-Hsuan Hung
Xugang Lu
Szu-Wei Fu
H. Tseng
Hsin-Yi Lin
Chii-Wann Lin
Yu Tsao
VLM
65
0
0
10 Mar 2025
I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue
E. Ghaleb
Bulat Khaertdinov
Aslı Özyürek
Raquel Fernández
36
0
0
27 Feb 2025
Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus
Golshid Shekoufandeh
Paul Boersma
Antal van den Bosch
50
0
0
24 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
55
2
0
05 Feb 2025
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Jiaming Zhou
S. Zhao
Hui Wang
Tian-Hao Zhang
Haoqin Sun
Xuechen Wang
Yong Qin
161
3
0
20 Jan 2025
Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
Minu Kim
Kangwook Jang
Hoirin Kim
39
0
0
12 Jan 2025
Towards Unsupervised Speech Recognition Without Pronunciation Models
Junrui Ni
Liming Wang
Yang Zhang
Kaizhi Qian
Heting Gao
Mark Hasegawa-Johnson
Chang-Dong Yoo
SSL
OffRL
86
0
0
10 Jan 2025
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
70
1
0
10 Jan 2025
FAST: Fast Audio Spectrogram Transformer
Anugunj Naman
Gaibo Zhang
26
0
0
03 Jan 2025
Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
Jiahui Zhao
Hao Shi
Chenrui Cui
Tianrui Wang
Hexin Liu
Zhaoheng Ni
Lingxuan Ye
Longbiao Wang
72
0
0
21 Dec 2024
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Sangmin Lee
Woo-Jin Chung
Hong-Goo Kang
70
0
0
19 Dec 2024
Region-Based Optimization in Continual Learning for Audio Deepfake Detection
Yujie Chen
Jiangyan Yi
Cunhang Fan
J. Tao
Yong Ren
...
Hao Gu
Jun Xue
Chenglong Wang
Zhao Lv
Xiaohui Zhang
78
0
0
16 Dec 2024
Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition
Han Zhu
Gaofeng Cheng
Qingwei Zhao
Pengyuan Zhang
VLM
78
0
0
15 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
61
0
0
07 Dec 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
Jingyu Li
Aemon Yat Fei Chiu
Tan Lee
59
0
0
18 Nov 2024
Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages
Leena G Pillai
Kavya Manohar
Basil K Raju
Elizabeth Sherly
37
0
0
07 Nov 2024
Enhancing AAC Software for Dysarthric Speakers in e-Health Settings: An Evaluation Using TORGO
Macarious Hui
Jinda Zhang
Aanchan Mohan
26
0
0
01 Nov 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk
Jakub Kubiak
Mateusz Czyżnikiewicz
31
0
0
30 Oct 2024
Advocating Character Error Rate for Multilingual ASR Evaluation
Thennal D K
Jesin James
D. Gopinath
Muhammed Ashraf K
19
1
0
09 Oct 2024
Personal Intelligence System UniLM: Hybrid On-Device Small Language Model and Server-Based Large Language Model for Malay Nusantara
Azree Nazri
Olalekan Agbolade
Faisal Aziz
25
0
0
09 Oct 2024
CALoR: Towards Comprehensive Model Inversion Defense
Hongyao Yu
Yixiang Qiu
Hao Fang
Bin Chen
Sijin Yu
Bin Wang
Shu-Tao Xia
Ke Xu
27
1
0
08 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
18
0
0
07 Oct 2024
Cross-Lingual Query-by-Example Spoken Term Detection: A Transformer-Based Approach
Allahdadi Fatemeh
Mahdian Toroghi Rahil
Zareian Hassan
20
0
0
05 Oct 2024
Automatic Speech Recognition for the Ika Language
Uchenna Nzenwata
Daniel Ogbuigwe
VLM
23
0
0
01 Oct 2024
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
Nikola Ljubesic
Peter Rupnik
Danijel Koržinek
26
0
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
46
3
0
23 Sep 2024
Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task
Jozef Coldenhoff
Milos Cernak
33
0
0
21 Sep 2024
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
Khai Le-Duc
Phuc Phan
Tan-Hanh Pham
Bach Phan Tat
Minh-Huong Ngo
Chris Ngo
Thanh Nguyen-Tang
Truong Son-Hy
LM&MA
43
0
0
21 Sep 2024
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
Iuliia Thorbecke
Juan Zuluaga-Gomez
Esaú Villatoro-Tello
Shashi Kumar
Pradeep Rangappa
Sergio Burdisso
P. Motlícek
Karthik Pandia
A. Ganapathiraju
31
0
0
20 Sep 2024
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
Ahmet Gündüz
Yunsu Kim
Kamer Ali Yuksel
Mohamed Al-Badrashiny
Thiago Castro Ferreira
Hassan Sawaf
33
0
0
19 Sep 2024
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Hongfei Xue
Wei Ren
Xuelong Geng
Kun Wei
Longhao Li
Qijie Shao
Linju Yang
Kai Diao
Lei Xie
AuLLM
23
3
0
17 Sep 2024
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
Yao-Fei Cheng
Li-Wei Chen
Hung-Shin Lee
Hsin-Min Wang
21
0
0
13 Sep 2024
Exploring SSL Discrete Tokens for Multilingual ASR
Mingyu Cui
Daxin Tan
Yifan Yang
Dingdong Wang
Huimeng Wang
Xiao Chen
Xie Chen
Xunying Liu
28
1
0
13 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
68
1
0
13 Sep 2024
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
Tianchi Liu
Ivan Kukanov
Zihan Pan
Qiongqiong Wang
Hardik B. Sailor
K. Lee
37
2
0
12 Sep 2024
Graph Neural Networks for Parkinsons Disease Detection
S. A. Sheikh
Yacouba Kaloga
Md. Sahidullah
Ina Kodrasi
34
0
0
12 Sep 2024
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Wen-Chin Huang
Szu-Wei Fu
Erica Cooper
Ryandhimas E. Zezario
T. Toda
Hsin-Min Wang
Junichi Yamagishi
Yu Tsao
32
5
0
11 Sep 2024
A Large Dataset of Spontaneous Speech with the Accent Spoken in São Paulo for Automatic Speech Recognition Evaluation
Rodrigo Lima
S. Leal
Arnaldo Candido Junior
S. Aluísio
11
0
0
10 Sep 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
33
0
0
08 Aug 2024
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Khai Le-Duc
Quy-Anh Dang
Tan-Hanh Pham
Truong Son-Hy
32
0
0
08 Aug 2024
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
Xiangyu Fan
Jiaqi Li
Zhiqian Lin
Weiye Xiao
Lei Yang
CVBM
VGen
31
3
0
01 Aug 2024
Towards scalable efficient on-device ASR with transfer learning
Laxmi Pandey
Ke Li
Jinxi Guo
Debjyoti Paul
Arthur Guo
Jay Mahadeokar
Xuedong Zhang
26
2
0
23 Jul 2024
A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
Yixiang Qiu
Hao Fang
Hongyao Yu
Bin Chen
Meikang Qiu
Shu-Tao Xia
AAML
39
11
0
18 Jul 2024
EmoFace: Audio-driven Emotional 3D Face Animation
Chang Liu
Qunfen Lin
Zijiao Zeng
Ye Pan
CVBM
38
4
0
17 Jul 2024