DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. North American Chapter of the Association for Computational Linguistics (NAACL), 2024
How much do contextualized representations encode long-range context? North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Transducer Consistency Regularization for Speech to Text Applications. Spoken Language Technology Workshop (SLT), 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR. International Conference on Learning Representations (ICLR), 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
EMMeTT: Efficient Multimodal Machine Translation Training. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. Piotr Żelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg
Chain-of-Thought Prompting for Speech Translation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg