Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 |
Token Alignment via Character Matching for Subword Completion | Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | Computer Vision and Pattern Recognition (CVPR), 2024 |
Masked AutoDecoder is Effective Multi-Task Vision Generalist | Computer Vision and Pattern Recognition (CVPR), 2024 |
MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki | Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024 |
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications | The Speaker and Language Recognition Workshop (Odyssey), 2024 |
Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT | ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2022 |
To Err Is Human, but Llamas Can Learn It Too | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 |
FFSTC: Fongbe to French Speech Translation Corpus | International Conference on Language Resources and Evaluation (LREC), 2024 |
Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity | International Conference on Language Resources and Evaluation (LREC), 2024 |
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? | International Conference on Learning Representations (ICLR), 2024 |