
Title |
|---|
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
Recent Advances in Speech Language Models: A Survey. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? International Conference on Learning Representations (ICLR), 2024. Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, ..., Qingsong Wen, Zhang Zhang, Liwen Wang, Rong Jin, Tieniu Tan |