Aligned Better, Listen Better for Audio-Visual Large Language ModelsInternational Conference on Learning Representations (ICLR), 2025 |
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit CooperationComputer Vision and Pattern Recognition (CVPR), 2025 |