Title |
---|
![]() Video-to-Audio Generation with Fine-grained Temporal Semantics Yuchen Hu Yu Gu Chenxing Li Rilin Chen Dong Yu |
![]() Robust Audiovisual Speech Recognition Models with Mixture-of-Experts Yihan Wu Yifan Peng Yichen Lu Xuankai Chang Ruihua Song Shinji Watanabe |
![]() Large Language Models are Strong Audio-Visual Speech Recognition Learners Umberto Cappellazzo Minsu Kim Honglie Chen Pingchuan Ma Stavros Petridis Daniele Falavigna Alessio Brutti Maja Pantic |