Title |
---|
![]() MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang Mingfei Gao Zhe Gan Philipp Dufter Nina Wenzel ...Haoxuan You Zirui Wang Afshin Dehghan Peter Grasch Yinfei Yang |
![]() Large Language Models are Strong Audio-Visual Speech Recognition Learners Umberto Cappellazzo Minsu Kim Honglie Chen Pingchuan Ma Stavros Petridis Daniele Falavigna Alessio Brutti Maja Pantic |