Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference | AAAI Conference on Artificial Intelligence (AAAI), 2024 |
DriveLM: Driving with Graph Visual Question Answering | European Conference on Computer Vision (ECCV), 2023 |
AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity | Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | Chaofan Tao, Gukyeong Kwon, Varad Gunjal, Hao Yang, Zhaowei Cai, Yonatan Dukler, Ashwin Swaminathan, R. Manmatha, Colin Jon Taylor, Stefano Soatto |