TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025 |
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local PerceptionComputer Vision and Pattern Recognition (CVPR), 2025 |
Aligned Better, Listen Better for Audio-Visual Large Language ModelsInternational Conference on Learning Representations (ICLR), 2025 |
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?Computer Vision and Pattern Recognition (CVPR), 2025 |
Scaling Vision Pre-Training to 4K ResolutionComputer Vision and Pattern Recognition (CVPR), 2025 |
M3: 3D-Spatial MultiModal MemoryInternational Conference on Learning Representations (ICLR), 2025 |
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language ModelsComputer Vision and Pattern Recognition (CVPR), 2025 |