TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model. Computer Vision and Pattern Recognition (CVPR), 2025.
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction. Computer Vision and Pattern Recognition (CVPR), 2025.
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval. International Conference on Learning Representations (ICLR), 2024.