Scaling Vision Pre-Training to 4K Resolution | Computer Vision and Pattern Recognition (CVPR), 2025
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models | Computer Vision and Pattern Recognition (CVPR), 2024
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks | International Conference on Learning Representations (ICLR), 2024