InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD | Neural Information Processing Systems (NeurIPS), 2024 |
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases | North American Chapter of the Association for Computational Linguistics (NAACL), 2024 |
ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Computer Vision and Pattern Recognition (CVPR), 2024 |
A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning | ACM Multimedia (MM), 2024 |
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning | International Conference on Learning Representations (ICLR), 2024 |
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | European Conference on Computer Vision (ECCV), 2024 | Ruyi Xu, Yuan Yao, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang |