Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
Round and Round We Go! What makes Rotary Positional Encodings useful? International Conference on Learning Representations (ICLR), 2024 |