2508.18265
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
25 August 2025
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
Xingguang Wei
Zhaoyang Liu
Linglin Jing
Shenglong Ye
Jie Shao
Zhaokai Wang
Z. Chen
Hongjie Zhang
Ganlin Yang
Haomin Wang
Qi Wei
Jinhui Yin
Wenhao Li
Erfei Cui
Guanzhou Chen
Zichen Ding
Changyao Tian
Z. Wu
JingJing Xie
Zehao Li
Bowen Yang
Yuchen Duan
Xuehui Wang
Zhi Hou
Haoran Hao
Tianyi Zhang
Songze Li
Xiangyu Zhao
Haodong Duan
Nianchen Deng
Bin-Bin Fu
Yinan He
Yi Wang
Conghui He
Botian Shi
Junjun He
Yingtong Xiong
Han Lv
Lijun Wu
Wenqi Shao
Kaipeng Zhang
Huipeng Deng
Biqing Qi
J. Ge
Qipeng Guo
Wenwei Zhang
Songyang Zhang
Maosong Cao
J. Lin
Kexian Tang
Jianfei Gao
Haian Huang
Yuzhe Gu
Chengqi Lyu
Huanze Tang
Rui Wang
Haijun Lv
Xuming He
Limin Wang
Min Dou
Xizhou Zhu
Tong Lu
Dahua Lin
Jifeng Dai
Weijie Su
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM
LRM
HuggingFace (169 upvotes)
Github (9043★)
Papers citing "InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency"
50 / 104 papers shown
DocPTBench: Benchmarking End-to-End Photographed Document Parsing and Translation
Yongkun Du
Pinxuan Chen
Xuye Ying
Z. Chen
16
0
0
23 Nov 2025
EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning
Yogesh Kulkarni
Pooyan Fazli
EgoV
LRM
121
0
0
23 Nov 2025
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
Guoyang Xia
Yifeng Ding
Fengfa Li
Lei Ren
Wei Chen
Fangxiang Feng
Xiaojie Wang
MoE
VLM
44
0
0
22 Nov 2025
VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment
Ziheng Jia
Linhan Cao
J. N. Han
Zicheng Zhang
Jiaying Qian
Jiarui Wang
Z. Chen
Guangtao Zhai
Xiongkuo Min
MLLM
16
0
0
22 Nov 2025
Understanding Counting Mechanisms in Large Language and Vision-Language Models
Hosein Hasani
Amirmohammad Izadi
Fatemeh Askari
Mobin Bagherian
Sadegh Mohammadian
Mohammad Izadi
M. Baghshah
12
0
0
21 Nov 2025
When to Think and When to Look: Uncertainty-Guided Lookback
Jing Bi
Filippos Bellos
Junjia Guo
Yayuan Li
Chao Huang
...
Tang
Luchuan Song
Susan Liang
Zhongfei Zhang
LRM
141
0
0
19 Nov 2025
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Yushi Huang
Z. Wang
Zhihang Yuan
Yifu Ding
Ruihao Gong
Jinyang Guo
Xianglong Liu
Jun Zhang
MoE
VLM
72
0
0
19 Nov 2025
Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov
Ulyana Isaeva
Anton A. Emelyanov
Artem Safin
Maria Tikhonova
...
Ilseyar Alimova
A. Kapitanov
Alena Fenogenova
142
0
0
19 Nov 2025
FlexiCup: Wireless Multimodal Suction Cup with Dual-Zone Vision-Tactile Sensing
Junhao Gong
Shoujie Li
Kit-Wa Sou
Changqing Guo
Hourong Huang
...
Yifan Xie
Chenxin Liang
Chuqiao Lyu
Xiaojun Liang
Wenbo Ding
56
0
0
18 Nov 2025
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao
Kele Shao
Bohan Yu
Weiqiang Wang
Jian Liu
Huan Wang
VLM
124
0
0
18 Nov 2025
Can We Predict the Next Question? A Collaborative Filtering Approach to Modeling User Behavior
Bokang Fu
Jiahao Wang
Xiaojing Liu
Y. Liu
112
0
0
17 Nov 2025
Minimax Multi-Target Conformal Prediction with Applications to Imaging Inverse Problems
Jeffrey Wen
Rizwan Ahmad
Philip Schniter
MedIm
179
0
0
17 Nov 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM
MoE
OSLM
VLM
322
0
0
16 Nov 2025
How Do VLAs Effectively Inherit from VLMs?
Chuheng Zhang
Rushuai Yang
Xiaoyu Chen
Kaixin Wang
Li Zhao
Yi-Ling Chen
Jiang Bian
LM&Ro
202
0
0
10 Nov 2025
V-Thinker: Interactive Thinking with Images
Runqi Qiao
Qiuna Tan
Minghan Yang
Guanting Dong
Peiqing Yang
...
Yida Xu
Lan Yang
Chong Sun
Chen Li
Honggang Zhang
MLLM
LRM
249
0
0
06 Nov 2025
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Kevin Qinghong Lin
Y. Zheng
Hangyu Ran
Dantong Zhu
Dongxing Mao
Linjie Li
Philip Torr
Alex Jinpeng Wang
36
0
0
04 Nov 2025
Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers
Mohamed Eltahir
Ali Habibullah
Lama Ayash
Tanveer Hussain
Naeemullah Khan
56
0
0
03 Nov 2025
PreferThinker: Reasoning-based Personalized Image Preference Assessment
Shengqi Xu
Xinpeng Zhou
Y. Zhang
Ming-Yu Liu
Tao Liang
Tianyu Zhang
Yalong Bai
Zuxuan Wu
W. Zuo
108
0
0
01 Nov 2025
VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning
Xuanle Zhao
Deyang Jiang
Zhixiong Zeng
Lei Chen
Haibo Qiu
Jing Huang
Yufeng Zhong
Liming Zheng
Yilin Cao
Lin Ma
57
2
0
01 Nov 2025
Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond
Fan Zhang
Haoxuan Li
Shengju Qian
Xin Wang
Zheng Lian
...
Yuan Gao
Qiankun Li
Yefeng Zheng
Zhouchen Lin
Pheng-Ann Heng
LRM
56
0
0
01 Nov 2025
RzenEmbed: Towards Comprehensive Multimodal Retrieval
Weijian Jian
Yajun Zhang
Dawei Liang
Chunyu Xie
Yixiao He
Dawei Leng
Yuhui Yin
61
0
0
31 Oct 2025
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Jiawei Gu
Yunzhuo Hao
Huichen Will Wang
Linjie Li
Michael Qizhe Shieh
Yejin Choi
Ranjay Krishna
Yu Cheng
LM&Ro
LRM
237
1
0
30 Oct 2025
EgoExo-Con: Exploring View-Invariant Video Temporal Understanding
Minjoon Jung
Junbin Xiao
Junghyun Kim
Byoung-Tak Zhang
Angela Yao
68
1
0
30 Oct 2025
A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics
Simindokht Jahangard
Mehrzad Mohammadi
Abhinav Dhall
Hamid Rezatofighi
31
0
0
30 Oct 2025
CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark
Jiaqi Wang
X. J. Yang
Kai Sun
Parth Suresh
Sanat Sharma
...
Rakesh Wanga
Anuj Kumar
Rohit Patel
Wen-tau Yih
Xin Luna Dong
68
0
0
30 Oct 2025
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
Nikita Kachaev
Mikhail Kolosov
Daniil Zelezetsky
A. Kovalev
Aleksandr I. Panov
VLM
225
2
0
29 Oct 2025
ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents
Tianyu Yang
Terry Ruas
Yijun Tian
Jan Philip Wahle
Daniel Kurzawe
Bela Gipp
VLM
184
0
0
29 Oct 2025
MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
Xiaoke Huang
Ningsen Wang
Hui Liu
Xianfeng Tang
Yuyin Zhou
LM&MA
MedIm
156
0
0
29 Oct 2025
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
Aodi Wu
Xubo Luo
36
0
0
28 Oct 2025
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
Qiushi Sun
Mukai Li
Zhoumianze Liu
Zhihui Xie
F. Xu
...
Qi Liu
Z. Wu
Zhuosheng Zhang
B. Kao
Lingpeng Kong
36
0
0
28 Oct 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
124
2
0
28 Oct 2025
Conflict Adaptation in Vision-Language Models
Xiaoyang Hu
41
0
0
28 Oct 2025
FruitProm: Probabilistic Maturity Estimation and Detection of Fruits and Vegetables
Sidharth Rai
Rahul Harsha Cheppally
Benjamin Vail
Keziban Yalçın Dokumacı
Ajay Sharda
49
0
0
28 Oct 2025
Rethinking the Text-Vision Reasoning Imbalance in MLLMs through the Lens of Training Recipes
Guanyu Yao
Qiucheng Wu
Yang Zhang
Zhaowen Wang
Handong Zhao
Shiyu Chang
VLM
LRM
179
0
0
26 Oct 2025
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Guanghao Zheng
Bowen Shi
Mingxing Xu
Ruoyu Sun
Peisen Zhao
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Xiaopeng Zhang
Qi Tian
VLM
87
0
0
23 Oct 2025
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
Y. Liu
Lianhui Qin
Shengjie Wang
LRM
52
0
0
23 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
84
0
0
22 Oct 2025
DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
Ziang Zhang
Zehan Wang
Guanghao Zhang
Weilong Dai
Yan Xia
Ziang Yan
Minjie Hong
Zhou Zhao
40
3
0
21 Oct 2025
UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
Da Zhang
Chenggang Rong
Bingyu Li
Feiyu Wang
Zhiyuan Zhao
Junyu Gao
Xuelong Li
VLM
CoGe
123
0
0
21 Oct 2025
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Haochen Wang
Yuhao Wang
Tao Zhang
Yikang Zhou
Yanwei Li
...
Anran Wang
Yunhai Tong
Z. Wang
X. Li
Zhaoxiang Zhang
VLM
129
0
0
21 Oct 2025
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
Xueyi Chen
Keda Tao
Kele Shao
Huan Wang
92
1
0
21 Oct 2025
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
Yaning Pan
Z. Wang
Qianqian Xie
Yongqian Wen
Y. Zhang
...
Zhidong Gan
Yonghong Lin
An Ping
Tianhao Peng
Jiaheng Liu
121
2
0
20 Oct 2025
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki
Junxian Guo
Jiaming Tang
Shang Yang
Yukang Chen
Konstantinos N. Plataniotis
Yao Lu
Song Han
Zhijian Liu
MLLM
VLM
101
1
0
20 Oct 2025
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
Zhining Liu
Ziyi Chen
Hui Liu
Chen Luo
Xianfeng Tang
...
Zhenwei Dai
Zhan Shi
Tianxin Wei
Benoit Dumoulin
Hanghang Tong
LRM
78
0
0
20 Oct 2025
LongInsightBench: A Comprehensive Benchmark for Evaluating Omni-Modal Models on Human-Centric Long-Video Understanding
Zhaoyang Han
Qihan Lin
Hao Liang
Bowen Chen
Zhou Liu
Wentao Zhang
VLM
67
0
0
20 Oct 2025
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
40
0
0
17 Oct 2025
When Seeing Is not Enough: Revealing the Limits of Active Reasoning in MLLMs
Hongcheng Liu
Pingjie Wang
Yuhao Wang
Siqu Ou
Yanfeng Wang
Y Samuel Wang
LRM
77
0
0
17 Oct 2025
Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning
Xiangyu Meng
Zixian Zhang
Zhenghao Zhang
Junchao Liao
Long Qin
Weizhi Wang
VGen
103
1
0
16 Oct 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
88
0
0
16 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM
AuLLM
VGen
VLM
168
2
0
15 Oct 2025