ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2508.18265
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Versions: v1, v2 (latest)


25 August 2025
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
Xingguang Wei
Zhaoyang Liu
Linglin Jing
Shenglong Ye
Jie Shao
Zhaokai Wang
Z. Chen
Hongjie Zhang
Ganlin Yang
Haomin Wang
Qi Wei
Jinhui Yin
Wenhao Li
Erfei Cui
Guanzhou Chen
Zichen Ding
Changyao Tian
Z. Wu
JingJing Xie
Zehao Li
Bowen Yang
Yuchen Duan
Xuehui Wang
Zhi Hou
Haoran Hao
Tianyi Zhang
Songze Li
Xiangyu Zhao
Haodong Duan
Nianchen Deng
Bin-Bin Fu
Yinan He
Yi Wang
Conghui He
Botian Shi
Junjun He
Yingtong Xiong
Han Lv
Lijun Wu
Wenqi Shao
Kaipeng Zhang
Huipeng Deng
Biqing Qi
J. Ge
Qipeng Guo
Wenwei Zhang
Songyang Zhang
Maosong Cao
J. Lin
Kexian Tang
Jianfei Gao
Haian Huang
Yuzhe Gu
Chengqi Lyu
Huanze Tang
Rui Wang
Haijun Lv
Xuming He
Limin Wang
Min Dou
Xizhou Zhu
Tong Lu
Dahua Lin
Jifeng Dai
Weijie Su
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM, LRM
ArXiv (abs) | PDF | HTML | HuggingFace (169 upvotes) | GitHub (9043★)

Papers citing "InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency"

50 / 104 papers shown
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
168
2
0
15 Oct 2025
ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution
Long Cui
Weiyun Wang
Jie Shao
Zichen Wen
Gen Luo
Linfeng Zhang
Y. Zhang
Yu Qiao
Wenhai Wang
VLM
72
0
0
14 Oct 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Zhenxin Lei
Zhangwei Gao
Changyao Tian
Erfei Cui
Guanzhou Chen
...
Xiangyu Zhao
Jiayi Ji
Yu Qiao
Wenhai Wang
Gen Luo
VLM
117
0
0
14 Oct 2025
Detect Anything via Next Point Prediction
Qing Jiang
Junan Huo
Xingyu Chen
Yuda Xiong
Zhaoyang Zeng
Yihao Chen
Tianhe Ren
Junzhi Yu
Lei Zhang
ObjD
154
4
0
14 Oct 2025
K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding
Yifeng Yao
Yike Yun
Jing Wang
Huishuai Zhang
Dongyan Zhao
Ke Tian
Zhihao Wang
Minghui Qiu
Tao Wang
CLIP, VGen
36
1
0
14 Oct 2025
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
Hongxiang Li
Yaowei Li
Bin Lin
Yuwei Niu
Yuhang Yang
Xiaoshuang Huang
Jiayin Cai
Xiaolong Jiang
Yao Hu
Long Chen
EGVM, LRM
72
1
0
13 Oct 2025
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
Chengqi Duan
Kaiyue Sun
Rongyao Fang
M. Zhang
Yan Feng
...
Peng Pei
Xunliang Cai
Hongsheng Li
Yi Ma
Xihui Liu
ReLM, OffRL, LRM
99
2
0
13 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro, AIFin, AI4TS, LRM, AI4CE
137
3
0
13 Oct 2025
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
Yicheng Xu
Y. Wu
Jiashuo Yu
Ziang Yan
Tianxiang Jiang
...
Kai Chen
Yu Qiao
Limin Wang
Manabu Okumura
Y. Wang
LRM
72
0
0
13 Oct 2025
UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning
Jianke Zhang
Yucheng Hu
Yanjiang Guo
Xiaoyu Chen
Yichen Liu
Wenna Chen
Chaochao Lu
Jianyu Chen
OffRL, SSL
189
0
0
12 Oct 2025
Taming a Retrieval Framework to Read Images in Humanlike Manner for Augmenting Generation of MLLMs
SuYang Xi
Chenxi Yang
Hong Ding
Yiqing Ni
Catherine C. Liu
Yunhao Liu
Chengqi Zhang
LRM
49
0
0
12 Oct 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen
Yue Ding
Weihong Lin
Jingyun Hua
Linli Yao
...
Yuanxing Zhang
Qiang Liu
Pengfei Wan
Liang Wang
Tieniu Tan
159
1
0
12 Oct 2025
Learning from Mistakes: Enhancing Harmful Meme Detection via Misjudgment Risk Patterns
Wenshuo Wang
Ziyou Jiang
Junjie Wang
Mingyang Li
Jie Huang
Yuekai Huang
Zhiyuan Chang
Feiyan Duan
Qing Wang
59
0
0
10 Oct 2025
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Zixin Zhang
Kanghao Chen
Xingwang Lin
Lutao Jiang
Xu Zheng
Yuanhuiyi Lyu
Litao Guo
Yinchuan Li
Ying-Cong Chen
40
2
0
10 Oct 2025
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Peiran Wu
Zhuorui Yu
Yunze Liu
Chi-Hao Wu
Enmin Zhou
Junxiao Shen
OffRL, VLM
56
0
0
09 Oct 2025
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
Mansi Sakarvadia
Kareem Hegazy
A. Totounferoush
Kyle Chard
Yaoqing Yang
Ian Foster
Michael W. Mahoney
SupR
180
9
0
08 Oct 2025
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh Damodaran
Paul D. Rowe
AAML
68
7
0
07 Oct 2025
OneVision: An End-to-End Generative Framework for Multi-view E-commerce Vision Search
Zexin Zheng
Huangyu Dai
Lingtao Mao
Suhua Wang
Zihan Liang
...
Yuqing Ding
Chenyi Lei
Wenwu Ou
Han Li
Kun Gai
127
0
0
07 Oct 2025
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
Mingyu Liu
Zheng Huang
Xiaoyi Lin
Huanyi Zheng
Canyu Zhao
Zongze Du
Y. Wang
Haoyi Zhu
Hao Chen
Chunhua Shen
73
0
0
04 Oct 2025
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Ming Zhao
Wenhui Dong
Yang Zhang
Xiang Zheng
Zhonghao Zhang
...
Zhiqiang Liu
Zhongan Bi
Chenyang Si
Tiansheng Sun
Caifeng Shan
LM&MA, ELM, LRM
286
0
0
03 Oct 2025
TIT-Score: Evaluating Long-Prompt Based Text-to-Image Alignment via Text-to-Image-to-Text Consistency
Juntong Wang
Huiyu Duan
Jiarui Wang
Ziheng Jia
Guangtao Zhai
Xiongkuo Min
EGVM, ALM, LM&MA, VLM
108
1
0
03 Oct 2025
Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning
Xuchen Li
Xuzhao Li
Jiahui Gao
Renjie Pi
Shiyu Hu
Wentao Zhang
VLM, LRM
170
2
0
02 Oct 2025
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
Sicheng Feng
Kaiwen Tuo
Song Wang
Lingdong Kong
Jianke Zhu
Huan Wang
LRM
112
1
0
02 Oct 2025
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
Krishna Teja Chitty-Venkata
M. Emani
MLLM, VGen, LRM, VLM
141
1
0
02 Oct 2025
BioVERSE: Representation Alignment of Biomedical Modalities to LLMs for Multi-Modal Reasoning
Ching-Huei Tsou
Michal Ozery-Flato
Ella Barkan
Diwakar Mahajan
Ben Shapira
AI4CE
44
0
0
01 Oct 2025
Plug-and-Play Prompt Refinement via Latent Feedback for Diffusion Model Alignment
Suhyeon Lee
Jong Chul Ye
105
0
0
01 Oct 2025
Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction
Sagnik Basu
Shubham Prakash
Ashish Maruti Barge
Siddharth D. Jaiswal
A. Dash
Saptarshi Ghosh
Animesh Mukherjee
60
0
0
30 Sep 2025
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Yuansen Liu
Haiming Tang
Jinlong Peng
Jiangning Zhang
Xiaozhong Ji
...
Chaoyou Fu
Chengjie Wang
Xiaobin Hu
Shuicheng Yan
VLM
121
0
0
30 Sep 2025
VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
Peng Liu
H. Shen
Chunxin Fang
Zhicheng Sun
Jiajia Liao
T. Zhao
MLLM, ObjD, VLM, LRM
145
1
0
30 Sep 2025
Go with Your Gut: Scaling Confidence for Autoregressive Image Generation
Harold Haodong Chen
Xianfeng Wu
Wen-Jie Shu
Rongjin Guo
Disen Lan
Harry Yang
Ying-Cong Chen
56
1
0
30 Sep 2025
Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models
Youngeun Kim
Youjia Zhang
Huiling Liu
Aecheon Jung
Sunwoo Lee
Sungeun Hong
VLM
75
0
0
29 Sep 2025
Latent Visual Reasoning
Bangzheng Li
Ximeng Sun
Jiang-Long Liu
Ze Wang
Jialian Wu
Xiaodong Yu
Hao Chen
Emad Barsoum
Muhao Chen
Zicheng Liu
LRM, VLM
136
2
0
29 Sep 2025
OIG-Bench: A Multi-Agent Annotated Benchmark for Multimodal One-Image Guides Understanding
Jiancong Xie
Wenjin Wang
Zhuomeng Zhang
Zihan Liu
Qi Liu
Ke Feng
Zixun Sun
Yuedong Yang
VLM
29
0
0
29 Sep 2025
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Congzhi Zhang
Zhibin Wang
Yinchao Ma
Jiawei Peng
Y. Wang
Qiang Zhou
Jun Song
Bo Zheng
OffRL, AI4TS, LRM
150
2
0
28 Sep 2025
GUI-PRA: Process Reward Agent for GUI Tasks
Tao Xiong
Xavier Hu
Yurun Chen
Yuhang Liu
Changqiao Wu
Pengzhi Gao
Wei Liu
Jian Luan
Shengyu Zhang
LLMAG
133
0
0
27 Sep 2025
SciTS: Scientific Time Series Understanding and Generation with LLMs
Wen Wu
Ziyang Zhang
Liwei Liu
Xuenan Xu
J. Liu
...
Siyuan Hou
T. Lin
Kai Chen
Bowen Zhou
C. Zhang
AI4TS
84
0
0
26 Sep 2025
Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation
Ruoyu Chen
Xiaoqing Guo
Kangwei Liu
Siyuan Liang
Shiming Liu
Qunli Zhang
Hua Zhang
Xiaochun Cao
104
0
0
26 Sep 2025
Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning
Guoxin Wang
Jun Zhao
Xinyi Liu
Yanbo Liu
Xuyang Cao
...
Zhuoyun Liu
Qintian Sun
Fangru Zhou
Haoqiang Xing
Zhenhong Yang
LRM
106
1
0
23 Sep 2025
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
S. Yu
Yuxin Chen
Hao Ju
Lianjie Jia
Fuxi Zhang
...
Lin Song
Lijun Wang
Yanwei Li
Y. Shan
Huchuan Lu
LRM
225
5
0
23 Sep 2025
MAPO: Mixed Advantage Policy Optimization
Wenke Huang
Quan Zhang
Yiyang Fang
Jian Liang
Xuankun Rong
...
Mingjun Li
Leszek Rutkowski
Mang Ye
Bo Du
Dacheng Tao
151
2
0
23 Sep 2025
Vision Language Models Are Not (Yet) Spelling Correctors
Junhong Liang
Bojun Zhang
VLM
32
0
0
22 Sep 2025
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Quanzhu Niu
Dengxian Gong
Shihao Chen
Tao Zhang
Yikang Zhou
Haobo Yuan
Lu Qi
Xiangtai Li
Shilin Xu
VOS
172
0
0
21 Sep 2025
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Yanghao Li
Rui Qian
Bowen Pan
Haotian Zhang
Haoshuo Huang
...
Zhengdong Zhang
Chen Chen
Yang Zhao
Ruoming Pang
Zhifeng Chen
MLLM
156
3
0
19 Sep 2025
GenExam: A Multidisciplinary Text-to-Image Exam
Zhaokai Wang
Penghao Yin
Xiangyu Zhao
Changyao Tian
Yu Qiao
Wenhai Wang
Jifeng Dai
Gen Luo
ELM
163
0
0
17 Sep 2025
SAIL-VL2 Technical Report
Weijie Yin
Yongjie Ye
Fangxun Shu
Yue Liao
Zijian Kang
...
Han Wang
Wenzhuo Liu
Xiao Liang
Shuicheng Yan
Chao Feng
LRM, VLM
152
2
0
17 Sep 2025
Igniting VLMs toward the Embodied Space
Andy Zhai
B. Liu
Bruno Fang
Chalse Cai
Ellie Ma
...
Shalfun Li
Starrick Liu
S. Chen
Vincent Chen
Zach Xu
LM&Ro, VLM
95
5
0
15 Sep 2025
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Feilong Chen
Y. Liu
Yi Huang
Hao Wang
Miren Tian
Ya-Qi Yu
Minghui Liao
Jihao Wu
MLLM, VLM
211
0
0
15 Sep 2025
LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA
Jing Huang
Zhiya Tan
Shutao Gong
Fanwei Zeng
Jianshu Li
Huazhe Tan
Weibin Yao
J. Li
MLLM, LRM
128
0
0
12 Sep 2025
GLEAM: Learning to Match and Explain in Cross-View Geo-Localization
Xudong Lu
Zhi Zheng
Yi Wan
Yongxiang Yao
Annan Wang
...
Weifeng Lin
Xiangyu Zhao
Xue Yang
Hongsheng Li
97
1
0
09 Sep 2025
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Han Li
Xinyu Peng
Y. Wang
Zelin Peng
Xin Chen
Rongxiang Weng
Jingang Wang
Xunliang Cai
Wenrui Dai
Hongkai Xiong
MLLM, OffRL
206
7
0
03 Sep 2025