2410.08202
Cited By
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
10 October 2024
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
Papers citing
"Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training"
20 / 20 papers shown
Title
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
M. Tomizuka
Xue Yang
Junchi Yan
Mingyu Ding
LM&Ro
VLM
54
2
0
04 May 2025
V²R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi Ren Fung
28
0
0
23 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
X. Li
Zilong Huang
Y. Li
Weixian Lei
XueQing Deng
Shihao Chen
S. Ji
Jiashi Feng
MLLM
LRM
40
1
0
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
19
1
0
14 Apr 2025
OmniCaptioner: One Captioner to Rule Them All
Yiting Lu
Jiakang Yuan
Zhen Li
Shitian Zhao
Qi Qin
...
Lei Bai
Zhibo Chen
Peng Gao
Bo Zhang
MLLM
76
0
0
09 Apr 2025
Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation
Xiaoxing Hu
Ziyang Gong
Y. Wang
Yuru Jia
Gen Luo
Xue Yang
26
0
0
08 Apr 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
S. Chen
Jingjing Chen
Lin Ma
Yu Jiang
23
2
0
06 Apr 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
Xiangyu Zhao
Peiyuan Zhang
Kexian Tang
Hao Li
Zicheng Zhang
Guangtao Zhai
Junchi Yan
Hua Yang
Xue Yang
Haodong Duan
VLM
LRM
36
0
0
03 Apr 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
69
1
0
27 Mar 2025
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
81
0
0
26 Mar 2025
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Qiao Liang
Yanjiang Liu
Ben He
Y. Lu
Hongyu Lin
Jia Zheng
Xianpei Han
Le Sun
Yingfei Sun
39
0
0
23 Mar 2025
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Yiqi Zhu
Z. Wang
C. Zhang
Peng Li
Yang Liu
CoGe
VLM
63
0
0
18 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
47
0
0
16 Mar 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Y. Liu
...
Wanli Ouyang
Zhiwei Xiong
Peng Gao
Qibin Hou
Ming-Ming Cheng
86
3
0
13 Mar 2025
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
Huilin Deng
Ding Zou
Rui Ma
Hongchen Luo
Yang Cao
Yu Kang
LRM
VLM
44
8
0
10 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
X. J. Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
57
0
0
10 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
68
0
0
08 Mar 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
81
6
0
25 Feb 2025
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
44
45
1
15 Nov 2024
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo
Gen Luo
Jiayi Ji
Yiyi Zhou
Xiaoshuai Sun
Zhiqiang Shen
Rongrong Ji
VLM
MoE
24
1
0
17 Oct 2024