Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.08487
Cited By
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
12 June 2024
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
L. Wang
Rong Jin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models"
11 / 11 papers shown
Title
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
J. Chen
Fan Yang
Z. Zhang
Tingting Gao
Liang Wang
OffRL
LRM
34
0
0
05 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Y. Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Y. Yang
Qi Zhang
Lili Qiu
VLM
70
0
0
30 Apr 2025
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
40
0
0
25 Apr 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
X. J. Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
62
0
0
10 Mar 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Li Cao
Liqiang Nie
VLM
70
6
0
29 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
158
0
0
18 Dec 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
L. Wang
Rong Jin
Tieniu Tan
OffRL
48
35
0
23 Aug 2024
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
1