ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.11703
  4. Cited By
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

18 March 2024
Ruyi Xu
Yuan Yao
Zonghao Guo
Junbo Cui
Zanlin Ni
Chunjiang Ge
Tat-Seng Chua
Zhiyuan Liu
Maosong Sun
Gao Huang
    VLM
    MLLM
ArXivPDFHTML

Papers citing "LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images"

50 / 87 papers shown
Title
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
51
0
0
05 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Y. Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Y. Yang
Qi Zhang
Lili Qiu
VLM
70
0
0
30 Apr 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang
Wenliang Zheng
Aashrith Madasu
Peng Shi
Ryo Kamoi
...
Ranran Haoran Zhang
Avitej Iyer
Renze Lou
Wenpeng Yin
Rui Zhang
63
0
0
25 Apr 2025
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
Rahul Thapa
Andrew Li
Qingyang Wu
B. He
Yuki Sahashi
...
Angela Zhang
Ben Athiwaratkun
S. Song
David Ouyang
James Y. Zou
LM&MA
45
0
0
19 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Y. Liu
Qi Wang
Fuzheng Zhang
MLLM
VLM
58
0
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Y. Liu
Qi Wang
Fuzheng Zhang
VLM
53
1
0
10 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
50
0
0
02 Apr 2025
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
Jewon Lee
Ki-Ung Song
Seungmin Yang
Donguk Lim
Jaeyeon Kim
Wooksu Shin
Bo-Kyeong Kim
Yong Jae Lee
Tae-Ho Kim
VLM
55
0
0
01 Apr 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang
H. Wang
Mingshuo Chen
Di Wang
Yulin Wang
...
L. Lan
Wenjing Yang
J. Zhang
Zhiyuan Liu
Maosong Sun
52
2
0
31 Mar 2025
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
Dongchen Lu
Yuyao Sun
Zilu Zhang
Leping Huang
Jianliang Zeng
Mao Shu
Huo Cao
39
0
0
27 Mar 2025
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Hao Ai
Kunyi Wang
Zezhou Wang
H. Lu
Jin Tian
Yaxin Luo
Peng-Fei Xing
Jen-Yuan Huang
Huaxia Li
Gen Luo
MLLM
VLM
108
0
0
26 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
90
0
0
26 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Yichao Yan
VLM
42
1
0
26 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Y. Lu
Sifei Liu
...
Jan Kautz
Song Han
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
62
0
0
25 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Y. Yang
Afshin Dehghan
Peter Grasch
72
2
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Y. Wang
Shengqiong Wu
Y. Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
84
7
0
16 Mar 2025
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo
Jiaqi Tang
Chenyi Huang
Feiyang Hao
Zhouhui Lian
VLM
56
0
0
13 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
X. J. Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
62
0
0
10 Mar 2025
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
M. Li
Rui Wang
Lei Sun
Y. Bai
Xiangxiang Chu
59
0
0
08 Mar 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
39
6
0
24 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
93
0
0
04 Feb 2025
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Ahmed Masry
Juan A. Rodriguez
Tianyu Zhang
Suyuchen Wang
Chao Wang
...
I. Laradji
David Vazquez
Perouz Taslakian
Spandana Gella
Sai Rajeswar
51
0
0
03 Feb 2025
Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs
Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs
Hongliang Li
Jiaxin Zhang
Wenhui Liao
Dezhi Peng
Kai Ding
Lianwen Jin
OffRL
MQ
71
0
0
31 Jan 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
102
1
0
20 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Y. Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
Maosong Sun
MLLM
VLM
81
0
0
18 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision
  Language Models
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang
Aosong Cheng
Ming Lu
Zhiyong Zhuo
Minqi Wang
Jiajun Cao
Shaobo Guo
Qi She
Shanghang Zhang
VLM
88
11
0
02 Dec 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
105
6
0
27 Nov 2024
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual
  Token Compression
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
Yuke Zhu
Chi Xie
Shuang Liang
Bo Zheng
Sheng Guo
64
8
0
21 Nov 2024
LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned
  Image Compression
LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression
Shimon Murai
Heming Sun
J. Katto
VLM
69
1
0
20 Nov 2024
Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual
  Instruction Fine-Tuning
Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao
Bin Zhu
Jingjing Chen
Chong-Wah Ngo
Yu-Gang Jiang
VLM
OffRL
69
0
0
19 Nov 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment
  in Multi-Modal Models
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Wei Wang
Z. Li
Qi Xu
Linfeng Li
Yiqing Cai
Botian Jiang
Hang Song
Xingcan Hu
Pengyu Wang
Li Xiao
24
1
0
14 Nov 2024
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page
  Multi-document Understanding
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Jaemin Cho
Debanjan Mahata
Ozan Irsoy
Yujie He
Mohit Bansal
VLM
20
8
0
07 Nov 2024
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
Jian Chen
R. Zhang
Yufan Zhou
Tong Yu
Franck Dernoncourt
J. Gu
Ryan Rossi
Changyou Chen
Tong Sun
29
0
0
02 Nov 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language
  Tuning
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
21
0
0
23 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Conghui He
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
45
26
0
22 Oct 2024
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large
  Multimodal Models
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Fan Yang
Ming Tang
Jinqiao Wang
MLLM
43
1
0
21 Oct 2024
Improving Multi-modal Large Language Model through Boosting Vision
  Capabilities
Improving Multi-modal Large Language Model through Boosting Vision Capabilities
Yanpeng Sun
H. Zhang
Qiang Chen
Xinyu Zhang
Nong Sang
Gang Zhang
Jingdong Wang
Zechao Li
26
5
0
17 Oct 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained
  Vision-Language Understanding
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Yue Cao
Yangzhou Liu
Zhe Chen
Guangchen Shi
Wenhai Wang
Danhuai Zhao
Tong Lu
41
5
0
15 Oct 2024
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for
  Embodied AI
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
Sijie Cheng
Kechen Fang
Yangyang Yu
Sicheng Zhou
B. Li
Ye Tian
Tingguang Li
Lei Han
Yang Janet Liu
37
8
0
15 Oct 2024
Locality Alignment Improves Vision-Language Models
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
37
22
0
14 Oct 2024
MIRAGE: Multimodal Identification and Recognition of Annotations in
  Indian General Prescriptions
MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions
Tavish Mankash
V. S. Chaithanya Kota
Anish De
Praveen Prakash
Kshitij Jadhav
21
0
0
13 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to
  See
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Ajinkya Kale
Chenliang Xu
22
5
0
08 Oct 2024
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou
Ruohan Wang
Boyuan Zheng
Yanan Xie
Cheng Chang
Yiheng Shu
Huan Sun
Yu Su
LM&Ro
LLMAG
76
48
0
07 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
36
32
1
30 Sep 2024
ComiCap: A VLMs pipeline for dense captioning of Comic Panels
ComiCap: A VLMs pipeline for dense captioning of Comic Panels
Emanuele Vivoli
Niccoló Biondi
Marco Bertini
Dimosthenis Karatzas
33
3
0
24 Sep 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
74
54
0
19 Sep 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Andrew Tao
Zhiding Yu
Guilin Liu
Guilin Liu
MLLM
23
53
0
28 Aug 2024
12
Next