Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.12058
Cited By
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
19 February 2024
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
MLLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models"
10 / 10 papers shown
Title
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
42
0
0
22 Apr 2025
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Haotian Xu
Yue Hu
Chen Gao
Zhengqiu Zhu
Yong Zhao
Y. Li
Quanjun Yin
29
0
0
13 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
26
0
0
03 Apr 2025
Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multi-Agent Control
Timothée Anne
Noah Syrkis
Meriem Elhosni
Florian Turati
Franck Legendre
Alain Jaquier
Sebastian Risi
LLMAG
85
1
0
16 Dec 2024
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong
Chen Gao
Zihan Ding
Yue Liao
Si Liu
Shifeng Zhang
Xu Zhou
Si Liu
LRM
80
3
0
25 Nov 2024
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang
Chufan Shi
Yaxin Liu
Bo Shui
Junjie Wang
...
Yuxiang Zhang
Gongye Liu
Xiaomei Nie
Deng Cai
Yujiu Yang
MLLM
LRM
38
22
0
14 Jun 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
Zhongyu Wei
LRM
MLLM
38
6
0
27 May 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
42
120
0
21 Dec 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
113
65
0
13 Mar 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
1