Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.02537
Cited By
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
4 June 2024
Chengzu Li
Caiqi Zhang
Han Zhou
Nigel Collier
Anna Korhonen
Ivan Vulić
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"TopViewRS: Vision-Language Models as Top-View Spatial Reasoners"
8 / 8 papers shown
Title
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong
Chen Gao
Zihan Ding
Yue Liao
Si Liu
Shifeng Zhang
Xu Zhou
Si Liu
LRM
80
3
0
25 Nov 2024
What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini
Mustafa Shukor
Matthieu Cord
Laure Soulier
Benjamin Piwowarski
32
18
0
24 Apr 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
44
23
0
19 Mar 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
73
89
0
29 Jan 2024
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
Meera Hahn
James M. Rehg
LM&Ro
26
4
0
10 Oct 2022
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Hongyang Li
Chonghao Sima
Jifeng Dai
Wenhai Wang
Lewei Lu
...
Xiaosong Jia
Siqian Liu
Jianping Shi
Dahua Lin
Yu Qiao
85
91
0
12 Sep 2022
Grounding Natural Language Instructions: Can Large Language Models Capture Spatial Information?
Julia Rozanova
Deborah Ferreira
K. Dubba
Weiwei Cheng
Dell Zhang
André Freitas
LM&Ro
22
5
0
17 Sep 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
1