TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

4 June 2024

Papers citing "TopViewRS: Vision-Language Models as Top-View Spatial Reasoners"

8 / 8 papers shown

Title
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation Linqing Zhong Chen Gao Zihan Ding Yue Liao Si Liu Shifeng Zhang Xu Zhou Si Liu LRM 80 3 0 25 Nov 2024
What Makes Multimodal In-Context Learning Work? Folco Bertini Baldassini Mustafa Shukor Matthieu Cord Laure Soulier Benjamin Piwowarski 32 18 0 24 Apr 2024
When Do We Not Need Larger Vision Models? Baifeng Shi Ziyang Wu Maolin Mao Xin Wang Trevor Darrell VLM LRM 44 23 0 19 Mar 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Xiao-wen Dong Pan Zhang Yuhang Zang Yuhang Cao Bin Wang ... Conghui He Xingcheng Zhang Yu Qiao Dahua Lin Jiaqi Wang VLM MLLM 73 89 0 29 Jan 2024
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training Meera Hahn James M. Rehg LM&Ro 26 4 0 10 Oct 2022
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe Hongyang Li Chonghao Sima Jifeng Dai Wenhai Wang Lewei Lu ... Xiaosong Jia Siqian Liu Jianping Shi Dahua Lin Yu Qiao 85 91 0 12 Sep 2022
Grounding Natural Language Instructions: Can Large Language Models Capture Spatial Information? Julia Rozanova Deborah Ferreira K. Dubba Weiwei Cheng Dell Zhang André Freitas LM&Ro 22 5 0 17 Sep 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 220 3,054 0 23 Jan 2020