Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
Haodong Hong, Sen Wang, Zi Huang, Qi Wu, Jiajun Liu
arXiv: 2406.02208, 4 June 2024
Papers citing "Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts" (6 / 6 papers shown)
1. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
   Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
   30 Jan 2023. Tags: VLM, MLLM

2. MaPLe: Multi-modal Prompt Learning
   Muhammad Uzair Khattak, H. Rasheed, Muhammad Maaz, Salman Khan, F. Khan
   06 Oct 2022. Tags: VPVLM, VLM

3. Iterative Vision-and-Language Navigation
   Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason J. Corso, Peter Anderson, Stefan Lee, Jesse Thomason
   06 Oct 2022. Tags: LM&Ro

4. CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
   Jialu Li, Hao Tan, Mohit Bansal
   05 Jul 2022. Tags: LM&Ro

5. How Much Can CLIP Benefit Vision-and-Language Tasks?
   Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Z. Yao, Kurt Keutzer
   13 Jul 2021. Tags: CLIP, VLM, MLLM

6. Language and Visual Entity Relationship Graph for Agent Navigation
   Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
   19 Oct 2020. Tags: LM&Ro