Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
Haodong Hong, Sen Wang, Zi Huang, Qi Wu, Jiajun Liu
arXiv: 2406.02208, 4 June 2024
Papers citing "Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts" (6 / 6 papers shown)
1. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
   Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
   30 Jan 2023. Tags: VLM, MLLM

2. MaPLe: Multi-modal Prompt Learning
   Muhammad Uzair Khattak, H. Rasheed, Muhammad Maaz, Salman Khan, F. Khan
   06 Oct 2022. Tags: VPVLM, VLM

3. Iterative Vision-and-Language Navigation
   Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason J. Corso, Peter Anderson, Stefan Lee, Jesse Thomason
   06 Oct 2022. Tags: LM&Ro

4. CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
   Jialu Li, Hao Tan, Mohit Bansal
   05 Jul 2022. Tags: LM&Ro

5. How Much Can CLIP Benefit Vision-and-Language Tasks?
   Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Z. Yao, Kurt Keutzer
   13 Jul 2021. Tags: CLIP, VLM, MLLM

6. Language and Visual Entity Relationship Graph for Agent Navigation
   Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
   19 Oct 2020. Tags: LM&Ro