Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2107.06383
Cited By
How Much Can CLIP Benefit Vision-and-Language Tasks?
13 July 2021
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"How Much Can CLIP Benefit Vision-and-Language Tasks?"
9 / 9 papers shown
Title
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Junrong Yue
Y. Zhang
Chuan Qin
Bo Li
Xiaomin Lie
Xinlei Yu
Wenxin Zhang
Zhendong Zhao
24
53
0
23 Apr 2025
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
49
11
0
03 Mar 2023
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
268
2,875
0
11 Feb 2021
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Yuankai Qi
Qi Wu
Stephen Gould
LM&Ro
142
111
0
19 Oct 2020
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
165
1,456
0
16 Mar 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
234
815
0
24 Sep 2019
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Khanh Nguyen
Hal Daumé
LM&Ro
EgoV
151
142
0
04 Sep 2019
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro
LRM
219
444
0
07 Jun 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
124
1,403
0
06 Jun 2016
1