1908.03557
Cited By
VisualBERT: A Simple and Performant Baseline for Vision and Language
9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
Papers citing "VisualBERT: A Simple and Performant Baseline for Vision and Language"
50 / 1,260 papers shown
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
475
15
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
402
39
0
18 Apr 2024
Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
Wei Chen
Zhiyuan Li
LLMAG
128
9
0
17 Apr 2024
From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search
Jintao Sun
Zhedong Zheng
Gangyi Ding
434
19
0
16 Apr 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
254
20
0
15 Apr 2024
Conditional Prototype Rectification Prompt Learning
Haoxing Chen
Yaohui Li
Zizheng Huang
Yan Hong
Zhuoer Xu
Zhangxuan Gu
Jun Lan
Huijia Zhu
Weiqiang Wang
VLM
231
3
0
15 Apr 2024
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao
Renjie Pi
Jianhua Han
Xiaodan Liang
Hang Xu
Wei Zhang
Zhenguo Li
Dan Xu
VLM
ObjD
295
44
0
14 Apr 2024
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
Jiannan Ge
Lingxi Xie
Hongtao Xie
Nianzu Yang
Xiaopeng Zhang
Yongdong Zhang
Qi Tian
VLM
292
5
0
08 Apr 2024
Contextual Chart Generation for Cyber Deception
David D. Nguyen
David Liebowitz
Surya Nepal
S. Kanhere
Sharif Abuadbba
250
1
0
07 Apr 2024
Vision Transformers in Domain Adaptation and Generalization: A Study of Robustness
Shadi Alijani
Jamil Fayyad
Homayoun Najjaran
OOD
313
1
0
05 Apr 2024
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Haozhe Luo
Ziyu Zhou
Corentin Royer
Anjany Sekuboyina
Bjoern Menze
VLM
ViT
MedIm
258
11
0
04 Apr 2024
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Deyao Zhu
Jian Ding
Mohamed Elhoseiny
VLM
234
116
0
04 Apr 2024
Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification
Rui Wang
Chuanfu Shen
M. Marín-Jiménez
George Q. Huang
Shiqi Yu
CVBM
232
9
0
04 Apr 2024
BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes
International Workshop on Semantic Evaluation (SemEval), 2024
Amirhossein Abaskohi
AmirHossein Dabiri Aghdam
Lele Wang
Giuseppe Carenini
226
1
0
03 Apr 2024
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
Mamadou Keita
W. Hamidouche
Hessen Bougueffa Eutamene
Abdenour Hadid
Abdelmalik Taleb-Ahmed
306
22
0
02 Apr 2024
CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference
Ruqi Liao
Chuqing Zhao
Jin Li
Weiqi Feng
67
0
0
02 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
239
3
0
01 Apr 2024
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
236
8
0
01 Apr 2024
Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha
Ankit Jha
Shirsha Bose
Ashwin Nair
Moloud Abdar
Biplab Banerjee
VLM
206
23
0
31 Mar 2024
Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations
Jaisidh Singh
Ishaan Shrivastava
Mayank Vatsa
Richa Singh
Aparna Bharati
VLM
CoGe
221
31
0
29 Mar 2024
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues
Shuang Li
Jiahua Wang
Lijie Wen
LRM
151
0
0
29 Mar 2024
Semantic Map-based Generation of Navigation Instructions
Chengzu Li
Chao Zhang
Simone Teufel
R. Doddipatla
Svetlana Stoyanchev
211
5
0
28 Mar 2024
Scaling Vision-and-Language Navigation With Offline RL
Valay Bundele
Mahesh Bhupati
Biplab Banerjee
Aditya Grover
OffRL
183
1
0
27 Mar 2024
Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yuxuan Wang
Xiaoyuan Liu
VLM
275
0
0
24 Mar 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Yiyi Zhou
Weihao Ye
Xiaoshuai Sun
Rongrong Ji
MoE
184
2
0
22 Mar 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
350
28
0
22 Mar 2024
Grounding Spatial Relations in Text-Only Language Models
Gorka Azkune
Ander Salaberria
Eneko Agirre
192
2
0
20 Mar 2024
As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?
Anjun Hu
Jindong Gu
Francesco Pinto
Konstantinos Kamnitsas
Juil Sock
AAML
SILM
251
9
0
19 Mar 2024
Modality-Agnostic fMRI Decoding of Vision and Language
Mitja Nikolaus
Milad Mozafari
Nicholas Asher
Leila Reddy
Rufin VanRullen
189
4
0
18 Mar 2024
Prioritized Semantic Learning for Zero-shot Instance Navigation
European Conference on Computer Vision (ECCV), 2024
Xander Sun
Louis Lau
Hoyard Zhi
Ronghe Qiu
Junwei Liang
250
20
0
18 Mar 2024
Deciphering Hate: Identifying Hateful Memes and Their Targets
E. Hossain
Omar Sharif
M. M. Hoque
S. Preum
202
8
0
16 Mar 2024
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Computer Vision and Pattern Recognition (CVPR), 2024
Enguang Wang
Zhimao Peng
Zhengyuan Xie
Fei Yang
Xialei Liu
Ming-Ming Cheng
471
14
0
15 Mar 2024
PosSAM: Panoptic Open-vocabulary Segment Anything
VS Vibashan
Shubhankar Borse
Hyojin Park
Debasmit Das
Vishal M. Patel
Munawar Hayat
Fatih Porikli
VLM
MLLM
193
8
0
14 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
516
244
0
14 Mar 2024
Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI
Bo Shu
Zhouyao Zhu
Dong Shu
371
3
0
14 Mar 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Jinqiao Wang
ObjD
294
26
0
14 Mar 2024
Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Long Lan
Fengxiang Wang
Shuyan Li
Xiangtao Zheng
Zengmao Wang
Xinwang Liu
VLM
202
14
0
13 Mar 2024
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
International Conference on Language Resources and Evaluation (LREC), 2024
Oana Ignat
Longju Bai
Joan Nwatu
Amélie Reymond
201
8
0
12 Mar 2024
Noise-powered Multi-modal Knowledge Graph Representation Framework
International Conference on Computational Linguistics (COLING), 2024
Zhuo Chen
Yin Fang
Yichi Zhang
Lingbing Guo
Jiaoyan Chen
Hua-zeng Chen
Wen Zhang
194
0
0
11 Mar 2024
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi
Zehong Yan
Wynne Hsu
Yang Deng
MLLM
294
86
0
05 Mar 2024
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock
Ghulam Rasool
373
166
0
04 Mar 2024
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen
Zhuokai Zhao
Hongyin Luo
Huaxiu Yao
Bo Li
Jiawei Zhou
MLLM
264
132
0
01 Mar 2024
Acquiring Linguistic Knowledge from Multimodal Input
Theodor Amariucai
Alexander Scott Warstadt
CLL
284
4
0
27 Feb 2024
Vision Transformers with Natural Language Semantics
Young-Kyung Kim
Matías Di Martino
Guillermo Sapiro
ViT
153
7
0
27 Feb 2024
Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning
Maurits J. R. Bleeker
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
VLM
322
9
0
27 Feb 2024
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Zihang Jiang
Qingsong Yao
Rongsheng Wang
Zhiyang He
Xiaodong Tao
S. Kevin Zhou
MedIm
299
36
0
27 Feb 2024
ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking
Yushan Han
Kaer Huang
163
1
0
27 Feb 2024
How Can LLM Guide RL? A Value-Based Approach
Shenao Zhang
Sirui Zheng
Shuqi Ke
Zhihan Liu
Wanxin Jin
Jianbo Yuan
Yingxiang Yang
Hongxia Yang
Zhaoran Wang
246
12
0
25 Feb 2024
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
Jiazhao Zhang
Kunyu Wang
Rongtao Xu
Gengze Zhou
Yicong Hong
Xiaomeng Fang
Qi Wu
Dongbin Zhao
Wang He
LM&Ro
657
153
0
24 Feb 2024
CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora
Zijun Long
Xuri Ge
R. McCreadie
Joemon M. Jose
331
12
0
23 Feb 2024