2305.14281
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
23 May 2023
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
Papers citing
"Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining"
15 / 15 papers shown
Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
Michael A. Lepori
Alexa R. Tartaglini
Wai Keen Vong
Thomas Serre
Brenden Lake
Ellie Pavlick
34
2
0
22 Jun 2024
PartIR: Composing SPMD Partitioning Strategies for Machine Learning
Sami Alabed
Daniel Belov
Bart Chrzaszcz
Juliana Franco
Dominik Grewe
...
Michael Schaarschmidt
Timur Sitdikov
Agnieszka Swietlik
Dimitrios Vytiniotis
Joel Wee
21
3
0
20 Jan 2024
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
Navid Rajabi
Jana Kosecka
VLM
14
2
0
18 Aug 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
14
688
0
26 Jun 2023
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
17
6
0
24 May 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP
VLM
41
44
0
06 Mar 2023
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Mitja Nikolaus
Emmanuelle Salin
Stéphane Ayache
Abdellah Fourtassi
Benoit Favre
11
13
0
21 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
E. Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
92
167
0
28 Sep 2021
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
Rajat Koner
Hang Li
Marcel Hildebrandt
Deepan Das
Volker Tresp
Stephan Günnemann
36
31
0
13 Jul 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
75
110
0
31 Jan 2021
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
146
230
0
05 Dec 2018
Image Generation from Scene Graphs
Justin Johnson
Agrim Gupta
Li Fei-Fei
GNN
211
809
0
04 Apr 2018