ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.14281
  4. Cited By
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining

Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining

23 May 2023
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
    SSL
ArXivPDFHTML

Papers citing "Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining"

15 / 15 papers shown
Title
Beyond the Doors of Perception: Vision Transformers Represent Relations
  Between Objects
Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
Michael A. Lepori
Alexa R. Tartaglini
Wai Keen Vong
Thomas Serre
Brenden Lake
Ellie Pavlick
34
2
0
22 Jun 2024
PartIR: Composing SPMD Partitioning Strategies for Machine Learning
PartIR: Composing SPMD Partitioning Strategies for Machine Learning
Sami Alabed
Daniel Belov
Bart Chrzaszcz
Juliana Franco
Dominik Grewe
...
Michael Schaarschmidt
Timur Sitdikov
Agnieszka Swietlik
Dimitrios Vytiniotis
Joel Wee
21
3
0
20 Jan 2024
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language
  Models
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
Navid Rajabi
Jana Kosecka
VLM
14
2
0
18 Aug 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
14
688
0
26 Jun 2023
An Examination of the Robustness of Reference-Free Image Captioning
  Evaluation Metrics
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
Aishwarya Agrawal
17
6
0
24 May 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware
  Attention
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP
VLM
41
44
0
06 Mar 2023
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun
  Dependencies?
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Mitja Nikolaus
Emmanuelle Salin
Stéphane Ayache
Abdellah Fourtassi
Benoit Favre
11
13
0
21 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Visually Grounded Reasoning across Languages and Cultures
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
E. Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
92
167
0
28 Sep 2021
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question
  Answering
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
Rajat Koner
Hang Li
Marcel Hildebrandt
Deepan Das
Volker Tresp
Stephan Günnemann
36
31
0
13 Jul 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal
  Transformers
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
75
110
0
31 Jan 2021
Explainable and Explicit Visual Reasoning over Scene Graphs
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
146
230
0
05 Dec 2018
Image Generation from Scene Graphs
Image Generation from Scene Graphs
Justin Johnson
Agrim Gupta
Li Fei-Fei
GNN
211
809
0
04 Apr 2018
1