ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.01655
  4. Cited By
Words aren't enough, their order matters: On the Robustness of Grounding
  Visual Referring Expressions

Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions

4 May 2020
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Siva Reddy
    ObjD
ArXivPDFHTML

Papers citing "Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions"

12 / 12 papers shown
Title
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
44
3
0
23 Sep 2024
VisoGender: A dataset for benchmarking gender bias in image-text pronoun
  resolution
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
11
38
0
21 Jun 2023
Scalable Performance Analysis for Vision-Language Models
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
19
1
0
30 May 2023
ULN: Towards Underspecified Vision-and-Language Navigation
ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng
Tsu-jui Fu
Yujie Lu
William Yang Wang
31
4
0
18 Oct 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
Visual Spatial Reasoning
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
21
156
0
30 Apr 2022
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene
  Graphs with Language Structures via Dependency Relationships
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships
Chao Lou
Wenjuan Han
Yuh-Chen Lin
Zilong Zheng
CoGe
16
10
0
27 Mar 2022
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal
  Grounding
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun Reddy Akula
OOD
21
3
0
24 Jan 2022
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions
Arjun Reddy Akula
Song-Chun Zhu
29
3
0
17 Jan 2022
A Primer on Contrastive Pretraining in Language Processing: Methods,
  Lessons Learned and Perspectives
A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives
Nils Rethmeier
Isabelle Augenstein
SSL
VLM
79
90
0
25 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal
  Transformers
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
75
110
0
31 Jan 2021
An Empirical Study on Robustness to Spurious Correlations using
  Pre-trained Language Models
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Lifu Tu
Garima Lalwani
Spandana Gella
He He
LRM
19
183
0
14 Jul 2020
1