ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1602.07332
  4. Cited By
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense
  Image Annotations

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

23 February 2016
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
ArXivPDFHTML

Papers citing "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations"

50 / 864 papers shown
Title
Open-Set Image Tagging with Multi-Grained Text Supervision
Open-Set Image Tagging with Multi-Grained Text Supervision
Xinyu Huang
Yi-Jie Huang
Youcai Zhang
Weiwei Tian
Rui Feng
Yuejie Zhang
Yanchun Xie
Yaqian Li
Lei Zhang
VLM
25
28
0
23 Oct 2023
Semantic and Expressive Variation in Image Captions Across Languages
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
50
3
0
22 Oct 2023
Semi-supervised multimodal coreference resolution in image narrations
Semi-supervised multimodal coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
35
3
0
20 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network
  for VL Representation
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
22
0
0
20 Oct 2023
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Junghyun Kim
Gi-Cheon Kang
Jaein Kim
Seoyun Yang
Minjoon Jung
Byoung-Tak Zhang
28
0
0
19 Oct 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
Chengyang Zhao
Yikang Shen
Zhenfang Chen
Mingyu Ding
Chuang Gan
48
15
0
10 Oct 2023
Improving Automatic VQA Evaluation Using Large Language Models
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Manas
Benno Krojer
Aishwarya Agrawal
16
21
0
04 Oct 2023
SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene
  Reconstruction
SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction
Sebastian Koch
Pedro Hermosilla
Narunas Vaskevicius
Mirco Colosi
Timo Ropinski
3DPC
SSL
35
13
0
27 Sep 2023
BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile
  Screenshot Captioning
BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile Screenshot Captioning
Ching-Yu Chiang
I-Hua Chang
Shih-Wei Liao
44
1
0
26 Sep 2023
SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on
  Scene Graphs
SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs
Guangyao Zhai
Xiaoni Cai
Dianye Huang
Yan Di
Fabian Manhardt
Federico Tombari
Nassir Navab
Benjamin Busam
LM&Ro
24
27
0
21 Sep 2023
Predicate Classification Using Optimal Transport Loss in Scene Graph
  Generation
Predicate Classification Using Optimal Transport Loss in Scene Graph Generation
Sorachi Kurita
Satoshi Oyama
Itsuki Noda
OT
24
0
0
19 Sep 2023
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Yunshui Li
Binyuan Hui
Zhaochao Yin
Wanwei He
Run Luo
Yuxing Long
Min Yang
Fei Huang
Yongbin Li
24
1
0
14 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
45
3
0
13 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and
  Language Models
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
21
2
0
06 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal
  Instruction-Following Models
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
35
22
0
31 Aug 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
33
0
0
30 Aug 2023
GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for
  Multi-Label Image Recognition
GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
Ruijie Yao
Sheng Jin
Lumin Xu
Wang Zeng
Wentao Liu
Chao Qian
Ping Luo
Ji Wu
23
2
0
28 Aug 2023
Dual Compensation Residual Networks for Class Imbalanced Learning
Dual Compensation Residual Networks for Class Imbalanced Learning
Rui Hou
Hong Chang
Bingpeng Ma
Shiguang Shan
Xilin Chen
20
5
0
25 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and
  Modality-Aware MoE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
50
9
0
23 Aug 2023
An Examination of the Compositionality of Large Generative
  Vision-Language Models
An Examination of the Compositionality of Large Generative Vision-Language Models
Teli Ma
Rong Li
Junwei Liang
CoGe
29
2
0
21 Aug 2023
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form
  Layout-to-Image Generation
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation
Chengyou Jia
Minnan Luo
Zhuohang Dang
Guangwen Dai
Xiaojun Chang
Mengmeng Wang
Jingdong Wang
DiffM
41
13
0
20 Aug 2023
Causal Intersectionality and Dual Form of Gradient Descent for
  Multimodal Analysis: a Case Study on Hateful Memes
Causal Intersectionality and Dual Form of Gradient Descent for Multimodal Analysis: a Case Study on Hateful Memes
Yosuke Miyanishi
M. Nguyen
28
2
0
19 Aug 2023
Diagnosing Human-object Interaction Detectors
Diagnosing Human-object Interaction Detectors
Fangrui Zhu
Yiming Xie
Weidi Xie
Huaizu Jiang
28
7
0
16 Aug 2023
3D Scene Graph Prediction on Point Clouds Using Knowledge Graphs
3D Scene Graph Prediction on Point Clouds Using Knowledge Graphs
Yiding Qiu
Henrik I. Christensen
3DPC
24
3
0
13 Aug 2023
LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts
LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts
Binbin Yang
Yinzheng Luo
Ziliang Chen
Guangrun Wang
Xiaodan Liang
Liang Lin
DiffM
19
12
0
13 Aug 2023
Compositional Feature Augmentation for Unbiased Scene Graph Generation
Compositional Feature Augmentation for Unbiased Scene Graph Generation
Lin Li
Guikun Chen
Jun Xiao
Yi Yang
Chunping Wang
Long Chen
28
25
0
13 Aug 2023
Environment-Invariant Curriculum Relation Learning for Fine-Grained
  Scene Graph Generation
Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation
Yu Min
Aming Wu
Cheng Deng
24
6
0
07 Aug 2023
Improving Scene Graph Generation with Superpixel-Based Interaction
  Learning
Improving Scene Graph Generation with Superpixel-Based Interaction Learning
Jingyi Wang
Can Zhang
Jinfa Huang
Bo Ren
Zhidong Deng
23
7
0
04 Aug 2023
Panoptic Scene Graph Generation with Semantics-Prototype Learning
Panoptic Scene Graph Generation with Semantics-Prototype Learning
Li Li
Wei Ji
Yiming Wu
Meng Li
Youxuan Qin
Lina Wei
Roger Zimmermann
28
35
0
28 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Pietro Mascagni
N. Padoy
Nicolas Padoy
27
20
0
27 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
32
118
0
25 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
85
224
0
07 Jul 2023
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
40
15
0
07 Jul 2023
Vision Language Transformers: A Survey
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
17
5
0
06 Jul 2023
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Hikaru Shindo
Viktor Pfanschilling
D. Dhami
Kristian Kersting
NAI
29
6
0
03 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
28
3
0
03 Jul 2023
Confidence-Based Model Selection: When to Take Shortcuts for
  Subpopulation Shifts
Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts
Annie S. Chen
Yoonho Lee
Amrith Rajagopal Setlur
Sergey Levine
Chelsea Finn
OOD
16
5
0
19 Jun 2023
Listener Model for the PhotoBook Referential Game with CLIPScores as
  Implicit Reference Chain
Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain
Shih-Lun Wu
Yi-Hui Chou
Liang Li
13
0
0
16 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large
  Language Models
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
33
7
0
14 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Zhongang Qi
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
23
6
0
12 Jun 2023
Joint Adaptive Representations for Image-Language Learning
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
26
0
0
31 May 2023
Multi-modal Queried Object Detection in the Wild
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
30
30
0
30 May 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research
  in Hausa Language
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
22
4
0
28 May 2023
Choose your Data Wisely: A Framework for Semantic Counterfactuals
Choose your Data Wisely: A Framework for Semantic Counterfactuals
Edmund Dervakos
Konstantinos Thomas
Giorgos Filandrianos
Giorgos Stamou
AAML
18
6
0
28 May 2023
Modularized Zero-shot VQA with Pre-trained Models
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
27
2
0
27 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
29
21
0
25 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image
  Regions
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
25
8
0
24 May 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and
  Blending
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
J. Liu
Jiashi Feng
VLM
CLIP
18
17
0
22 May 2023
Zero-shot Visual Relation Detection via Composite Visual Cues from Large
  Language Models
Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
Lin Li
Jun Xiao
Guikun Chen
Jian Shao
Yueting Zhuang
Long Chen
VLM
27
25
0
21 May 2023
Advancing Referring Expression Segmentation Beyond Single Image
Advancing Referring Expression Segmentation Beyond Single Image
YiXuan Wu
Zhao Zhang
Xie Chi
Feng Zhu
Rui Zhao
VLM
34
18
0
21 May 2023
Previous
12345...161718
Next