1812.02378
Auto-Encoding Scene Graphs for Image Captioning
6 December 2018
Xu Yang
Kaihua Tang
Hanwang Zhang
Jianfei Cai
Papers citing "Auto-Encoding Scene Graphs for Image Captioning"
50 / 309 papers shown
Title
ESCA: Contextualizing Embodied Agents via Scene-Graph Generation
Jiani Huang
Amish Sethi
Matthew Kuo
Mayank Keoliya
Neelay Velingker
JungHo Jung
Ser-Nam Lim
Ziyang Li
Mayur Naik
LM&Ro
VLM
187
0
0
11 Oct 2025
DescribeEarth: Describe Anything for Remote Sensing Images
Kaiyu Li
Zixuan Jiang
Xiangyong Cao
Jiayu Wang
Yuchen Xiao
Deyu Meng
Zhi Wang
91
1
0
30 Sep 2025
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
60
0
0
10 Aug 2025
Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images
Qi Xun Yeo
Yanyan Li
G. Lee
3DPC
3DV
82
0
0
05 Aug 2025
From Image Captioning to Visual Storytelling
Admitos Passadakis
Yingjin Song
Albert Gatt
DiffM
142
0
0
31 Jul 2025
Analyzing the Sensitivity of Vision Language Models in Visual Question Answering
Monika Shah
Sudarshan Balaji
Somdeb Sarkhel
Sanorita Dey
Deepak Venugopal
87
1
0
28 Jul 2025
FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
Hao-Yu Hou
Chun-Yi Lee
Motoharu Sonogashira
Yasutomo Kawanishi
175
0
0
26 Jul 2025
A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Shuzhou Sun
Li Liu
Tianpeng Liu
Shuaifeng Zhi
Ming-Ming Cheng
J. Heikkilä
Yongxiang Liu
CML
369
1
0
29 May 2025
Multimodal Machine Translation with Visual Scene Graph Pruning
Chenyu Lu
Shiliang Sun
Jing Zhao
N. Zhang
Tengfei Song
Hao Yang
386
1
0
26 May 2025
From Data to Modeling: Fully Open-vocabulary Scene Graph Generation
Zuyao Chen
Jinlin Wu
Zhen Lei
Chang Wen Chen
150
0
0
26 May 2025
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
Lakshita Agarwal
Bindu Verma
ViT
288
0
0
23 Apr 2025
PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks
Abdelrahman Elskhawy
Mengze Li
Nassir Navab
Benjamin Busam
VLM
231
2
0
01 Apr 2025
A Causal Adjustment Module for Debiasing Scene Graph Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Li Liu
Shuzhou Sun
Shuaifeng Zhi
Fan Shi
Zhen Liu
J. Heikkilä
Yongxiang Liu
CML
209
17
0
22 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
275
5
0
18 Mar 2025
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
BigData Congress [Services Society] (BSS), 2024
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
MLLM
BDL
VLM
282
1
0
18 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
247
0
0
11 Mar 2025
Controllable 3D Outdoor Scene Generation via Scene Graphs
Yuheng Liu
Xinke Li
Yuning Zhang
Lu Qi
Xin Li
Wenping Wang
Chongshou Li
Xueting Li
Ming-Hsuan Yang
3DV
819
4
0
10 Mar 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
100
0
0
07 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
225
1
0
03 Jan 2025
Situational Scene Graph for Structured Human-centric Situation Understanding
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
925
3
0
30 Oct 2024
A transition towards virtual representations of visual scenes
Américo Pereira
Pedro Carvalho
Luís Côrte-Real
159
0
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
207
8
0
09 Oct 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Yanbei Jiang
Krista A. Ehinger
Jey Han Lau
SLR
213
7
0
17 Sep 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
182
2
0
28 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
British Machine Vision Conference (BMVC), 2024
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
208
6
0
26 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
European Conference on Computer Vision (ECCV), 2024
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
169
5
0
13 Aug 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
European Conference on Computer Vision (ECCV), 2024
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
147
11
0
29 Jul 2024
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
European Conference on Computer Vision (ECCV), 2024
Yansheng Li
Tingzhu Wang
Kang Wu
Linlin Wang
Xin Guo
Wenbin Wang
184
3
0
27 Jul 2024
BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation
Peng Hao
Xiaobing Wang
Yingying Jiang
Hanchao Jia
Xiaoshuai Hao
Shaowei Cui
Junhang Wei
391
4
0
26 Jul 2024
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
Jaehyeong Jeon
Kibum Kim
Kanghoon Yoon
Chanyoung Park
253
6
0
22 Jul 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
231
63
0
27 Jun 2024
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery
Yansheng Li
Linlin Wang
Tingzhu Wang
Xue Yang
Junwei Luo
...
Haifeng Li
Bo Dang
Yongjun Zhang
Yi Yu
Junchi Yan
291
54
0
13 Jun 2024
ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Xian Sun
Qiwei Yan
Chubo Deng
Chenglong Liu
Yi Jiang
...
Wanxuan Lu
Fanglong Yao
Xiaoyu Liu
Lingxiang Hao
Hongfeng Yu
273
2
0
10 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
International Conference on Learning Representations (ICLR), 2024
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
442
55
0
07 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
198
8
0
01 Jun 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
178
17
0
21 May 2024
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Jinbae Im
Jeongyeon Nam
Nokyung Park
Hyungmin Lee
Seunghyun Park
ViT
517
47
0
02 Apr 2024
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li
Songyang Zhang
Dahua Lin
Kai-xiang Chen
Xuming He
VLM
291
38
0
01 Apr 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
204
0
0
26 Mar 2024
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Ce Zhang
Simon Stepputtis
Joseph Campbell
Katia Sycara
Yaqi Xie
222
22
0
18 Mar 2024
An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models
Haochen Luo
Jindong Gu
Fengyuan Liu
Juil Sock
VLM
VPVLM
AAML
218
32
0
14 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
196
16
0
12 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
219
41
0
06 Mar 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
Alexander Black
Jing Shi
Yifei Fai
Tu Bui
John Collomosse
201
8
0
29 Feb 2024
RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
Hanxiao Jiang
Binghao Huang
Ruihai Wu
Zhuoran Li
Shubham Garg
H. Nayyeri
Shenlong Wang
Yunzhu Li
231
45
0
23 Feb 2024
SGTR+: End-to-end Scene Graph Generation with Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Rongjie Li
Songyang Zhang
Xuming He
ViT
179
5
0
23 Jan 2024
Joint Generative Modeling of Grounded Scene Graphs and Images via Diffusion Models
Bicheng Xu
Qi Yan
Renjie Liao
Lele Wang
Leonid Sigal
DiffM
317
3
0
02 Jan 2024
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin
Antonino Furnari
Kyle Min
Subarna Tripathi
G. Farinella
EgoV
169
34
0
06 Dec 2023
Leveraging VLM-Based Pipelines to Annotate 3D Objects
International Conference on Machine Learning (ICML), 2023
Rishabh Kabra
Loic Matthey
Alexander Lerchner
Niloy J. Mitra
246
9
0
29 Nov 2023
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen
Jinlin Wu
Zhen Lei
Zhaoxiang Zhang
Changwen Chen
257
27
0
18 Nov 2023