1607.08822
SPICE: Semantic Propositional Image Caption Evaluation
29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"
50 / 1,002 papers shown
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models
Yuiga Wada
Kanta Kaneda
Komei Sugiura
236
6
0
07 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
199
10
0
07 Nov 2023
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Zhenjie Yang
Xiaosong Jia
Guoying Gu
Junchi Yan
ELM
604
171
0
02 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
371
88
0
31 Oct 2023
Video-Helpful Multimodal Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yihang Li
Shuichiro Shimizu
Chenhui Chu
Sadao Kurohashi
Wei Li
175
2
0
31 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
British Machine Vision Conference (BMVC), 2023
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
203
6
0
30 Oct 2023
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lixing Zhu
Runcong Zhao
Lin Gui
Yulan He
251
10
0
28 Oct 2023
An Early Evaluation of GPT-4V(ision)
Yang Wu
Shilong Wang
Hao Yang
Tian Zheng
Hongbo Zhang
Yanyan Zhao
Bing Qin
MLLM
ELM
193
48
0
25 Oct 2023
Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models
Xiang Chen
Xiaojun Wan
184
2
0
25 Oct 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
276
17
0
24 Oct 2023
CLAIR: Evaluating Image Captions with Large Language Models
David M. Chan
Suzanne Petryk
Joseph E. Gonzalez
Trevor Darrell
John F. Canny
198
36
0
19 Oct 2023
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
Junaid Ali
Matthäus Kleindessner
F. Wenzel
Kailash Budhathoki
Volkan Cevher
Chris Russell
VLM
248
15
0
18 Oct 2023
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
Zheng Ma
Changxin Wang
Bo Huang
Zi-Yue Zhu
Jianbing Zhang
187
3
0
15 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Yiyang Zhou
Chenhang Cui
Jaehong Yoon
Linjun Zhang
Zhun Deng
Chelsea Finn
Mohit Bansal
Huaxiu Yao
MLLM
369
268
0
01 Oct 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Yunbin Tu
Liang Li
Filippos Christianos
Zheng-Jun Zha
Zhibin Li
Qingming Huang
SSL
195
39
0
28 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
IEEE Games Entertainment Media Conference (IEEE GEM), 2023
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
233
0
0
27 Sep 2023
MindGPT: Interpreting What You See with Non-invasive Brain Recordings
IEEE Transactions on Image Processing (IEEE TIP), 2023
Jiaxuan Chen
Yu Qi
Yueming Wang
Gang Pan
267
12
0
27 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
235
12
0
21 Sep 2023
ContextRef: Evaluating Referenceless Metrics For Image Description Generation
International Conference on Learning Representations (ICLR), 2023
Elisa Kreiss
E. Zelikman
Christopher Potts
Nick Haber
246
5
0
21 Sep 2023
Toward Unified Controllable Text Generation via Regular Expression Instruction
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Xin Zheng
Hongyu Lin
Xianpei Han
Le Sun
223
7
0
19 Sep 2023
Predicate Classification Using Optimal Transport Loss in Scene Graph Generation
Sorachi Kurita
Satoshi Oyama
Itsuki Noda
OT
170
0
0
19 Sep 2023
Synth-AC: Enhancing Audio Captioning with Synthetic Supervision
Feiyang Xiao
Qiaoxi Zhu
Jian Guan
Xubo Liu
Haohe Liu
Kejia Zhang
Wenwu Wang
177
2
0
18 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
152
3
0
15 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
203
8
0
15 Sep 2023
Learning to Predict Concept Ordering for Common Sense Generation
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Tianhui Zhang
Danushka Bollegala
Bei Peng
LRM
127
3
0
12 Sep 2023
Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning
International Conference on Language Resources and Evaluation (LREC), 2023
Guisheng Liu
Yi Li
Zhengcong Fei
Haiyan Fu
Xiangyang Luo
Yanqing Guo
VLM
DiffM
262
16
0
10 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
204
11
0
05 Sep 2023
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
299
8
0
05 Sep 2023
DeViL: Decoding Vision features into Language
Meghal Dani
Isabel Rio-Torto
Stephan Alaniz
Zeynep Akata
VLM
196
11
0
04 Sep 2023
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Etienne Labbé
Thomas Pellegrini
J. Pinquier
288
21
0
01 Sep 2023
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Joshua Forster Feinglass
Yezhou Yang
177
2
0
01 Sep 2023
Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?
Etienne Labbé
Thomas Pellegrini
J. Pinquier
169
5
0
29 Aug 2023
Explaining Vision and Language through Graphs of Events in Space and Time
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
VLM
188
4
0
29 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
227
20
0
25 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
186
30
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
ACM Multimedia (ACM MM), 2023
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
229
24
0
23 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
221
10
0
23 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
IEEE International Conference on Computer Vision (ICCV), 2023
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
199
3
0
21 Aug 2023
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
Fawaz Sammani
Nikos Deligiannis
187
6
0
17 Aug 2023
Informative Scene Graph Generation via Debiasing
International Journal of Computer Vision (IJCV), 2023
Lianli Gao
Xinyu Lyu
Yuyu Guo
Yuxuan Hu
Yuanyou Li
Lu Xu
Hengtao Shen
Jingkuan Song
199
5
0
10 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
International Conference on Learning Representations (ICLR), 2023
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
270
118
0
03 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
169
65
0
31 Jul 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
ACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
190
2
0
27 Jul 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
IEEE International Conference on Computer Vision (ICCV), 2023
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
273
121
0
26 Jul 2023
Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation
Haitian Zeng
Xiaohan Wang
Wenguan Wang
Yi Yang
268
10
0
25 Jul 2023
Improving Multimodal Datasets with Image Captioning
Neural Information Processing Systems (NeurIPS), 2023
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
263
125
0
19 Jul 2023
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments
R. Liu
Kailai Li
Kunyu Peng
Junwei Zheng
Ke Cao
Yufan Chen
Kailun Yang
Rainer Stiefelhagen
142
22
0
15 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
486
2
0
10 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
267
89
0
30 Jun 2023
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Natural Language Processing and Chinese Computing (NLPCC), 2023
Haoqin Tu
Bowen Yang
Xianfeng Zhao
173
7
0
29 Jun 2023
Page 7 of 21