SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown
CaptionQA: Is Your Caption as Useful as the Image Itself?
Shijia Yang
Yunong Liu
Bohan Zhai
Ximeng Sun
Zicheng Liu
E. Barsoum
Manling Li
Chenfeng Xu
CoGe
166
0
0
26 Nov 2025
CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
Dapeng Zhang
Fei Shen
Rui Zhao
Yinda Chen
Peng Zhi
Chenyang Li
R. Zhou
Qingguo Zhou
VLM
162
0
0
25 Nov 2025
Enhancing Adversarial Transferability in Visual-Language Pre-training Models via Local Shuffle and Sample-based Attack
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Xin Liu
Aoyang Zhou
AAML
88
0
0
02 Nov 2025
Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges
Kemal Oksuz
Alexandru Buburuzan
Anthony Knittel
Yuhan Yao
P. Dokania
16
0
0
31 Oct 2025
Masked Diffusion Captioning for Visual Feature Learning
Chao Feng
Zihao Wei
Andrew Owens
DiffM
215
0
0
30 Oct 2025
More than a Moment: Towards Coherent Sequences of Audio Descriptions
Eshika Khandelwal
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Andrew Zisserman
Gül Varol
Makarand Tapaswi
DiffM
88
0
0
29 Oct 2025
Listening without Looking: Modality Bias in Audio-Visual Captioning
Yuchi Ishikawa
Toranosuke Manabe
Tatsuya Komatsu
Y. Aoki
64
0
0
28 Oct 2025
DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts
Binbin Li
Guimiao Yang
Zisen Qi
Haiping Wang
Yu Ding
VLM
307
0
0
28 Oct 2025
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan
W. Zhang
Xin Li
Shihao Wang
Kehan Li
Wentong Li
Jun Xiao
Lei Zhang
Beng Chin Ooi
ObjD
322
0
0
27 Oct 2025
PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions
Amith Ananthram
Elias Stengel-Eskin
Lorena A. Bradford
Julia Demarest
Adam Purvis
Keith Krut
Robert Stein
Rina Elster Pantalony
Mohit Bansal
Kathleen McKeown
88
0
0
21 Oct 2025
Hierarchical Reasoning with Vision-Language Models for Incident Reports from Dashcam Videos
Shingo Yokoi
Kento Sasaki
Yu Yamaguchi
VLM, LRM
104
0
0
14 Oct 2025
Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap
KiHyun Nam
J. Choi
Hyeongkeun Lee
Jungwoo Heo
Joon Son Chung
56
0
0
13 Oct 2025
CapGeo: A Caption-Assisted Approach to Geometric Reasoning
Y. Li
Siyi Qian
Hao Liang
Leqi Zheng
Ruichuan An
Yongzhen Guo
Wentao Zhang
ReLM, LRM
100
0
0
10 Oct 2025
Addressing the ID-Matching Challenge in Long Video Captioning
Zhantao Yang
Huangji Wang
Ruili Feng
Han Zhang
Yuting Hu
Shangwen Zhu
Junyan Li
Yu Liu
Fan Cheng
104
0
0
08 Oct 2025
AURA Score: A Metric For Holistic Audio Question Answering Evaluation
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
104
0
0
06 Oct 2025
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
Lorenzo Bianchi
Giacomo Pacini
F. Carrara
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
VLM
142
0
0
03 Oct 2025
DescribeEarth: Describe Anything for Remote Sensing Images
Kaiyu Li
Zixuan Jiang
Xiangyong Cao
Jiayu Wang
Yuchen Xiao
Deyu Meng
Zhi Wang
111
1
0
30 Sep 2025
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
Kazuki Matsuda
Yuiga Wada
Shinnosuke Hirano
Seitaro Otsuki
Komei Sugiura
VLM
120
1
0
30 Sep 2025
When Audio Generators Become Good Listeners: Generative Features for Understanding Tasks
Zeyu Xie
Chenxing Li
Xuenan Xu
Mengyue Wu
Wenfu Wang
Ruibo Fu
Meng Yu
Dong Yu
Yuexian Zou
120
0
0
29 Sep 2025
Advancing Reference-free Evaluation of Video Captions with Factual Analysis
Shubhashis Roy Dipta
Tz-Ying Wu
Subarna Tripathi
120
0
0
20 Sep 2025
Mamba-2 audio captioning: design space exploration and analysis
Taehan Lee
Jaehan Jung
Hyukjun Lee
Mamba, AuLLM
136
0
0
19 Sep 2025
RACap: Relation-Aware Prompting for Lightweight Retrieval-Augmented Image Captioning
Xiaosheng Long
Hanyu Wang
Zhentao Song
Kun Luo
Hongde Liu
96
0
0
19 Sep 2025
VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models
Huanchen Wang
Wencheng Zhang
Zhiqiang Wang
Zhicong Lu
Yuxin Ma
119
0
0
18 Sep 2025
Spatial-CLAP: Learning Spatially-Aware audio--text Embeddings for Multi-Source Conditions
Kentaro Seki
Yuki Okamoto
Kouei Yamaoka
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
93
0
0
18 Sep 2025
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan
Fabian Caba Heilbron
Bernard Ghanem
Josef Sivic
Bryan C. Russell
153
0
0
16 Sep 2025
Evaluating Robustness of Vision-Language Models Under Noisy Conditions
Purushoth
Alireza
AAML
84
0
0
15 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
280
3
0
12 Sep 2025
Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles
Eric Slyman
Mehrab Tanjim
Kushal Kafle
Stefan Lee
129
0
0
10 Sep 2025
Aesthetic Image Captioning with Saliency Enhanced MLLMs
Yilin Tao
Jiashui Huang
Huaze Xu
Ling Shao
237
0
0
04 Sep 2025
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
Xiaofu Chen
Israfel Salazar
Yova Kementchedjhieva
180
1
0
04 Sep 2025
Region-Level Context-Aware Multimodal Understanding
Hongliang Wei
Xianqi Zhang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
VLM
149
0
0
17 Aug 2025
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
Wenbin An
Jiahao Nie
Yaqiang Wu
Feng Tian
Shijian Lu
Q. Zheng
MLLM
170
1
0
14 Aug 2025
Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?
Xuezheng Chen
Zhengbo Zou
MLLM
80
0
0
14 Aug 2025
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
84
0
0
10 Aug 2025
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
L. D. M. S. Sai Teja
Ashok Urlana
Pruthwik Mishra
120
0
0
09 Aug 2025
VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence
Chenhui Qiang
Zhaoyang Wei
Xumeng Han
Zipeng Wang
Siyao Li
Xiangyuan Lan
Jianbin Jiao
Zhenjun Han
LRM
76
2
0
06 Aug 2025
MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
Quang-Trung Truong
Yuk-Kwan Wong
Vo Hoang Kim Tuyen Dang
Rinaldi Gotama
D. Nguyen
Sai-Kit Yeung
VOS
282
0
0
06 Aug 2025
Multimodal RAG Enhanced Visual Description
Amit Kumar Jaiswal
Haiming Liu
Ingo Frommholz
VLM
99
0
0
06 Aug 2025
From Image Captioning to Visual Storytelling
Admitos Passadakis
Yingjin Song
Albert Gatt
DiffM
194
0
0
31 Jul 2025
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Si-Woo Kim
MinJu Jeon
Ye-Chan Kim
Soeun Lee
Taewhan Kim
Dong-Jin Kim
161
3
0
24 Jul 2025
VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding
Younggun Kim
Ahmed S. Abdelrahman
Mohamed Abdel-Aty
150
3
0
13 Jul 2025
Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models
Yifan Jiang
Yibo Xue
Yukun Kang
Pin Zheng
Jian Peng
Feiran Wu
Changliang Xu
DiffM, VGen
212
0
0
05 Jul 2025
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Yeongtak Oh
J. Mok
Juhyeon Shin
Sangha Park
Sungroh Yoon
VLM
338
1
0
23 Jun 2025
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
219
2
0
18 Jun 2025
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement
Shaoqing Lin
Chong Teng
Fei Li
Donghong Ji
Lizhen Qu
Z. Li
200
0
0
18 Jun 2025
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Qirui Zheng
Xingbo Wang
Keyuan Cheng
Muhammad Asif Ali
Yunlong Lu
Wenxin Li
166
0
0
17 Jun 2025
ABS: Enforcing Constraint Satisfaction On Generated Sequences Via Automata-Guided Beam Search
Vincenzo Collura
Karim Tit
Laura Bussi
Eleonora Giunchiglia
Maxime Cordy
217
1
0
11 Jun 2025
FREE: Fast and Robust Vision Language Models with Early Exits
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Divya J. Bajpai
M. Hanawal
VLM
129
2
0
07 Jun 2025
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Computer Science Review (CSR), 2025
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
151
2
0
03 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
AAAI Conference on Artificial Intelligence (AAAI), 2025
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
215
2
0
03 Jun 2025