arXiv:1607.08822
SPICE: Semantic Propositional Image Caption Evaluation


29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,000 papers shown
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
Ying Yang
Jie Zhang
Xiao Lv
Di Lin
Tao Xiang
Qing Guo
AAML, VLM
107
1
0
30 May 2025
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
Shi-Xue Zhang
Hongfa Wang
Duojun Huang
Xin Li
Xiaobin Zhu
Xu-Cheng Yin
CoGe
205
4
0
29 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
141
1
0
28 May 2025
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving
Muxi Diao
Lele Yang
Hongbo Yin
Zhexu Wang
Yejie Wang
Daxin Tian
Kongming Liang
Zhanyu Ma
VLM, LRM
191
3
0
27 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
572
5
0
23 May 2025
Redemption Score: A Multi-Modal Evaluation Framework for Image Captioning via Distributional, Perceptual, and Linguistic Signal Triangulation
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
226
0
0
22 May 2025
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
Xinran Wang
Muxi Diao
Yuanzhi Liu
Chunyu Wang
Kongming Liang
Zhanyu Ma
Jun Guo
202
0
0
21 May 2025
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Steven Chacko
232
3
0
21 May 2025
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
Yong Ren
Chenxing Li
Le Xu
Hao Gu
Duzhen Zhang
Yujie Chen
Manjie Xu
Ruibo Fu
Shan Yang
Dong Yu
LRM
393
1
0
19 May 2025
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
Shucheng Huang
Freda Shi
Chen Sun
Jiaming Zhong
Minghao Ning
Yufeng Yang
Yukun Lu
Hong Wang
A. Khajepour
269
0
0
11 May 2025
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Shuai Wang
Ivona Najdenkoska
Hongyi Zhu
Stevan Rudinac
Monika Kackovic
Nachoem Wijnberg
M. Worring
524
3
0
09 May 2025
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum
Wen-Han Hsieh
Tsung-Han Wu
Ritwik Gupta
Trevor Darrell
David M. Chan
MLLM, VLM
204
3
0
05 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
Haoyang Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
146
2
0
04 May 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
404
3
0
22 Apr 2025
EarthGPT-X: A Spatial MLLM for Multi-level Multi-Source Remote Sensing Imagery Understanding with Visual Prompting
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2025
Wei Zhang
Miaoxin Cai
Yaqian Ning
Tianze Zhang
Yin Zhuang
He Chen
Jun Li
Xuerui Mao
275
0
0
17 Apr 2025
Generalized Visual Relation Detection with Diffusion Models
Kaifeng Gao
Siqi Chen
Hanwang Zhang
Jun Xiao
Yueting Zhuang
Qianru Sun
249
0
0
16 Apr 2025
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
303
4
0
16 Apr 2025
Summarizing Speech: A Comprehensive Survey
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander H. Waibel
408
0
0
10 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
353
0
0
10 Apr 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
175
0
0
09 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
International Journal of Computer Vision (IJCV), 2024
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
291
0
0
03 Apr 2025
PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks
Abdelrahman Elskhawy
Mengze Li
Nassir Navab
Benjamin Busam
VLM
219
2
0
01 Apr 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM, VGen
331
3
0
01 Apr 2025
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Maofu Liu
Jiahui Liu
Xiaokang Zhang
215
3
0
30 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
171
1
0
28 Mar 2025
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Computer Vision and Pattern Recognition (CVPR), 2025
Xiao Guo
Xiufeng Song
Yue Zhang
Xiaohong Liu
Xuyang Liu
288
21
0
26 Mar 2025
Beyond Intermediate States: Explaining Visual Redundancy through Language
Dingchen Yang
Bowen Cao
Anran Zhang
Weibo Gu
Winston Hu
Guang Chen
VLM
203
2
0
26 Mar 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM, CoGe
407
1
0
25 Mar 2025
AutoDrive-QA: A Multiple-Choice Benchmark for Vision-Language Evaluation in Urban Autonomous Driving
Boshra Khalili
Andrew W. Smyth
ELM
297
2
0
20 Mar 2025
Universal Scene Graph Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Shengqiong Wu
Hao Fei
Tat-Seng Chua
323
2
0
19 Mar 2025
EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?
Xinyan Chen
Jiaxin Ge
Hongming Dai
Qiang Zhou
Qiuxuan Feng
Jingtong Hu
Yun Wang
Jiaming Liu
Shanghang Zhang
LM&Ro
196
2
0
19 Mar 2025
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
Junyi Ao
Dekun Chen
Xiaohai Tian
Wenjie Feng
Jing Zhang
Lu Lu
Longji Xu
Haizhou Li
Zhizheng Wu
AuLLM
206
1
0
19 Mar 2025
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Hou In Ivan Tam
Hou In Derek Pun
Austin T. Wang
Angel X. Chang
Manolis Savva
329
4
0
18 Mar 2025
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
BigData Congress [Services Society] (BSS), 2024
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
MLLM, BDL, VLM
282
1
0
18 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
271
5
0
18 Mar 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM, VLM
253
15
0
16 Mar 2025
CLIP-Free, Label-Free, Zero-Shot Concept Bottleneck Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
169
0
0
14 Mar 2025
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation
Seyed Mohammad Hadi Hosseini
Amir Mohammad Izadi
Ali Abdollahi
Armin Saghafian
M. Baghshah
EGVM, CoGe
202
1
0
14 Mar 2025
Image Quality Assessment: From Human to Machine Preference
Computer Vision and Pattern Recognition (CVPR), 2025
Chunyi Li
Yuan Tian
Xiaoyue Ling
Zicheng Zhang
Haodong Duan
...
Xiaohong Liu
Xiongkuo Min
Guo Lu
Weisi Lin
Guoquan Zheng
136
6
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
326
8
0
13 Mar 2025
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Katrin Renz
Long Chen
Elahe Arani
Oleg Sinavski
MLLM
414
29
0
12 Mar 2025
ReviewAgents: Bridging the Gap Between Human and AI-Generated Paper Reviews
Xian Gao
Jiacheng Ruan
Zongyun Zhang
Jingsheng Gao
Ting Liu
Yuzhuo Fu
296
7
0
11 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
243
0
0
11 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM, ReLM, LRM
231
16
0
11 Mar 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
International Conference on Learning Representations (ICLR), 2025
Qinghao Ye
Xianhan Zeng
Fu Li
Chong Li
Haoqi Fan
CoGe
214
15
0
10 Mar 2025
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao
Wang Lu
Jie Ji
Ruimeng Ye
Gen Li
Xiaolong Ma
Bo Hui
OT
270
0
0
09 Mar 2025
Composed Multi-modal Retrieval: A Survey of Approaches and Applications
Kun Zhang
Jingyu Li
Zhiyu Li
Jingjing Zhang
F. Li
...
Nan Chen
Lei Zhang
Yongdong Zhang
Zhendong Mao
S. Kevin Zhou
341
1
0
03 Mar 2025
Group Relative Policy Optimization for Image Captioning
Xu Liang
157
6
0
03 Mar 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sara Ghaboura
Ketan More
Ritesh Thawkar
Wafa Alghallabi
Omkar Thawakar
Fahad Shahbaz Khan
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
743
7
0
21 Feb 2025
Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
978
0
0
18 Feb 2025