SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM
arXiv: 1607.08822

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng
Caroline Chan
F. Durand
Phillip Isola
EGVM
425
7
0
02 Jun 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
189
2
0
01 Jun 2025
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
Ying Yang
Jie Zhang
Xiao Lv
Di Lin
Tao Xiang
Qing Guo
AAML, VLM
167
0
0
30 May 2025
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
Shi-Xue Zhang
Hongfa Wang
Duojun Huang
Xin Li
Xiaobin Zhu
Xu-Cheng Yin
CoGe
285
5
0
29 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
178
1
0
28 May 2025
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving
Muxi Diao
Lele Yang
Hongbo Yin
Zhexu Wang
Yejie Wang
Daxin Tian
Kongming Liang
Zhanyu Ma
VLM, LRM
345
3
0
27 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
814
5
0
23 May 2025
Redemption Score: A Multi-Modal Evaluation Framework for Image Captioning via Distributional, Perceptual, and Linguistic Signal Triangulation
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
298
0
0
22 May 2025
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Steven Chacko
289
5
0
21 May 2025
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
Xinran Wang
Muxi Diao
Yuanzhi Liu
Chunyu Wang
Kongming Liang
Zhanyu Ma
Jun Guo
301
1
0
21 May 2025
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
Yong Ren
Chenxing Li
Le Xu
Hao Gu
Duzhen Zhang
Yujie Chen
Manjie Xu
Ruibo Fu
Shan Yang
Dong Yu
LRM
480
1
0
19 May 2025
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
Shucheng Huang
Freda Shi
Chen Sun
Jiaming Zhong
Minghao Ning
Yufeng Yang
Yukun Lu
Hong Wang
A. Khajepour
463
0
0
11 May 2025
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Shuai Wang
Ivona Najdenkoska
Hongyi Zhu
Stevan Rudinac
Monika Kackovic
Nachoem Wijnberg
M. Worring
618
3
0
09 May 2025
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum
Wen-Han Hsieh
Tsung-Han Wu
Ritwik Gupta
Trevor Darrell
David M. Chan
MLLM, VLM
287
4
0
05 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
Haoyang Li
Shangqing Tu
...
Zhiyuan Liu
Yisi Zhan
Lei Hou
Juanzi Li
Bin Xu
190
2
0
04 May 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
506
3
0
22 Apr 2025
EarthGPT-X: A Spatial MLLM for Multi-level Multi-Source Remote Sensing Imagery Understanding with Visual Prompting
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2025
Wei Zhang
Miaoxin Cai
Yaqian Ning
Tianze Zhang
Yin Zhuang
He Chen
Jun Li
Xuerui Mao
409
0
0
17 Apr 2025
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
376
4
0
16 Apr 2025
Generalized Visual Relation Detection with Diffusion Models
Kaifeng Gao
Siqi Chen
Hanwang Zhang
Jun Xiao
Yueting Zhuang
Qianru Sun
286
0
0
16 Apr 2025
Summarizing Speech: A Comprehensive Survey
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander H. Waibel
467
0
0
10 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
423
0
0
10 Apr 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
209
0
0
09 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
International Journal of Computer Vision (IJCV), 2024
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
377
2
0
03 Apr 2025
PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks
Abdelrahman Elskhawy
Mengze Li
Nassir Navab
Benjamin Busam
VLM
345
2
0
01 Apr 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM, VGen
417
3
0
01 Apr 2025
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Maofu Liu
Jiahui Liu
Xiaokang Zhang
287
5
0
30 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
214
1
0
28 Mar 2025
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Computer Vision and Pattern Recognition (CVPR), 2025
Xiao Guo
Xiufeng Song
Yue Zhang
Xiaohong Liu
Xuyang Liu
405
24
0
26 Mar 2025
Beyond Intermediate States: Explaining Visual Redundancy through Language
Dingchen Yang
Bowen Cao
Anran Zhang
Weibo Gu
Winston Hu
Guang Chen
VLM
251
2
0
26 Mar 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM, CoGe
503
1
0
25 Mar 2025
AutoDrive-QA: A Multiple-Choice Benchmark for Vision-Language Evaluation in Urban Autonomous Driving
Boshra Khalili
Andrew W. Smyth
ELM
372
2
0
20 Mar 2025
Universal Scene Graph Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Shengqiong Wu
Hao Fei
Tat-Seng Chua
403
3
0
19 Mar 2025
EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?
Xinyan Chen
Jiaxin Ge
Hongming Dai
Qiang Zhou
Qiuxuan Feng
Jingtong Hu
Yun Wang
Jiaming Liu
Shanghang Zhang
LM&Ro
276
2
0
19 Mar 2025
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
Junyi Ao
Dekun Chen
Xiaohai Tian
Wenjie Feng
Jing Zhang
Lu Lu
Longji Xu
Haizhou Li
Zhizheng Wu
AuLLM
253
1
0
19 Mar 2025
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Hou In Ivan Tam
Hou In Derek Pun
Austin T. Wang
Angel X. Chang
Manolis Savva
441
5
0
18 Mar 2025
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
BigData Congress [Services Society] (BSS), 2024
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
MLLM, BDL, VLM
327
1
0
18 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
368
9
0
18 Mar 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM, VLM
321
19
0
16 Mar 2025
CLIP-Free, Label-Free, Zero-Shot Concept Bottleneck Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
229
0
0
14 Mar 2025
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation
Seyed Mohammad Hadi Hosseini
Amir Mohammad Izadi
Ali Abdollahi
Armin Saghafian
M. Baghshah
EGVM, CoGe
236
1
0
14 Mar 2025
Image Quality Assessment: From Human to Machine Preference
Computer Vision and Pattern Recognition (CVPR), 2025
Chunyi Li
Yuan Tian
Xiaoyue Ling
Zicheng Zhang
Haodong Duan
...
Xiaohong Liu
Xiongkuo Min
Guo Lu
Weisi Lin
Guoquan Zheng
192
7
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
538
13
0
13 Mar 2025
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Katrin Renz
Long Chen
Elahe Arani
Oleg Sinavski
MLLM
494
46
0
12 Mar 2025
ReviewAgents: Bridging the Gap Between Human and AI-Generated Paper Reviews
Xian Gao
Jiacheng Ruan
Zongyun Zhang
Jingsheng Gao
Ting Liu
Yuzhuo Fu
376
8
0
11 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
289
0
0
11 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM, ReLM, LRM
296
17
0
11 Mar 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
International Conference on Learning Representations (ICLR), 2025
Qinghao Ye
Xianhan Zeng
Fu Li
Chong Li
Haoqi Fan
CoGe
271
15
0
10 Mar 2025
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao
Wang Lu
Jie Ji
Ruimeng Ye
Gen Li
Xiaolong Ma
Bo Hui
OT
321
0
0
09 Mar 2025
Composed Multi-modal Retrieval: A Survey of Approaches and Applications
Kun Zhang
Jingyu Li
Zhiyu Li
Jingjing Zhang
F. Li
...
Nan Chen
Lei Zhang
Yongdong Zhang
Zhendong Mao
S. Kevin Zhou
427
2
0
03 Mar 2025
Group Relative Policy Optimization for Image Captioning
Xu Liang
177
7
0
03 Mar 2025
Page 2 of 21