ResearchTrend.AI
SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM
arXiv: 1607.08822

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sara Ghaboura
Ketan More
Ritesh Thawkar
Wafa Alghallabi
Omkar Thawakar
Fahad Shahbaz Khan
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
847
8
0
21 Feb 2025
Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
1.1K
0
0
18 Feb 2025
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Weikang Qiu
Zheng Huang
Haoyu Hu
Aosong Feng
Yujun Yan
Rex Ying
397
10
0
18 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
709
29
0
12 Feb 2025
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
Manh Luong
Khai Nguyen
Dinh Q. Phung
Gholamreza Haffari
Zhuang Li
OT
275
0
0
08 Feb 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
AAAI Conference on Artificial Intelligence (AAAI), 2025
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
279
0
0
30 Jan 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
420
0
0
28 Jan 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Computers & electrical engineering (Comput. Electr. Eng.), 2025
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
367
11
0
28 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
European Conference on Computer Vision (ECCV), 2023
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
802
348
0
17 Jan 2025
Classifier-Guided Captioning Across Modalities
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
219
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffMVLM
277
2
0
03 Jan 2025
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Computer Vision and Pattern Recognition (CVPR), 2024
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
412
39
0
31 Dec 2024
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
Chao Fan
Qipei Mei
Xiaonan Wang
Xinming Li
165
4
0
31 Dec 2024
Multi-Agent Planning Using Visual Language Models
European Conference on Artificial Intelligence (ECAI), 2024
Michele Brienza
F. Argenziano
Vincenzo Suriani
D. Bloisi
Daniele Nardi
LM&RoLLMAG
260
6
0
31 Dec 2024
From Hallucinations to Facts: Enhancing Language Models with Curated Knowledge Graphs
Ratnesh Kumar Joshi
Sagnik Sengupta
Asif Ekbal
HILMKELM
228
2
0
24 Dec 2024
SCBench: A Sports Commentary Benchmark for Video LLMs
Kuangzhi Ge
Lawrence Yunliang Chen
Kevin Zhang
Yulin Luo
Tianyu Shi
Liaoyuan Fan
Xiang Li
Guanqun Wang
Shanghang Zhang
230
3
0
23 Dec 2024
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track
D. Gupta
Dina Demner-Fushman
LM&MA
238
3
0
15 Dec 2024
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
Neural Information Processing Systems (NeurIPS), 2024
Dong Hoon Lee
Seunghoon Hong
230
10
0
13 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
442
15
0
12 Dec 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong
Chengyao Wang
Yuqi Liu
Senqiao Yang
Longxiang Tang
...
Shaozuo Yu
Sitong Wu
Eric Lo
Shu Liu
Jiaya Jia
AuLLM
284
18
0
12 Dec 2024
CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs
Abhas Kumar
Kapil Pathak
Rajesh Kavuru
Prabhakar Srinivasan
256
0
0
03 Dec 2024
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
Hao Wu
Zhihang Zhong
Xiao Sun
DiffM
302
1
0
02 Dec 2024
Detailed Object Description with Controllable Dimensions
IEEE transactions on multimedia (IEEE TMM), 2024
Xinran Wang
Hao Zhang
Baoteng Li
Kongming Liang
Hao Sun
Zhongjiang He
Tianhao Shen
Jun Guo
346
1
0
28 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
395
7
0
25 Nov 2024
EVQAScore: A Fine-grained Metric for Video Question Answering Data Quality Evaluation
Hao Liang
Zirong Chen
Feiyu Xiong
Wentao Zhang
309
0
0
11 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
391
2
0
09 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
105
9
0
07 Nov 2024
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yang Liu
Sensen Gao
Qing Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang
Xiaochun Cao
AAML
232
8
0
04 Nov 2024
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models
Georgia Gabriela Sampaio
Ruixiang Zhang
Shuangfei Zhai
Jiatao Gu
J. Susskind
Navdeep Jaitly
Yizhe Zhang
DiffMCLIP
266
1
0
02 Nov 2024
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
249
4
0
01 Nov 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
278
5
0
29 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
Wenqiang Chen
Jiaxuan Cheng
Leyao Wang
Wei Zhao
Wojciech Matusik
267
14
0
26 Oct 2024
SceneGraMMi: Scene Graph-boosted Hybrid-fusion for Multi-Modal Misinformation Veracity Prediction
Swarang Joshi
Siddharth Mavani
Joel Alex
Arnav Negi
Rahul Mishra
Ponnurangam Kumaraguru
178
0
0
20 Oct 2024
EVA: An Embodied World Model for Future Video Anticipation
Yatian Wang
Hengyuan Zhang
Chun-Kai Fan
Xingqun Qi
Rongyu Zhang
...
Chi-Min Chan
Wei Xue
Wenhan Luo
Shanghang Zhang
VGen
229
17
0
20 Oct 2024
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Neural Information Processing Systems (NeurIPS), 2024
Yu Zhao
Hao Fei
Xiangtai Li
L. Qin
Jiayi Ji
Erik Cambria
Meishan Zhang
Jianguo Wei
DiffM
263
2
0
20 Oct 2024
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
Mithun Manivannan
Vignesh Nethrapalli
Mark Cartwright
160
2
0
15 Oct 2024
SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing
ACM Transactions on Graphics (TOG), 2024
Zhiyuan Zhang
Dongdong Chen
J. Liao
DiffM
315
7
0
15 Oct 2024
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
Fan Yang
Yihao Huang
Kaidi Wang
Ling Shi
G. Pu
Yang Liu
Jian Shu
AAMLVLM
272
2
0
15 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Neural Information Processing Systems (NeurIPS), 2024
Rory Young
Nicolas Pugeault
AAML
359
20
0
14 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
269
11
0
12 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
304
14
0
12 Oct 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
226
5
0
11 Oct 2024
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
Neural Information Processing Systems (NeurIPS), 2024
Hoin Jung
T. Jang
Xiaoqian Wang
VLM
199
16
0
10 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
287
11
0
09 Oct 2024
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
Jun Yu
Yifan Zhang
Badrinadh Aila
V. Namboodiri
298
3
0
08 Oct 2024
The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiyan Fu
Anette Frank
LRM
443
1
0
08 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
201
0
0
08 Oct 2024
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
Chunyi Li
Junxuan Zhang
Zicheng Zhang
H. Wu
Yuan Tian
...
Guo Lu
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
AAML
175
14
0
07 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
178
5
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
International Conference on Learning Representations (ICLR), 2024
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
649
92
0
04 Oct 2024