SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown
GiT: Towards Generalist Vision Transformer through Universal Language Interface
European Conference on Computer Vision (ECCV), 2024
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Jiaming Song
Bernt Schiele
Liwei Wang
VLM
279
22
0
14 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
222
17
0
12 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
304
46
0
06 Mar 2024
Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity
Hagyeong Lee
Minkyu Kim
Jun-Hyuk Kim
Seungeon Kim
Dokwan Oh
Jaeho Lee
DiffM
231
17
0
05 Mar 2024
DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
Chen Xu
Tian Lan
Changlong Yu
Wei Wang
Jun Gao
...
Qunxi Dong
Kun Qian
Piji Li
Wei Bi
Bin Hu
392
2
0
04 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
212
47
0
28 Feb 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Koki Maeda
Shuhei Kurita
Taiki Miyanishi
Naoaki Okazaki
222
6
0
28 Feb 2024
EDTC: enhance depth of text comprehension in automated audio captioning
Liwen Tan
Yin Cao
Yi Zhou
207
0
0
27 Feb 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM, VLM
376
74
0
26 Feb 2024
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
Yasheng Sun
Wenqing Chu
Hang Zhou
Kaisiyuan Wang
Hideki Koike
156
11
0
25 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
331
8
0
25 Feb 2024
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning
Antoine Chaffin
Ewa Kijak
Vincent Claveau
260
3
0
21 Feb 2024
MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning
Wanqing Cui
Keping Bi
Jiafeng Guo
Xueqi Cheng
SyDa, ReLM, RALM, LRM
337
14
0
21 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
282
5
0
19 Feb 2024
Cobra Effect in Reference-Free Image Captioning Metrics
Zheng Ma
Changxin Wang
Yawen Ouyang
Fei Zhao
Jianbing Zhang
Shujian Huang
Jiajun Chen
242
4
0
18 Feb 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
232
21
0
15 Feb 2024
A Systematic Review of Data-to-Text NLG
Chinonso Osuji
Thiago Castro Ferreira
Brian Davis
328
4
0
13 Feb 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Interspeech (Interspeech), 2024
Hang Zhao
Yifei Xin
Zhesong Yu
Bilei Zhu
Lu Lu
Zejun Ma
AuLLM
287
5
0
12 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
International Conference on Learning Representations (ICLR), 2024
Simon Ging
M. A. Bravo
Thomas Brox
VLM
401
19
0
11 Feb 2024
CIC: A Framework for Culturally-Aware Image Captioning
Youngsik Yun
Jihie Kim
VLM
413
10
0
08 Feb 2024
Multimodal Rationales for Explainable Visual Question Answering
Kun Li
G. Vosselman
Michael Ying Yang
504
2
0
06 Feb 2024
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Marius-Constantin Dinu
Claudiu Leoveanu-Condrei
Markus Holzleitner
Werner Zellinger
Sepp Hochreiter
319
18
0
01 Feb 2024
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
Eileen Wang
S. Han
Josiah Poon
278
5
0
01 Feb 2024
Common Sense Reasoning for Deepfake Detection
Yue Zhang
Ben Colman
Xiao Guo
Ali Shahriyari
Gaurav Bharaj
481
59
0
31 Jan 2024
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
CLIP, VLM
203
42
0
31 Jan 2024
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
Chenhui Zhang
Sherrie Wang
283
36
0
31 Jan 2024
A Survey on Data Augmentation in Large Model Era
Yue Zhou
Chenlu Guo
Xu Wang
Yi-Ju Chang
Yuan Wu
LM&MA, VLM
485
49
0
27 Jan 2024
Zero Shot Open-ended Video Inference
Ee Yeo Keat
Zhang Hao
Alexander Matyasko
Basura Fernando
VLM
146
0
0
23 Jan 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
International Conference on Learning Representations (ICLR), 2024
Yuhui Zhang
Elaine Sui
Serena Yeung-Levy
198
17
0
16 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
212
12
0
04 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
717
167
0
29 Dec 2023
Towards Consistent Language Models Using Declarative Constraints
Jasmin Mousavi
Arash Termehchy
HILM, ALM
203
2
0
24 Dec 2023
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP, VLM
254
20
0
14 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
251
25
0
13 Dec 2023
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Dongchen Han
Yang Liu
Yang Bai
Jindong Gu
Yang Liu
Simeng Qin
VLM
278
32
0
07 Dec 2023
Towards Knowledge-driven Autonomous Driving
Xin Li
Yeqi Bai
Pinlong Cai
Licheng Wen
Daocheng Fu
...
Yikang Li
Ding Wang
Yong-Jin Liu
Xiaoling Wang
Yu Qiao
413
36
0
07 Dec 2023
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM, VLM
395
14
0
06 Dec 2023
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
European Conference on Computer Vision (ECCV), 2023
Brian Gordon
Yonatan Bitton
Yonatan Shafir
Roopal Garg
Xi Chen
Dani Lischinski
Daniel Cohen-Or
Idan Szpektor
240
17
0
05 Dec 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Cong Yang
Zuchao Li
Lefei Zhang
163
61
0
02 Dec 2023
Segment and Caption Anything
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM, VLM
244
33
0
01 Dec 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
250
49
0
29 Nov 2023
StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Kazuki Yamauchi
Yusuke Ijima
Yuki Saito
178
12
0
28 Nov 2023
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
European Conference on Computer Vision (ECCV), 2023
Zhen Wang
Xinyun Jiang
Jun Xiao
Tao Chen
Long Chen
DiffM
237
4
0
25 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
252
4
0
21 Nov 2023
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Xiaotian Han
Quanzeng You
Yongfei Liu
Wentao Chen
Huangjie Zheng
...
Yiqi Wang
Bohan Zhai
Jianbo Yuan
Heng Wang
Hongxia Yang
ReLM, LRM, ELM
422
11
0
20 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Kepeng Xu
Jun Liu
MU
653
0
0
16 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
202
15
0
14 Nov 2023
Improving Image Captioning via Predicting Structured Concepts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
221
11
0
14 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
320
595
0
14 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
210
4
0
08 Nov 2023