ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1607.08822
  4. Cited By
SPICE: Semantic Propositional Image Caption Evaluation

SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
    EGVM
ArXivPDFHTML

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 247 papers shown
Title
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding
Shuai Wang
Ivona Najdenkoska
Hongyi Zhu
S. Rudinac
Monika Kackovic
N. Wijnberg
M. Worring
104
0
0
09 May 2025
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum
Wen-Han Hsieh
Tsung-Han Wu
Ritwik Gupta
Trevor Darrell
David M. Chan
MLLM
VLM
54
0
0
05 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
H. Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
24
0
0
04 May 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
49
0
0
22 Apr 2025
FocusedAD: Character-centric Movie Audio Description
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
53
0
0
16 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
42
0
0
10 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
38
0
0
03 Apr 2025
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic
Monika Shah
Somdeb Sarkhel
Deepak Venugopal
MLLM
BDL
VLM
83
0
0
18 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
66
0
0
13 Mar 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
159
0
0
18 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
98
4
0
12 Feb 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
56
0
0
30 Jan 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
61
0
0
28 Jan 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
45
3
0
28 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
L. Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
84
162
0
17 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
45
0
0
03 Jan 2025
Multi-Agent Planning Using Visual Language Models
Multi-Agent Planning Using Visual Language Models
Michele Brienza
F. Argenziano
Vincenzo Suriani
D. Bloisi
Daniele Nardi
LM&Ro
LLMAG
62
4
0
31 Dec 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
70
5
0
31 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
103
4
0
12 Dec 2024
VideoOrion: Tokenizing Object Dynamics in Videos
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
Sipeng Zheng
Zongqing Lu
104
1
0
25 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
25
0
0
09 Nov 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent
  Approach
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
57
3
0
14 Oct 2024
A Unified Debiasing Approach for Vision-Language Models across
  Modalities and Tasks
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
Hoin Jung
T. Jang
Xiaoqian Wang
VLM
23
2
0
10 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image
  Captioner using Audiovisual Distribution Alignment
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
32
0
0
08 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
77
25
0
04 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
37
7
0
03 Oct 2024
Decoding the Echoes of Vision from fMRI: Memory Disentangling for Past
  Semantic Information
Decoding the Echoes of Vision from fMRI: Memory Disentangling for Past Semantic Information
Runze Xia
Congchi Yin
Piji Li
26
0
0
30 Sep 2024
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Yanyuan Qiao
Wenqi Lyu
Hui Wang
Zixu Wang
Zerui Li
Yuan Zhang
Mingkui Tan
Qi Wu
LRM
36
2
0
27 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
92
1
0
04 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large
  Language Models
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
40
7
0
31 Jul 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive
  Regularization for Change Captioning
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
35
5
0
16 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
49
7
0
08 Jul 2024
A Survey on Large Language Models from General Purpose to Medical
  Applications: Datasets, Methodologies, and Evaluations
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA
AI4MH
ELM
38
4
0
14 Jun 2024
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive
  Through Work Zones
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh
R. Tamburo
Shen Zheng
Juan R. Alvarez-Padilla
Hailiang Zhu
Michael Cardei
Nicholas Dunn
Christoph Mertz
Srinivasa G. Narasimhan
39
1
0
11 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
47
4
0
10 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
38
13
0
08 Jun 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
57
5
0
29 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
38
39
0
14 May 2024
Improving Diversity of Commonsense Generation by Large Language Models
  via In-Context Learning
Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning
Tianhui Zhang
Bei Peng
Danushka Bollegala
LRM
20
7
0
25 Apr 2024
Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People
Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People
Ricardo E Gonzalez Penuela
Jazmin Collins
Shiri Azenkot
Cynthia L. Bennett
29
25
0
22 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
25
14
0
06 Mar 2024
DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation
Chen Xu
Tian Lan
Changlong Yu
Wei Wang
Jun Gao
...
Qunxi Dong
Kun Qian
Piji Li
Wei Bi
Bin Hu
39
0
0
04 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image
  Captioning
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
34
24
0
28 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
30
3
0
19 Feb 2024
ProtChatGPT: Towards Understanding Proteins with Large Language Models
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
26
12
0
15 Feb 2024
CIC: A Framework for Culturally-Aware Image Captioning
CIC: A Framework for Culturally-Aware Image Captioning
Youngsik Yun
Jihie Kim
VLM
22
5
0
08 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
34
1
0
06 Feb 2024
Mitigating Open-Vocabulary Caption Hallucinations
Mitigating Open-Vocabulary Caption Hallucinations
Assaf Ben-Kish
Moran Yanuka
Morris Alper
Raja Giryes
Hadar Averbuch-Elor
MLLM
VLM
16
6
0
06 Dec 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context
  Learning
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
16
28
0
29 Nov 2023
12345
Next