1607.08822
Cited By
SPICE: Semantic Propositional Image Caption Evaluation
29 July 2016
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
Papers citing
"SPICE: Semantic Propositional Image Caption Evaluation"
50 / 1,002 papers shown
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA
AI4MH
ELM
387
15
0
14 Jun 2024
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Chun-Yi Kuan
Wei-Ping Huang
Hung-yi Lee
AuLLM
191
19
0
12 Jun 2024
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
Renjie Pi
Jianshu Zhang
Jipeng Zhang
Boyao Wang
Zhekai Chen
Tong Zhang
3DV
208
33
0
11 Jun 2024
ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh
Shen Zheng
Juan R. Alvarez-Padilla
Hailiang Zhu
Michael Cardei
Nicholas Dunn
Christoph Mertz
Srinivasa Narasimhan
338
7
0
11 Jun 2024
Zero-Shot Audio Captioning Using Soft and Hard Prompts
Yiming Zhang
Xuenan Xu
Ruoyi Du
Haohe Liu
Yuan Dong
Zheng-Hua Tan
Wenwu Wang
Zhanyu Ma
VLM
237
8
0
10 Jun 2024
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yebin Lee
Imseong Park
Myungjoo Kang
254
34
0
10 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
International Conference on Learning Representations (ICLR), 2024
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
437
8
0
10 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
AAML
VLM
445
29
0
08 Jun 2024
MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Cong Yang
Zuchao Li
Lefei Zhang
242
3
0
07 Jun 2024
Multi-layer Learnable Attention Mask for Multimodal Tasks
Wayner Barrios
SouYoung Jin
194
2
0
04 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
280
10
0
01 Jun 2024
Artemis: Towards Referential Understanding in Complex Videos
Jihao Qiu
Yuan Zhang
Xi Tang
Lingxi Xie
Tianren Ma
Pengyu Yan
David Doermann
Qixiang Ye
Yunjie Tian
VLM
VGen
206
21
0
01 Jun 2024
Context-aware Difference Distilling for Multi-change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Zheng-Jun Zha
Chenggang Yan
Qin Huang
202
13
0
31 May 2024
Faithful Chart Summarization with ChaTS-Pi
Syrine Krichene
Francesco Piccinno
Fangyu Liu
Julian Martin Eisenschlos
295
2
0
29 May 2024
Benchmarking and Improving Detail Image Caption
Hongyuan Dong
Jiawen Li
Bohong Wu
Jiacong Wang
Yuan Zhang
Haoyuan Guo
VLM
MLLM
445
60
0
29 May 2024
MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model
Ziqi Ren
Jie Li
Xuetong Xue
Xin Li
Fan Yang
Zhicheng Jiao
Xinbo Gao
250
4
0
29 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
469
9
0
29 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
349
0
0
27 May 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker
Jan Philip Wahle
Bela Gipp
Terry Ruas
375
18
0
24 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
242
19
0
21 May 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
Computer Vision and Pattern Recognition (CVPR), 2024
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
255
7
0
19 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
388
31
0
16 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
342
96
0
14 May 2024
The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective
Andrew Shin
Yusuke Mori
Kunitake Kaneko
VGen
EGVM
197
3
0
13 May 2024
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
Kiyoon Jeong
Woojun Lee
Woongchan Nam
Minjeong Ma
Pilsung Kang
230
4
0
02 May 2024
Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models
Yuhang Huang
Zihan Wu
Chongyang Gao
Jiawei Peng
Xu Yang
224
2
0
26 Apr 2024
Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning
Tianhui Zhang
Bei Peng
Danushka Bollegala
LRM
160
16
0
25 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
312
33
0
22 Apr 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang
Shuhuai Ren
Rundong Gao
Linli Yao
Qingyan Guo
Kaikai An
Jianhong Bai
Xu Sun
DiffM
VLM
282
16
0
16 Apr 2024
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
Kai Chen
Yanze Li
Wenhua Zhang
Yanxin Liu
Pengxiang Li
...
Xinhai Zhao
Zhenguo Li
Dit-Yan Yeung
Huchuan Lu
Xu Jia
ELM
MLLM
306
62
0
16 Apr 2024
AIGeN: An Adversarial Approach for Instruction Generation in VLN
Niyati Rawal
Roberto Bigazzi
Lorenzo Baraldi
Rita Cucchiara
GAN
211
5
0
15 Apr 2024
Bridging Vision and Language Spaces with Assignment Prediction
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
VLM
313
12
0
15 Apr 2024
UMBRAE: Unified Multimodal Brain Decoding
European Conference on Computer Vision (ECCV), 2024
Weihao Xia
Raoul de Charette
Cengiz Öztireli
Jing-Hao Xue
231
29
0
10 Apr 2024
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina
Goran Frehse
Elia Cunegatti
Gaowen Liu
Giovanni Iacca
Elisa Ricci
VLM
278
8
0
08 Apr 2024
Would Deep Generative Models Amplify Bias in Future Models?
Computer Vision and Pattern Recognition (CVPR), 2024
Tianwei Chen
Yusuke Hirota
Mayu Otani
Noa Garcia
Yuta Nakashima
220
24
0
04 Apr 2024
ALOHa: A New Measure for Hallucination in Captioning Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Suzanne Petryk
David M. Chan
Anish Kachinthaya
Haodi Zou
John F. Canny
Joseph E. Gonzalez
Trevor Darrell
HILM
282
29
0
03 Apr 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
323
12
0
01 Apr 2024
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li
Songyang Zhang
Dahua Lin
Kai-xiang Chen
Xuming He
VLM
339
45
0
01 Apr 2024
Semantic Map-based Generation of Navigation Instructions
Chengzu Li
Chao Zhang
Simone Teufel
R. Doddipatla
Svetlana Stoyanchev
218
5
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
212
3
0
28 Mar 2024
ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds
Gijs Wijngaard
Elia Formisano
Bruno L. Giordano
M. Dumontier
235
5
0
27 Mar 2024
Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study
Atsushi Teramoto
Ayano Michiba
Yuka Kiriyama
Tetsuya Tsukamoto
K. Imaizumi
H. Fujita
MedIm
149
1
0
26 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
289
0
0
26 Mar 2024
Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People
International Conference on Human Factors in Computing Systems (CHI), 2024
Ricardo E Gonzalez Penuela
Jazmin Collins
Shiri Azenkot
Cynthia L. Bennett
234
49
0
22 Mar 2024
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination
Dingchen Yang
Bowen Cao
Guang Chen
Changjun Jiang
245
13
0
21 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
327
16
0
20 Mar 2024
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao
Yang Liu
Xuhong Ren
Ivor Tsang
Qing Guo
AAML
343
32
0
19 Mar 2024
A Survey on Quality Metrics for Text-to-Image Generation
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
Sebastian Hartwig
Dominik Engel
Leon Sick
H. Kniesel
Tristan Payer
Poonam Poonam
Michael Glockler
Alex Bauerle
Timo Ropinski
EGVM
309
0
0
18 Mar 2024
TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling
International Conference on Language Resources and Evaluation (LREC), 2024
Weiran Chen
Xin Li
Jiaqi Su
Guiqian Zhu
Ying Li
Yi Ji
Chunping Liu
194
1
0
18 Mar 2024
Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction
Jiyuan Fu
Zhaoyu Chen
Kaixun Jiang
Haijing Guo
Jiafeng Wang
Shuyong Gao
Wenqiang Zhang
VLM
AAML
177
7
0
16 Mar 2024