SPICE: Semantic Propositional Image Caption Evaluation

29 July 2016

Papers citing "SPICE: Semantic Propositional Image Caption Evaluation"

50 / 1,002 papers shown

A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations

Huansheng Ning

387

14 Jun 2024

Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models

Chun-Yi Kuan

Wei-Ping Huang

Hung-yi Lee

AuLLM

191

12 Jun 2024

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

Jipeng Zhang

208

11 Jun 2024

ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones

Anurag Ghosh

Shen Zheng

Juan R. Alvarez-Padilla

338

11 Jun 2024

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Zhanyu Ma

237

10 Jun 2024

FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Yebin Lee

Imseong Park

Myungjoo Kang

254

10 Jun 2024

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

International Conference on Learning Representations (ICLR), 2024

437

10 Jun 2024

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

Hao Fang

Bin Chen

Hao Wu

445

08 Jun 2024

MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description

IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024

Cong Yang

Zuchao Li

Lefei Zhang

242

07 Jun 2024

Multi-layer Learnable Attention Mask for Multimodal Tasks

Wayner Barrios

SouYoung Jin

194

04 Jun 2024

Image Captioning via Dynamic Path Customization

Jiayi Ji

Yongjian Wu

280

01 Jun 2024

Artemis: Towards Referential Understanding in Complex Videos

206

01 Jun 2024

Context-aware Difference Distilling for Multi-change Captioning

Yunbin Tu

202

31 May 2024

Faithful Chart Summarization with ChaTS-Pi

Syrine Krichene

Francesco Piccinno

Fangyu Liu

Julian Martin Eisenschlos

295

29 May 2024

Benchmarking and Improving Detail Image Caption

445

29 May 2024

MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model

Jie Li

Fan Yang

Xinbo Gao

250

29 May 2024

MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification

469

29 May 2024

Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks

349

27 May 2024

Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges

375

24 May 2024

Towards Retrieval-Augmented Architectures for Image Captioning

Lorenzo Baraldi

242

21 May 2024

MICap: A Unified Model for Identity-aware Movie Descriptions

Computer Vision and Pattern Recognition (CVPR), 2024

255

19 May 2024

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

...

388

16 May 2024

CinePile: A Long Video Question Answering Dataset and Benchmark

342

14 May 2024

The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective

197

13 May 2024

Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores

230

02 May 2024

Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models

224

26 Apr 2024

Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

160

25 Apr 2024

AutoAD III: The Prequel -- Back to the Pixels

312

22 Apr 2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

282

16 Apr 2024

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

...

Huchuan Lu

306

16 Apr 2024

AIGeN: An Adversarial Approach for Instruction Generation in VLN

Lorenzo Baraldi

211

15 Apr 2024

Bridging Vision and Language Spaces with Assignment Prediction

313

15 Apr 2024

UMBRAE: Unified Multimodal Brain Decoding

European Conference on Computer Vision (ECCV), 2024

231

10 Apr 2024

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

278

08 Apr 2024

Would Deep Generative Models Amplify Bias in Future Models?

Computer Vision and Pattern Recognition (CVPR), 2024

220

04 Apr 2024

ALOHa: A New Measure for Hallucination in Captioning Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

282

03 Apr 2024

CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

323

01 Apr 2024

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Dahua Lin

339

01 Apr 2024

Semantic Map-based Generation of Navigation Instructions

218

28 Mar 2024

Text Data-Centric Image Captioning with Interactive Prompts

Fan Wang

212

28 Mar 2024

ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds

235

27 Mar 2024

Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study

149

26 Mar 2024

Semi-Supervised Image Captioning Considering Wasserstein Graph Matching

Yang Yang

289

26 Mar 2024

Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People

International Conference on Human Factors in Computing Systems (CHI), 2024

Ricardo E Gonzalez Penuela

Jazmin Collins

Shiri Azenkot

Cynthia L. Bennett

234

22 Mar 2024

Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination

Dingchen Yang

Bowen Cao

Guang Chen

Changjun Jiang

245

21 Mar 2024

Improved Baselines for Data-efficient Perceptual Augmentation of LLMs

327

20 Mar 2024

Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

Ivor Tsang

343

19 Mar 2024

A Survey on Quality Metrics for Text-to-Image Generation

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024

Timo Ropinski

309

18 Mar 2024

TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling

International Conference on Language Resources and Evaluation (LREC), 2024

194

18 Mar 2024

Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction

Zhaoyu Chen

177

16 Mar 2024