CIDEr: Consensus-based Image Description Evaluation

Computer Vision and Pattern Recognition (CVPR), 2015

20 November 2014

Ramakrishna Vedantam

C. L. Zitnick

Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,353 papers shown

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

Neural Information Processing Systems (NeurIPS), 2024

141

08 Nov 2024

No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

193

06 Nov 2024

DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark

Haodong Li

Haicheng Qu

Xiaofeng Zhang

182

05 Nov 2024

From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing

...

291

05 Nov 2024

Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

232

04 Nov 2024

SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities

Ehsan Faghihi

Mohammedreza Zarenejad

Ali-Asghar Beheshti Shirazi

271

04 Nov 2024

TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models

Georgia Gabriela Sampaio

266

02 Nov 2024

Designing a Robust Radiology Report Generation System

Sonit Singh

MedIm

235

02 Nov 2024

MACE: Leveraging Audio for Evaluating Audio Captioning Systems

Satvik Dixit

Soham Deshmukh

Bhiksha Raj

249

01 Nov 2024

Generative Emotion Cause Explanation in Multimodal Conversations

International Conference on Multimedia Retrieval (ICMR), 2024

467

01 Nov 2024

Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP

Neural Information Processing Systems (NeurIPS), 2024

262

31 Oct 2024

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Chang Huang

308

29 Oct 2024

Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

278

29 Oct 2024

MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding

236

29 Oct 2024

What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration

Neural Information Processing Systems (NeurIPS), 2024

L. Qin

Qiguang Chen

Hao Fei

Zhi Chen

Min Li

Wanxiang Che

207

27 Oct 2024

Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024

267

26 Oct 2024

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

International Conference on Learning Representations (ICLR), 2024

445

23 Oct 2024

Image-aware Evaluation of Generated Medical Reports

Neural Information Processing Systems (NeurIPS), 2024

Gefen Dawidowicz

Elad Hirsch

A. Tal

233

22 Oct 2024

EVC-MF: End-to-end Video Captioning Network with Multi-scale Features

189

22 Oct 2024

MotionGlot: A Multi-Embodied Motion Generation Model

IEEE International Conference on Robotics and Automation (ICRA), 2024

Sudarshan Harithas

Srinath Sridhar

396

22 Oct 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

Zhe Chen

...

402

21 Oct 2024

EVA: An Embodied World Model for Future Video Anticipation

...

235

20 Oct 2024

FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning

298

20 Oct 2024

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

International Conference on Learning Representations (ICLR), 2024

Minhyuk Seo

Hyunseo Koh

Jonghyun Choi

381

19 Oct 2024

ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions

Shailaja Keyur Sampat

Yezhou Yang

Chitta Baral

LM&Ro

199

17 Oct 2024

EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation

Mithun Manivannan

Vignesh Nethrapalli

Mark Cartwright

161

15 Oct 2024

Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models

Yang Liu

273

15 Oct 2024

When Does Perceptual Alignment Benefit Vision Representations?

Neural Information Processing Systems (NeurIPS), 2024

278

14 Oct 2024

Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Neural Information Processing Systems (NeurIPS), 2024

Rory Young

Nicolas Pugeault

AAML

360

14 Oct 2024

ChangeMinds: Multi-task Framework for Detecting and Describing Changes in Remote Sensing

312

13 Oct 2024

ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

197

13 Oct 2024

BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation

Peijia Qin

Ruiyi Zhang

Pengtao Xie

221

13 Oct 2024

EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment

...

Yong Li

249

12 Oct 2024

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ziyang Ma

Kai Yu

275

12 Oct 2024

DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ziyang Ma

311

12 Oct 2024

GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning

Eileen Wang

Caren Han

Josiah Poon

212

12 Oct 2024

Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

227

11 Oct 2024

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

International Journal of Computer Vision (IJCV), 2024

287

09 Oct 2024

NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People

300

08 Oct 2024

The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Xiyan Fu

Anette Frank

LRM

448

08 Oct 2024

An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment

202

08 Oct 2024

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

International Conference on Learning Representations (ICLR), 2024

Jingyu Liu

Xiaoying Tang

282

08 Oct 2024

R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?

Chunyi Li

Junxuan Zhang

Zicheng Zhang

H. Wu

Yuan Tian

...

Guo Lu

Xiaohong Liu

Xiongkuo Min

Weisi Lin

Guangtao Zhai

AAML

181

07 Oct 2024

CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection

Asian Conference on Computer Vision (ACCV), 2024

Devank

Jayateja Kalla

Soma Biswas

178

06 Oct 2024

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

International Conference on Learning Representations (ICLR), 2024

Christopher D. Manning

3DV

649

04 Oct 2024

Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks

Hongmei Wang

Luyang Luo

Hao Chen

XAI

366

03 Oct 2024

Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts

International Conference on Learning Representations (ICLR), 2024

Minh Le

Chau Nguyen

Huy Nguyen

Quyen Tran

Trung Le

Nhat Ho

694

03 Oct 2024

MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences

International Conference on Learning Representations (ICLR), 2024

555

03 Oct 2024

Backdooring Vision-Language Models with Out-Of-Distribution Data

International Conference on Learning Representations (ICLR), 2024

Chao Chen

373

02 Oct 2024

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Computer Vision and Pattern Recognition (CVPR), 2024

Xiao Wang

Yuehang Li

Chuanfu Li

Jin Tang

339

01 Oct 2024