CIDEr: Consensus-based Image Description Evaluation

Computer Vision and Pattern Recognition (CVPR), 2015

20 November 2014

Ramakrishna Vedantam

C. L. Zitnick

Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

Showing 50 of 2,351 citing papers.

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives. International Joint Conference on Artificial Intelligence (IJCAI), 2024. Sara Sarto, Marcella Cornia, Rita Cucchiara. 18 Mar 2025.
Tracking Meets Large Multimodal Models for Driving Scenario Understanding. Ayesha Ishaq, Jean Lahoud, Fahad Shahbaz Khan, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer. 18 Mar 2025.
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference. Computer Vision and Pattern Recognition (CVPR), 2025. Hao Yin, Guangzong Si, Zilei Wang. 17 Mar 2025.
Exploring 3D Reasoning-Driven Planning: From Implicit Human Intentions to Route-Aware Activity Planning. Xueying Jiang, Wenhao Li, Xiaoqin Zhang, Ling Shao, Shijian Lu. Tags: LRM. 17 Mar 2025.
The Amazon Nova Family of Models: Technical Report and Model Card. Amazon AGI, Aaron Langford, A. Shah, Abhanshu Gupta, Abhimanyu Bhatter, ..., Benjamin Biggs, Benjamin Ott, Bhanu Vinzamuri, Bharath Venkatesh, Bhavana Ganesh. 17 Mar 2025.
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, Hyunjung Shim. 17 Mar 2025.
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. Kanzhi Cheng, Wenpo Song, Jiaxin Fan, Zheng Ma, Qiushi Sun, Fangzhi Xu, Chenyang Yan, Nuo Chen, Jianbing Zhang, Jiajun Chen. Tags: MLLM, VLM. 16 Mar 2025.
Brain2Text Decoding Model Reveals the Neural Mechanisms of Visual Semantic Processing. Feihan Feng, Jingxin Nie. 15 Mar 2025.
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation. Seyed Mohammad Hadi Hosseini, Amir Mohammad Izadi, Ali Abdollahi, Armin Saghafian, M. Baghshah. Tags: EGVM, CoGe. 14 Mar 2025.
MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling. International Conference on Learning Representations (ICLR), 2025. R. Teo, T. Nguyen. Tags: MoE. 14 Mar 2025.
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning. Wenshu Fan, Saihui Hou, Saijie Hou, Jiabao Du, Shibei Meng, Yongzhen Huang. Tags: VLM. 14 Mar 2025.
Large-scale Pre-training for Grounded Video Caption Generation. Evangelos Kazakos, Cordelia Schmid, Josef Sivic. 13 Mar 2025.
Image Quality Assessment: From Human to Machine Preference. Computer Vision and Pattern Recognition (CVPR), 2025. Chunyi Li, Yuan Tian, Xiaoyue Ling, Zicheng Zhang, Haodong Duan, ..., Xiaohong Liu, Xiongkuo Min, Guo Lu, Weisi Lin, Guoquan Zheng. 13 Mar 2025.
FlowTok: Flowing Seamlessly Across Text and Image Tokens. Ju He, Qihang Yu, Qihao Liu, Liang-Chieh Chen. 13 Mar 2025.
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment. Computer Vision and Pattern Recognition (CVPR), 2025. Katrin Renz, Long Chen, Elahe Arani, Oleg Sinavski. Tags: MLLM. 12 Mar 2025.
Scaling Laws for Conditional Emergence of Multilingual Image Captioning via Generalization from Translation. Julian Spravil, Sebastian Houben, Sven Behnke. Tags: VLM. 12 Mar 2025.
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding. Computer Vision and Pattern Recognition (CVPR), 2025. Shehreen Azad, Vibhav Vineet, Yogesh S Rawat. Tags: VLM. 11 Mar 2025.
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models. Bozhi Luan, Wengang Zhou, Hao Feng, Zhe Wang, Xiaosong Li, Haoyang Li. Tags: VLM. 11 Mar 2025.
SuperCap: Multi-resolution Superpixel-based Image Captioning. Henry Senior, Luca Rossi, Gregory Slabaugh, Shanxin Yuan. Tags: VLM. 11 Mar 2025.
Mellow: a small audio language model for reasoning. Soham Deshmukh, Satvik Dixit, Rita Singh, Bhiksha Raj. Tags: AuLLM, ReLM, LRM. 11 Mar 2025.
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning. International Conference on Learning Representations (ICLR), 2025. Qinghao Ye, Xianhan Zeng, Fu Li, Chong Li, Haoqi Fan. Tags: CoGe. 10 Mar 2025.
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning. Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, Xinggang Wang. Tags: OffRL, LRM, VLM. 10 Mar 2025.
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing. Yang Xiao, Wang Lu, Jie Ji, Ruimeng Ye, Gen Li, Xiaolong Ma, Bo Hui. Tags: OT. 09 Mar 2025.
Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform. Chenyu Huang, Peng Ye, Xinyu Wang, Shenghe Zheng, Biqing Qi, Wenlong Zhang, Wanli Ouyang, Tao Chen. 09 Mar 2025.
SplatTalk: 3D VQA with Gaussian Splatting. Anh Thai, Songyou Peng, Kyle Genova, Leonidas Guibas, Thomas Funkhouser. Tags: 3DGS. 08 Mar 2025.
Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs. Dingkun Zhang, Shuhan Qi, Xinyu Xiao, Kehai Chen, Xuan Wang. Tags: CLL, MoMe. 08 Mar 2025.
Is Your Video Language Model a Reliable Judge? International Conference on Learning Representations (ICLR), 2025. M. Liu, Wensheng Zhang. 07 Mar 2025.
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning. Qing Zhou, Tao Yang, Junyu Gao, W. Ni, Junzheng Wu, Qi Wang. 06 Mar 2025.
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations. Yanshu Li. 05 Mar 2025.
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles. Computer Vision and Pattern Recognition (CVPR), 2025. Rui Zhao, Weijia Mao, Mike Zheng Shou. 05 Mar 2025.
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations. Khoi Anh Nguyen, Linh Yen Vu, Thang Dinh Duong, Thuan Nguyen Duong, Huy Thanh Nguyen, V. Q. Dinh. 05 Mar 2025.
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models. Computer Vision and Pattern Recognition (CVPR), 2025. Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari, Yong Zhang. Tags: VLM. 04 Mar 2025.
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization. Computer Vision and Pattern Recognition (CVPR), 2025. Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao. 03 Mar 2025.
Group Relative Policy Optimization for Image Captioning. Xu Liang. 03 Mar 2025.
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living. Ramanathan Rajendiran, Debaditya Roy, Basura Fernando. Tags: VGen. 03 Mar 2025.
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning. Maria Lymperaiou, Giorgos Filandrianos, Angeliki Dimitriou, Athanasios Voulodimos, Giorgos Stamou. Tags: MLLM. 01 Mar 2025.
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models. Computer Vision and Pattern Recognition (CVPR), 2025. Zhaoyi Liu, Huan Zhang. Tags: AAML. 25 Feb 2025.
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts. Zhenghao Liu, Xingsheng Zhu, Tianshuo Zhou, Xinyi Zhang, Xiaoyuan Yi, Shi Yu, Yu Gu, Ge Yu. Tags: RALM, VLM. 24 Feb 2025.
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark. Davide Testa, Giovanni Bonetta, Raffaella Bernardi, Alessandro Bondielli, Alessandro Lenci, Alessio Miaschi, Lucia Passaro, Bernardo Magnini. Tags: VGen, LRM. 24 Feb 2025.
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning. Swadhin Das, Saarthak Gupta, Kamal Kumar, Raksha Sharma. 22 Feb 2025.
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025. Caihua Liu, Xu Li, Wenjing Xue, Wei Tang, Xia Feng. 20 Feb 2025.
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness. Zhihang Liu, Chen-Wei Xie, Bin Wen, Feiwu Yu, Jixuan Chen, ..., Nianzu Yang, Yinglu Li, Zuan Gao, Yun Zheng, Hongtao Xie. Tags: VLM, CoGe. 19 Feb 2025.
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning. IEEE Robotics and Automation Letters (IEEE RA-L), 2025. Rui Zhao, Qirui Yuan, Jinyu Li, Haofeng Hu, Yun Li, Chengyuan Zheng, Fei Gao. Tags: LRM. 19 Feb 2025.
Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions. Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle. Tags: EGVM. 18 Feb 2025.
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm. Tiancheng Gu, Kaicheng Yang, Chaoyi Zhang, Yin Xie, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng. Tags: CLIP, VLM. 18 Feb 2025.
Image Embedding Sampling Method for Diverse Captioning. Sania Waheed, Na Min An. 14 Feb 2025.
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. Mohammad Mahdi Abootorabi, Amirhosein Zobeiri, Mahdi Dehghani, Mohammadali Mohammadkhani, Bardia Mohammadi, Omid Ghahroodi, M. Baghshah, Ehsaneddin Asgari. Tags: RALM. 12 Feb 2025.
Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models. IEEE International Conference on Robotics and Automation (ICRA), 2025. Tianshuo Xu, Hao Lu, Xu Yan, Yingjie Cai, Bingbing Liu, Yingcong Chen. 10 Feb 2025.
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving. Yuxin Lin, Mengshi Qi, Liang Liu, Huadong Ma. Tags: CLL. 02 Feb 2025.
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement. IEEE Robotics and Automation Letters (IEEE RA-L), 2025. Kei Katsumata, Motonari Kambara, Daichi Yashima, Ryosuke Korekata, Komei Sugiura. 28 Jan 2025.