1411.5726
CIDEr: Consensus-based Image Description Evaluation
Computer Vision and Pattern Recognition (CVPR), 2015
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
Papers citing
"CIDEr: Consensus-based Image Description Evaluation"
50 / 2,351 papers shown
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
303
6
0
18 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Fahad Shahbaz Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
208
3
0
18 Mar 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Yin
Guangzong Si
Zilei Wang
213
6
0
17 Mar 2025
Exploring 3D Reasoning-Driven Planning: From Implicit Human Intentions to Route-Aware Activity Planning
Xueying Jiang
Wenhao Li
Xiaoqin Zhang
Ling Shao
Shijian Lu
LRM
469
2
0
17 Mar 2025
The Amazon Nova Family of Models: Technical Report and Model Card
Amazon AGI
Aaron Langford
A. Shah
Abhanshu Gupta
Abhimanyu Bhatter
...
Benjamin Biggs
Benjamin Ott
Bhanu Vinzamuri
Bharath Venkatesh
Bhavana Ganesh
257
45
0
17 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
Hyunjung Shim
258
2
0
17 Mar 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM
VLM
289
18
0
16 Mar 2025
Brain2Text Decoding Model Reveals the Neural Mechanisms of Visual Semantic Processing
Feihan Feng
Jingxin Nie
345
0
0
15 Mar 2025
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation
Seyed Mohammad Hadi Hosseini
Amir Mohammad Izadi
Ali Abdollahi
Armin Saghafian
M. Baghshah
EGVM
CoGe
210
1
0
14 Mar 2025
MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling
International Conference on Learning Representations (ICLR), 2025
R. Teo
T. Nguyen
MoE
336
4
0
14 Mar 2025
OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning
Wenshu Fan
Saihui Hou
Saijie Hou
Jiabao Du
Shibei Meng
Yongzhen Huang
VLM
233
1
0
14 Mar 2025
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
391
3
0
13 Mar 2025
Image Quality Assessment: From Human to Machine Preference
Computer Vision and Pattern Recognition (CVPR), 2025
Chunyi Li
Yuan Tian
Xiaoyue Ling
Zicheng Zhang
Haodong Duan
...
Xiaohong Liu
Xiongkuo Min
Guo Lu
Weisi Lin
Guoquan Zheng
154
7
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
418
10
0
13 Mar 2025
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Katrin Renz
Long Chen
Elahe Arani
Oleg Sinavski
MLLM
442
38
0
12 Mar 2025
Scaling Laws for Conditional Emergence of Multilingual Image Captioning via Generalization from Translation
Julian Spravil
Sebastian Houben
Sven Behnke
VLM
523
0
0
12 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
1.0K
11
0
11 Mar 2025
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
Bozhi Luan
Wengang Zhou
Hao Feng
Zhe Wang
Xiaosong Li
Haoyang Li
VLM
261
0
0
11 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
267
0
0
11 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
255
16
0
11 Mar 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
International Conference on Learning Representations (ICLR), 2025
Qinghao Ye
Xianhan Zeng
Fu Li
Chong Li
Haoqi Fan
CoGe
226
15
0
10 Mar 2025
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
Bo Jiang
Shaoyu Chen
Qian Zhang
Wenyu Liu
Xinggang Wang
OffRL
LRM
VLM
289
39
0
10 Mar 2025
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao
Wang Lu
Jie Ji
Ruimeng Ye
Gen Li
Xiaolong Ma
Bo Hui
OT
274
0
0
09 Mar 2025
Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform
Chenyu Huang
Peng Ye
Xinyu Wang
Shenghe Zheng
Biqing Qi
Wenlong Zhang
Wanli Ouyang
Tao Chen
139
2
0
09 Mar 2025
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai
Songyou Peng
Kyle Genova
Leonidas Guibas
Thomas Funkhouser
3DGS
370
11
0
08 Mar 2025
Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs
Dingkun Zhang
Shuhan Qi
Xinyu Xiao
Kehai Chen
Xuan Wang
CLL
MoMe
257
0
0
08 Mar 2025
Is Your Video Language Model a Reliable Judge?
International Conference on Learning Representations (ICLR), 2025
M. Liu
Wensheng Zhang
340
7
0
07 Mar 2025
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning
Qing Zhou
Tao Yang
Junyu Gao
W. Ni
Junzheng Wu
Qi Wang
224
2
0
06 Mar 2025
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations
Yanshu Li
384
4
0
05 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Computer Vision and Pattern Recognition (CVPR), 2025
Rui Zhao
Weijia Mao
Mike Zheng Shou
254
4
0
05 Mar 2025
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations
Khoi Anh Nguyen
Linh Yen Vu
Thang Dinh Duong
Thuan Nguyen Duong
Huy Thanh Nguyen
V. Q. Dinh
192
4
0
05 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2025
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
494
42
0
04 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Computer Vision and Pattern Recognition (CVPR), 2025
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
374
6
0
03 Mar 2025
Group Relative Policy Optimization for Image Captioning
Xu Liang
165
7
0
03 Mar 2025
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
VGen
297
0
0
03 Mar 2025
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
Maria Lymperaiou
Giorgos Filandrianos
Angeliki Dimitriou
Athanasios Voulodimos
Giorgos Stamou
MLLM
174
0
0
01 Mar 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoyi Liu
Huan Zhang
AAML
624
7
0
25 Feb 2025
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts
Zhenghao Liu
Xingsheng Zhu
Tianshuo Zhou
Xinyi Zhang
Xiaoyuan Yi
Shi Yu
Yu Gu
Ge Yu
RALM
VLM
216
6
0
24 Feb 2025
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Davide Testa
Giovanni Bonetta
Raffaella Bernardi
Alessandro Bondielli
Alessandro Lenci
Alessio Miaschi
Lucia Passaro
Bernardo Magnini
VGen
LRM
334
1
0
24 Feb 2025
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning
Swadhin Das
Saarthak Gupta
Kamal Kumar
Raksha Sharma
135
2
0
22 Feb 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
183
0
0
20 Feb 2025
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Nianzu Yang
Yinglu Li
Zuan Gao
Yun Zheng
Hongtao Xie
VLM
CoGe
430
0
0
19 Feb 2025
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Rui Zhao
Qirui Yuan
Jinyu Li
Haofeng Hu
Yun Li
Chengyuan Zheng
Fei Gao
LRM
225
17
0
19 Feb 2025
Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
1.0K
0
0
18 Feb 2025
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
Tiancheng Gu
Kaicheng Yang
Chaoyi Zhang
Yin Xie
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
CLIP
VLM
451
5
0
18 Feb 2025
Image Embedding Sampling Method for Diverse Captioning
Sania Waheed
Na Min An
251
0
0
14 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
663
28
0
12 Feb 2025
Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models
IEEE International Conference on Robotics and Automation (ICRA), 2025
Tianshuo Xu
Hao Lu
Xu Yan
Yingjie Cai
Bingbing Liu
Yingcong Chen
161
15
0
10 Feb 2025
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
Yuxin Lin
Mengshi Qi
Liang Liu
Huadong Ma
CLL
243
4
0
02 Feb 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
380
0
0
28 Jan 2025