1411.5726
CIDEr: Consensus-based Image Description Evaluation
Computer Vision and Pattern Recognition (CVPR), 2014
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
Papers citing
"CIDEr: Consensus-based Image Description Evaluation"
50 / 2,346 papers shown
RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-Wave Point Cloud Sequence
Zengyuan Lai
Jiarui Yang
Songpengcheng Xia
Lizhou Lin
Lan Sun
Renwen Wang
Qingbin Liu
Qi Wu
Ling Pei
258
1
0
14 Apr 2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang
Xuelong Li
Zilong Huang
Yuchen Ren
Weixian Lei
XueQing Deng
Shihao Chen
Shilin Xu
Jiashi Feng
MLLM
LRM
283
17
0
14 Apr 2025
3D CoCa: Contrastive Learners are 3D Captioners
Ting Huang
Zhenru Zhang
Longji Xu
Hao Tang
248
6
0
13 Apr 2025
Multi-modal and Multi-view Fundus Image Fusion for Retinopathy Diagnosis via Multi-scale Cross-attention and Shifted Window Self-attention
Yonghao Huang
Leiting Chen
Chuan Zhou
155
0
0
12 Apr 2025
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena
Tommaso Apicella
Stefano Rosa
Pietro Morerio
Alessio Del Bue
Lorenzo Natale
297
0
0
11 Apr 2025
Towards an Understanding of Context Utilization in Code Intelligence
Yanlin Wang
Kefeng Duan
Dewu Zheng
Ensheng Shi
F. Zhang
...
Xilin Liu
Yuchi Ma
Hongyu Zhang
Qianxiang Wang
Zibin Zheng
200
3
0
11 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
365
0
0
10 Apr 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
175
0
0
09 Apr 2025
URECA: Unique Region Caption Anything
Sangbeom Lim
J. Kim
Heeji Yoon
Jaewoo Jung
Seungryong Kim
260
1
0
07 Apr 2025
Taxonomy-Aware Evaluation of Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Vésteinn Snæbjarnarson
Kevin Du
Niklas Stoehr
Serge Belongie
Robert Bamler
Nico Lang
Stella Frank
228
4
0
07 Apr 2025
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
Sakib Reza
Xiyun Song
Heather Yu
Zongfang Lin
Mohsen Moghaddam
Mario Sznaier
202
0
0
07 Apr 2025
SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding
Yimin Wei
Aoran Xiao
Yexian Ren
Yuting Zhu
Hongruixuan Chen
J. Xia
Xiangwei Zhu
VLM
377
6
0
04 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
International Journal of Computer Vision (IJCV), 2024
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
299
0
0
03 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
Xinming Zhang
Rundong Wang
339
28
0
02 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
282
3
0
02 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Computer Vision and Pattern Recognition (CVPR), 2025
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
192
4
0
02 Apr 2025
PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks
Abdelrahman Elskhawy
Mengze Li
Nassir Navab
Benjamin Busam
VLM
255
2
0
01 Apr 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM
VGen
352
3
0
01 Apr 2025
A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Gonçalo Gomes
Bruno Martins
Chrysoula Zerva
311
1
0
01 Apr 2025
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
International Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt), 2025
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
225
1
0
31 Mar 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Computer Vision and Pattern Recognition (CVPR), 2025
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
226
1
0
31 Mar 2025
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
Mingkai Tian
Guorong Li
Yuankai Qi
Amin Beheshti
Javen Qinfeng Shi
Anton van den Hengel
Qingming Huang
VGen
191
0
0
31 Mar 2025
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Maofu Liu
Jiahui Liu
Xiaokang Zhang
231
3
0
30 Mar 2025
Empowering Large Language Models with 3D Situation Awareness
Computer Vision and Pattern Recognition (CVPR), 2025
Zhihao Yuan
Yibo Peng
Jinke Ren
Yinghong Liao
Yatong Han
Chun-Mei Feng
Hengshuang Zhao
G. Li
Shuguang Cui
Ge Wang
293
3
0
29 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
376
5
0
29 Mar 2025
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Yujiao Shi
Yanfeng Wang
VLM
359
3
0
28 Mar 2025
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li
Meng Tian
Zhenyu Lin
Jiangtong Zhu
Dechang Zhu
Haiqiang Liu
Zining Wang
Yueyi Zhang
Zhiwei Xiong
Xinhai Zhao
CoGe
VLM
300
8
0
27 Mar 2025
JEEM: Vision-Language Understanding in Four Arabic Dialects
Karima Kadaoui
Hanin Atwany
Hamdan Al-Ali
Abdelrahman Mohamed
Ali Mekky
Sergei Tilga
Natalia Fedorova
Ekaterina Artemova
Hanan Aldarmaki
Yova Kementchedjhieva
VLM
190
8
0
27 Mar 2025
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Computer Vision and Pattern Recognition (CVPR), 2025
Xiao Guo
Xiufeng Song
Yue Zhang
Xiaohong Liu
Xuyang Liu
312
21
0
26 Mar 2025
ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction
The Web Conference (WWW), 2025
Yiqiao Jin
Stefano Petrangeli
Yu Shen
Gang Wu
LLMAG
LM&Ro
861
1
0
26 Mar 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM
CoGe
427
1
0
25 Mar 2025
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu
Diankun Zhang
Zongchuang Zhao
Jianfeng Cui
Dingkang Liang
Chong Zhang
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
277
46
0
25 Mar 2025
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module
Yishen Liu
Shengda Liu
Zishao Zhong
Hudan Pan
MedIm
324
0
0
24 Mar 2025
AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs
Diwei Wang
Cédric Bobenrieth
Hyewon Seo
LRM
161
0
0
23 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
218
1
0
22 Mar 2025
MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xiaodan Zhang
Yanzhao Shi
Junzhong Ji
Chengxin Zheng
Liangqiong Qu
166
3
0
22 Mar 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
251
4
0
21 Mar 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
319
2
0
21 Mar 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth
Dávid Rozenberszki
Angela Dai
252
3
0
21 Mar 2025
CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization
Brihi Joshi
Sriram Venkatapathy
Mohit Bansal
Nanyun Peng
Haw-Shiuan Chang
LRM
249
0
0
21 Mar 2025
AutoDrive-QA: A Multiple-Choice Benchmark for Vision-Language Evaluation in Urban Autonomous Driving
Boshra Khalili
Andrew W. Smyth
ELM
301
2
0
20 Mar 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao
Haoxuan You
Yang Sui
Can Qin
Haoyu Wang
VLM
MQ
315
8
0
20 Mar 2025
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
Junyi Ao
Dekun Chen
Xiaohai Tian
Wenjie Feng
Jing Zhang
Lu Lu
Longji Xu
Haizhou Li
Zhizheng Wu
AuLLM
206
1
0
19 Mar 2025
Vision-Speech Models: Teaching Speech Models to Converse about Images
Amélie Royer
Moritz Böhle
Gabriel de Marmiesse
Laurent Mazaré
Neil Zeghidour
Alexandre Défossez
P. Pérez
AuLLM
VLM
223
0
0
19 Mar 2025
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Hou In Ivan Tam
Hou In Derek Pun
Austin T. Wang
Angel X. Chang
Manolis Savva
357
4
0
18 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Fahad Shahbaz Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
188
3
0
18 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
283
5
0
18 Mar 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Yin
Guangzong Si
Zilei Wang
205
6
0
17 Mar 2025
The Amazon Nova Family of Models: Technical Report and Model Card
Amazon AGI
Aaron Langford
A. Shah
Abhanshu Gupta
Abhimanyu Bhatter
...
Benjamin Biggs
Benjamin Ott
Bhanu Vinzamuri
Bharath Venkatesh
Bhavana Ganesh
237
45
0
17 Mar 2025
Exploring 3D Reasoning-Driven Planning: From Implicit Human Intentions to Route-Aware Activity Planning
Xueying Jiang
Wenhao Li
Xiaoqin Zhang
Ling Shao
Shijian Lu
LRM
441
2
0
17 Mar 2025