CIDEr: Consensus-based Image Description Evaluation

Computer Vision and Pattern Recognition (CVPR), 2014
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,351 papers shown
GM-Skip: Metric-Guided Transformer Block Skipping for Efficient Vision-Language Models
Lianming Huang
Haibo Hu
Qiao Li
Xin He
Nan Guan
Chun Jason Xue
VLM
105
0
0
20 Aug 2025
Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference
Yunxiang Yang
Ningning Xu
Jidong J. Yang
96
0
0
19 Aug 2025
Region-Level Context-Aware Multimodal Understanding
Hongliang Wei
Xianqi Zhang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
VLM
149
0
0
17 Aug 2025
Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?
Xuezheng Chen
Zhengbo Zou
MLLM
80
0
0
14 Aug 2025
GoViG: Goal-Conditioned Visual Navigation Instruction Generation
Fengyi Wu
Yifei Dong
Zhi-Qi Cheng
Yilong Dai
Guangyu Chen
Hang Wang
Jingdong Sun
Alexander G. Hauptmann
104
2
0
13 Aug 2025
AMRG: Extend Vision Language Models for Automatic Mammography Report Generation
Nak-Jun Sung
Donghyun Lee
Bo Hwa Choi
Chae Jung Park
VLM
108
0
0
12 Aug 2025
RORPCap: Retrieval-based Objects and Relations Prompt for Image Captioning
Jinjing Gu
Tianbao Qin
Yuanyuan Pu
Zhengpeng Zhao
VLM
80
0
0
10 Aug 2025
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang
Zihan Jia
Zhilin Dai
Sheng Guo
Limin Wang
CLIP, VLM
164
0
0
10 Aug 2025
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
L. D. M. S. Sai Teja
Ashok Urlana
Pruthwik Mishra
116
0
0
09 Aug 2025
Towards Robust Evaluation of Visual Activity Recognition: Resolving Verb Ambiguity with Sense Clustering
Louie Hong Yao
Nicholas Jarvis
Tianyu Jiang
68
0
0
07 Aug 2025
A Survey on Video Temporal Grounding with Multimodal Large Language Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yue Yu
Wei Liu
Y. Liu
Meng-yang Liu
Liqiang Nie
Zhouchen Lin
C. Chen
AI4TS, VLM, LRM
145
6
0
07 Aug 2025
MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning
Quang-Trung Truong
Yuk-Kwan Wong
Vo Hoang Kim Tuyen Dang
Rinaldi Gotama
D. Nguyen
Sai-Kit Yeung
VOS
270
0
0
06 Aug 2025
VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence
Chenhui Qiang
Zhaoyang Wei
Xumeng Han
Zipeng Wang
Siyao Li
Xiangyuan Lan
Jianbin Jiao
Zhenjun Han
LRM
64
2
0
06 Aug 2025
Multimodal RAG Enhanced Visual Description
Amit Kumar Jaiswal
Haiming Liu
Ingo Frommholz
VLM
99
0
0
06 Aug 2025
R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation
Futian Wang
Yuhan Qiao
Xiao Wang
Fuling Wang
Yuxiang Zhang
Dengdi Sun
MedIm
97
1
0
05 Aug 2025
Bench2ADVLM: A Closed-Loop Benchmark for Vision-language Models in Autonomous Driving
Tianyuan Zhang
Ting Jin
L. Wang
Jiangfan Liu
Yaning Tan
Mingchuan Zhang
Aishan Liu
Xianglong Liu
144
2
0
04 Aug 2025
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang
Yingchen Yu
Yunqing Zhao
Shijian Lu
Song Bai
102
2
0
03 Aug 2025
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
Yuhang Jia
Xu Zhang
Yong Qin
Yang Chen
Shiwan Zhao
VLM
163
0
0
03 Aug 2025
SGCap: Decoding Semantic Group for Zero-shot Video Captioning
Zeyu Pan
Ping Li
Wenxiao Wang
VLM
94
0
0
02 Aug 2025
Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models
Mingyu Fu
Wei Suo
Ji Ma
Lin Yuanbo Wu
Peng Wang
Yanning Zhang
VLM
142
1
0
02 Aug 2025
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang
I-Chao Shen
Yufeng Wang
Yi-Hsuan Tsai
Y. Yang
Shuchang Zhou
Wenrui Ding
Takeo Igarashi
M. Yang
AI4CE
204
2
0
02 Aug 2025
From Image Captioning to Visual Storytelling
Admitos Passadakis
Yingjin Song
Albert Gatt
DiffM
190
0
0
31 Jul 2025
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Ting Huang
Zeyu Zhang
Hao Tang
LRM
94
9
0
31 Jul 2025
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
Yadong Niu
Tianzi Wang
Heinrich Dinkel
Xingwei Sun
Jiahao Zhou
Gang Li
Jizhong Liu
Xunying Liu
Junbo Zhang
Jian Luan
AuLLM
174
2
0
31 Jul 2025
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
Pei Deng
Wenqian Zhou
Hanlin Wu
92
0
0
30 Jul 2025
CONCAP: Seeing Beyond English with Concepts Retrieval-Augmented Captioning
George Ibrahim
R. Ramos
Yova Kementchedjhieva
VLM
102
1
0
27 Jul 2025
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi
Haoying Sun
Yaofei Wu
Junchi Yan
Haoran Zhang
Lifang Wu
Liang Wang
Changwen Chen
119
2
0
27 Jul 2025
The Devil is in the EOS: Sequence Training for Detailed Image Captioning
Abdelrahman Mohamed
Yova Kementchedjhieva
163
0
0
26 Jul 2025
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang
Qirui Chen
Cilin Yan
Jiayin Cai
Xiaolong Jiang
Yao Hu
Weidi Xie
Stratis Gavves
MLLM, VOS
212
4
0
25 Jul 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
Yusuke Hirota
Boyi Li
Ryo Hachiuma
Yueh-Hua Wu
Boris Ivanovic
Yuta Nakashima
Marco Pavone
Yejin Choi
Yu-Chun Wang
Chao-Han Huck Yang
VLM
179
1
0
25 Jul 2025
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Si-Woo Kim
MinJu Jeon
Ye-Chan Kim
Soeun Lee
Taewhan Kim
Dong-Jin Kim
161
3
0
24 Jul 2025
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
Feng Hong
Geng Yu
Yushi Ye
Haicheng Huang
Huangjie Zheng
Ya Zhang
Yanfeng Wang
Jiangchao Yao
138
12
0
24 Jul 2025
IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning
Tianheng Qiu
Jingchun Gao
Jingyu Li
Huiyi Leong
Xuan Huang
Xi Wang
Xiaocheng Zhang
K. Xu
Lan Zhang
125
8
0
24 Jul 2025
When Better Eyes Lead to Blindness: A Diagnostic Study of the Information Bottleneck in CNN-LSTM Image Captioning Models
International Journal of Computer Applications (IJCA), 2025
Hitesh Kumar Gupta
VLM
186
0
0
24 Jul 2025
Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models
Xiaoyan Wang
Zeju Li
Yifan Xu
Jiaxing Qi
Zhifei Yang
Ruifei Ma
Xiangde Liu
Chao Zhang
116
3
0
22 Jul 2025
Automatic Fine-grained Segmentation-assisted Report Generation
F. Jonske
C. Seibold
Osman Alperen Koras
F. Bahnsen
Marie Bauer
Amin Dada
Hamza Kalisch
Anton Schily
Jens Kleesiek
148
0
0
22 Jul 2025
Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models
Tz-Ying Wu
Tahani Trigui
S. N. Sridhar
Anand Bodas
Subarna Tripathi
86
0
0
22 Jul 2025
InterAct-Video: Reasoning-Rich Video QA for Urban Traffic
Joseph Raj Vishal
Rutuja Patil
Manas Srinivas Gowda
Katha Naik
Yezhou Yang
Bharatesh Chakravarthi
126
0
0
19 Jul 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Yiming Ren
Zhiqiang Lin
Yu Li
Gao Meng
Weiyun Wang
...
Zicheng Lin
Jifeng Dai
Yujiu Yang
Wenhai Wang
Ruihang Chu
152
3
0
17 Jul 2025
Making Language Model a Hierarchical Classifier
Yihong Wang
Zhonglin Jiang
Ningyuan Xi
Yue Zhao
Qingqing Gu
...
Hao Wu
Sheng Xu
Zhengyu Ma
Yong Chen
Luo Ji
BDL
195
0
0
17 Jul 2025
Spatio-Temporal LLM: Reasoning about Environments and Actions
Haozhen Zheng
Beitong Tian
Mingyuan Wu
Zhenggang Tang
Klara Nahrstedt
Alex Schwing
LRM
170
2
0
07 Jul 2025
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges
Sanjeda Akter
Ibne Farabi Shihab
Anuj Sharma
VLM
265
2
0
02 Jul 2025
MotionGPT3: Human Motion as a Second Modality
Bingfan Zhu
Biao Jiang
S. Wang
Bin Wang
Tao Chen
Linjie Luo
Youyi Zheng
Xin Chen
283
4
0
30 Jun 2025
Evaluating the Robustness of Open-Source Vision-Language Models to Domain Shift in Object Captioning
Federico Tavella
Amber Drinkwater
Angelo Cangelosi
63
0
0
24 Jun 2025
TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting
Zhongbin Guo
Yuhao Wang
Ping Jian
Chengzhi Li
Xinyue Chen
Zhen Yang
Ertai E
211
0
0
23 Jun 2025
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Yeongtak Oh
J. Mok
Juhyeon Shin
Sangha Park
Sungroh Yoon
VLM
322
1
0
23 Jun 2025
OpenEvents V1: Large-Scale Benchmark Dataset for Multimodal Event Grounding
Hieu Nguyen
Phuc-Tan Nguyen
T. Tran
Minh-Quang Nguyen
Tam V. Nguyen
Minh-Triet Tran
T. Le
ObjD, VLM
79
7
0
23 Jun 2025
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
Yizhe Li
Sanping Zhou
Zheng Qin
Le Wang
ViT
168
0
0
19 Jun 2025
GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
Fenghua Cheng
Jinxiang Wang
Sen Wang
Zi Huang
Xue Li
LRM
199
0
0
19 Jun 2025
DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement
Shaoqing Lin
Chong Teng
Fei Li
Donghong Ji
Lizhen Qu
Z. Li
188
0
0
18 Jun 2025