CIDEr: Consensus-based Image Description Evaluation
Computer Vision and Pattern Recognition (CVPR), 2015
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,351 papers shown
Title
Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
Yerim Jeon
Miso Lee
WonJun Moon
Jae-Pil Heo
16
0
0
02 Dec 2025
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
Songyan Zhang
Wenhui Huang
Zhan Chen
Chua Jiahao Collister
Qihang Huang
Chen Lv
OffRL, LRM
120
1
0
01 Dec 2025
Leveraging Textual Compositional Reasoning for Robust Change Captioning
Kyu Ri Park
Jiyoung Park
Seong Tae Kim
Hong Joo Lee
Jung Uk Kim
CoGe
46
0
0
28 Nov 2025
HMR3D: Hierarchical Multimodal Representation for 3D Scene Understanding with Large Vision-Language Model
Chen Li
Eric Peh
Basura Fernando
92
0
0
28 Nov 2025
Scaling Foundation Models for Radar Scene Understanding
Pushkal Mishra
Kshitiz Bansal
Dinesh Bharadia
183
0
0
26 Nov 2025
BUSTR: Breast Ultrasound Text Reporting with a Descriptor-Aware Vision-Language Model
Rawa Mohammed
Mina Attin
Bryar Shareef
118
0
0
26 Nov 2025
Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding
Yutao Tang
Cheng Zhao
Gaurav Mittal
Rohith Kukkala
Rama Chellappa
Cheng-Fang Peng
Mei Chen
VLM
116
0
0
26 Nov 2025
Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning
Xiaoxing You
Qiang Huang
Lingyu Li
C. Zhang
Xiaopeng Liu
M. Zhang
Jun-chen Yu
DiffM
488
0
0
26 Nov 2025
CaptionQA: Is Your Caption as Useful as the Image Itself?
Shijia Yang
Yunong Liu
Bohan Zhai
Ximeng Sun
Zicheng Liu
E. Barsoum
Manling Li
Chenfeng Xu
CoGe
154
0
0
26 Nov 2025
CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
Dapeng Zhang
Fei Shen
Rui Zhao
Yinda Chen
Peng Zhi
Chenyang Li
R. Zhou
Qingguo Zhou
VLM
158
0
0
25 Nov 2025
LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang
D. Zhang
Tianyi Bai
Shitong Shao
Jiebo Luo
Jiaheng Wei
VLM
132
1
0
24 Nov 2025
RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System
Runwei Guan
Rongsheng Hu
Shangshu Chen
Ningyuan Xiao
Xue Xia
...
Ningwei Ouyang
Shaofeng Liang
Yuxuan Fan
Wanjie Sun
Yutao Yue
96
0
0
23 Nov 2025
OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding
Teng Fu
Mengyang Zhao
Ke Niu
Kaixin Peng
Bin Li
48
0
0
21 Nov 2025
Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation
Elena V. Epure
Yashar Deldjoo
Bruno Sguerra
Markus Schedl
Manuel Moussallam
124
0
0
20 Nov 2025
Zero-Training Task-Specific Model Synthesis for Few-Shot Medical Image Classification
Yao Qin
Yangyang Yan
YuanChao Yang
Jinhua Pang
Huanyong Bi
Yuan Liu
HaiHua Wang
MedIm
116
0
0
18 Nov 2025
MedGEN-Bench: Contextually entangled benchmark for open-ended multimodal medical generation
Junjie Yang
Yuhao Yan
Gang Wu
Y Samuel Wang
Ruoyu Liang
...
Xiang Wan
Fenglei Fan
Yongquan Zhang
Feiwei Qin
Changmiao Wang
MedIm, LM&MA, VLM
441
0
0
17 Nov 2025
A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation
Puzhen Wu
Hexin Dong
Yi Lin
Yihao Ding
Yifan Peng
MedIm
120
0
0
15 Nov 2025
Spatial Reasoning in Multimodal Large Language Models: A Survey of Tasks, Benchmarks and Methods
Weichen Liu
Qiyao Xue
Haoming Wang
Xiangyu Yin
Boyuan Yang
Wei Gao
87
1
0
14 Nov 2025
Large Sign Language Models: Toward 3D American Sign Language Translation
S. Zhang
Xiaoxiao He
Di Liu
Zhaoyang Xia
Mingyu Zhao
Chaowei Tan
Vivian Li
Bo Liu
Dimitris N. Metaxas
Mubbasir Kapadia
SLR
253
0
0
11 Nov 2025
Remodeling Semantic Relationships in Vision-Language Fine-Tuning
Xiangyang Wu
Liu Liu
Baosheng Yu
J. Qiu
Zhenwei Shi
90
0
0
11 Nov 2025
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
Ying Cheng
Y. Lin
Min-Hung Chen
Fu-En Yang
S. Lai
155
0
0
10 Nov 2025
Dense Motion Captioning
Shiyao Xu
Benedetta Liberatori
Gül Varol
Paolo Rota
108
0
0
07 Nov 2025
ChiMDQA: Towards Comprehensive Chinese Document QA with Fine-grained Evaluation
International Conference on Artificial Neural Networks (ICANN), 2025
Jing Gao
Shutiao Luo
Yumeng Liu
Yuanming Li
Hongji Zeng
72
0
0
05 Nov 2025
Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models
Jay Mohta
Kenan E. Ak
Dimitrios Dimitriadis
Yan Xu
Mingwei Shen
CLL, VLM
250
0
0
03 Nov 2025
A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis
Dongheng Lin
Mengxue Qu
Kunyang Han
Jianbo Jiao
Xiaojie Jin
Yunchao Wei
108
0
0
02 Nov 2025
Enhancing Adversarial Transferability in Visual-Language Pre-training Models via Local Shuffle and Sample-based Attack
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Xin Liu
Aoyang Zhou
AAML
84
0
0
02 Nov 2025
PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting
Danyal Maqbool
Changhee Lee
Zachary Huemann
Samuel Church
Matthew E. Larson
...
Xin Tie
J. Merkow
Junjie Hu
Steve Y. Cho
Tyler Bradshaw
VLM
405
0
0
31 Oct 2025
Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges
Kemal Oksuz
Alexandru Buburuzan
Anthony Knittel
Yuhan Yao
P. Dokania
12
0
0
31 Oct 2025
Masked Diffusion Captioning for Visual Feature Learning
Chao Feng
Zihao Wei
Andrew Owens
DiffM
215
0
0
30 Oct 2025
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
Nvidia
Yan Wang
W. Luo
Junjie Bai
Yulong Cao
...
Yurong You
Xiaohui Zeng
Wenyuan Zhang
Boris Ivanovic
Marco Pavone
LRM
120
9
0
30 Oct 2025
More than a Moment: Towards Coherent Sequences of Audio Descriptions
Eshika Khandelwal
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Andrew Zisserman
Gül Varol
Makarand Tapaswi
DiffM
80
0
0
29 Oct 2025
DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts
Binbin Li
Guimiao Yang
Zisen Qi
Haiping Wang
Yu Ding
VLM
307
0
0
28 Oct 2025
Listening without Looking: Modality Bias in Audio-Visual Captioning
Yuchi Ishikawa
Toranosuke Manabe
Tatsuya Komatsu
Y. Aoki
60
0
0
28 Oct 2025
What do vision-language models see in the context? Investigating multimodal in-context learning
G. O. D. Santos
Esther Colombini
Sandra Avila
92
0
0
28 Oct 2025
VC4VG: Optimizing Video Captions for Text-to-Video Generation
Yang Du
Zhuoran Lin
Kaiqiang Song
Biao Wang
Zhicheng Zheng
Tiezheng Ge
Bo Zheng
Qin Jin
94
0
0
28 Oct 2025
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan
W. Zhang
Xin Li
Shihao Wang
Kehan Li
Wentong Li
Jun Xiao
Lei Zhang
Beng Chin Ooi
ObjD
310
0
0
27 Oct 2025
DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning
Eddison Pham
Prisha Priyadarshini
Adrian Maliackel
Kanishk Bandi
Cristian Meo
Kevin Zhu
132
0
0
27 Oct 2025
MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
Hai-Dang Nguyen
Minh-Anh Dang
Minh-Tan Le
Minh-Tuan Le
59
1
0
26 Oct 2025
Towards Fine-Grained Human Motion Video Captioning
Guorui Song
Guocun Wang
Zhe Huang
Jing Lin
Xuefei Zhe
Jian Li
Haoqian Wang
60
0
0
24 Oct 2025
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Lorenzo Basile
Valentino Maiorca
Diego Doimo
Francesco Locatello
Alberto Cazzaniga
101
0
0
24 Oct 2025
Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges
Konstantinos Bacharidis
Antonis A. Argyros
108
0
0
22 Oct 2025
Chain-of-Conceptual-Thought Elicits Daily Conversation in Large Language Models
Qingqing Gu
Dan Wang
Yue Zhao
Xiaoyu Wang
Zhonglin Jiang
Yong Chen
Hongyan Li
Luo Ji
ReLM, LRM
253
0
0
21 Oct 2025
PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions
Amith Ananthram
Elias Stengel-Eskin
Lorena A. Bradford
Julia Demarest
Adam Purvis
Keith Krut
Robert Stein
Rina Elster Pantalony
Mohit Bansal
Kathleen McKeown
88
0
0
21 Oct 2025
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
Wenhui Huang
Changhe Chen
Han Qi
Chen Lv
Yilun Du
Heng Yang
LM&Ro, LRM
325
1
0
21 Oct 2025
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen
Marc Pollefeys
Daniel Barath
Iro Armeni
VGen
205
1
0
20 Oct 2025
EMRRG: Efficient Fine-Tuning Pre-trained X-ray Mamba Networks for Radiology Report Generation
Mingzheng Zhang
Jinfeng Gao
Dan Xu
Jiangrui Yu
Yuhan Qiao
Lan Chen
Jin Tang
Xiao Wang
Mamba, MedIm
164
0
0
19 Oct 2025
How Universal Are SAM2 Features?
Masoud Khairi Atani
Alon Harell
Hyomin Choi
Runyu Yang
Fabien Racapé
Ivan V. Bajić
VLM
112
0
0
19 Oct 2025
EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning
Haoran Sun
Chen Cai
Huiping Zhuang
Kong Aik Lee
Lap-Pui Chau
Yi Wang
104
0
0
18 Oct 2025
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
Hui Wang
J. Zhao
Yifan Yang
Shujie Liu
Junyang Chen
...
Jinyu Li
Jiaming Zhou
Haoqin Sun
Yan Lu
Yong Qin
AuLLM, ELM
194
1
0
16 Oct 2025
Shot2Tactic-Caption: Multi-Scale Captioning of Badminton Videos for Tactical Understanding
Ning Ding
Keisuke Fujii
Toru Tamaki
68
0
0
16 Oct 2025