ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1411.5726
  4. Cited By
CIDEr: Consensus-based Image Description Evaluation
v1v2 (latest)

CIDEr: Consensus-based Image Description Evaluation

Computer Vision and Pattern Recognition (CVPR), 2014
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
ArXiv (abs)PDFHTML

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,351 papers shown
Title
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following AbilityAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
227
3
0
18 Jun 2025
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Qirui Zheng
Xingbo Wang
Keyuan Cheng
Muhammad Asif Ali
Yunlong Lu
Wenxin Li
170
0
0
17 Jun 2025
An Empirical Study of LLM-as-a-Judge: How Design Choices Impact Evaluation Reliability
An Empirical Study of LLM-as-a-Judge: How Design Choices Impact Evaluation Reliability
Yusuke Yamauchi
Taro Yano
Masafumi Oyamada
ELM
166
6
0
16 Jun 2025
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration
Jun Wang
Lixing Zhu
Xiaohan Yu
A. Bhalerao
Yulan He
294
0
0
12 Jun 2025
ABS: Enforcing Constraint Satisfaction On Generated Sequences Via Automata-Guided Beam Search
ABS: Enforcing Constraint Satisfaction On Generated Sequences Via Automata-Guided Beam Search
Vincenzo Collura
Karim Tit
Laura Bussi
Eleonora Giunchiglia
Maxime Cordy
225
1
0
11 Jun 2025
A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning
Swadhin Das
Divyansh Mundra
Priyanshu Dayal
Raksha Sharma
134
0
0
11 Jun 2025
Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations
Yibo Cui
Liang Xie
Yu Zhao
Jiawei Sun
Erwei Yin
155
2
0
10 Jun 2025
Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline
Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline
Brian Gordon
Yonatan Bitton
Andreea Marzoca
Yasumasa Onoe
Xiao Wang
Daniel Cohen-Or
Idan Szpektor
CoGe
177
0
0
09 Jun 2025
LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments
LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments
Jin Huang
Yuchao Jin
Le An
Josh Park
VLM
150
2
0
09 Jun 2025
FREE: Fast and Robust Vision Language Models with Early Exits
FREE: Fast and Robust Vision Language Models with Early ExitsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Divya J. Bajpai
M. Hanawal
VLM
133
2
0
07 Jun 2025
ExAct: A Video-Language Benchmark for Expert Action Analysis
ExAct: A Video-Language Benchmark for Expert Action Analysis
Han Yi
Yulu Pan
Feihong He
Xinyu Liu
Benjamin Zhang
Oluwatumininu Oguntola
Gedas Bertasius
184
1
0
06 Jun 2025
AuthGuard: Generalizable Deepfake Detection via Language Guidance
Guangyu Shen
Zhihua Li
Xiang Xu
Tianchen Zhao
Zheng Zhang
Dongsheng An
Zhuowen Tu
Yifan Xing
Qin Zhang
188
1
0
04 Jun 2025
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Di Wen
Lei Qi
Kunyu Peng
Kailun Yang
Fei Teng
...
Yufan Chen
R. Liu
Yitian Shi
M. Sarfraz
Rainer Stiefelhagen
362
0
0
03 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in VideosAAAI Conference on Artificial Intelligence (AAAI), 2025
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
215
2
0
03 Jun 2025
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluationComputer Science Review (CSR), 2025
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
151
2
0
03 Jun 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng
Caroline Chan
F. Durand
Phillip Isola
EGVM
390
7
0
02 Jun 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
160
1
0
01 Jun 2025
The Security Threat of Compressed Projectors in Large Vision-Language Models
The Security Threat of Compressed Projectors in Large Vision-Language Models
Yudong Zhang
Ruobing Xie
Xingwu Sun
Jiansheng Chen
Zhanhui Kang
Di Wang
Yu Wang
137
0
0
31 May 2025
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Akash Dhasade
Divyansh Jhunjhunwala
Milos Vujasinovic
Gauri Joshi
Anne-Marie Kermarrec
MoMe
276
0
0
29 May 2025
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
Shi-Xue Zhang
Hongfa Wang
Duojun Huang
Xin Li
Xiaobin Zhu
Xu-Cheng Yin
CoGe
233
4
0
29 May 2025
GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent ReasoningAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Shikhhar Siingh
Abhinav Rawat
Chitta Baral
Vivek Gupta
303
0
0
28 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
145
1
0
28 May 2025
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects
Yixin Cui
Haotian Lin
Shuo Yang
Yixiao Wang
Yanjun Huang
Hong Chen
LM&RoLRMELM
332
6
0
26 May 2025
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic VideosAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Fanheng Kong
Jingyuan Zhang
Hongzhi Zhang
Shi Feng
Daling Wang
Linhao Yu
Xingguang Ji
Yu Tian
Qi Wang
Fuzheng Zhang
267
2
0
26 May 2025
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLMVLM
825
0
0
23 May 2025
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning
Cheng Peng
Kai Zhang
Mengxian Lyu
Hongfang Liu
Lichao Sun
Yonghui Wu
LM&MAMedImVLM
431
2
0
23 May 2025
Panoptic Captioning: An Equivalence Bridge for Image and Text
Panoptic Captioning: An Equivalence Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
655
0
0
22 May 2025
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports
Francesco Dalla Serra
Patrick Schrempf
Chaoyang Wang
Zaiqiao Meng
Fani Deligianni
Alison Q. OÑeil
169
0
0
22 May 2025
Redemption Score: A Multi-Modal Evaluation Framework for Image Captioning via Distributional, Perceptual, and Linguistic Signal Triangulation
Redemption Score: A Multi-Modal Evaluation Framework for Image Captioning via Distributional, Perceptual, and Linguistic Signal Triangulation
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
254
0
0
22 May 2025
ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving
ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving
Yunsheng Ma
Burhaneddin Yaman
Xin Ye
Mahmut Yurt
Jingru Luo
Abhirup Mallik
Ziran Wang
Liu Ren
205
1
0
21 May 2025
Exploring The Visual Feature Space for Multimodal Neural Decoding
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Steven Chacko
240
4
0
21 May 2025
TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving
TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving
Hossein Hassani
Soodeh Nikan
Abdallah Shami
MLLM
221
0
0
21 May 2025
DC-Scene: Data-Centric Learning for 3D Scene Understanding
DC-Scene: Data-Centric Learning for 3D Scene Understanding
Ting Huang
Zeyu Zhang
Ruicheng Zhang
Yang Zhao
211
6
0
21 May 2025
TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration
TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration
Yanshu Li
Tian Yun
Tian Yun
Pinyuan Feng
Jinfa Huang
Ruixiang Tang
378
22
0
21 May 2025
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation
Xinran Wang
Muxi Diao
Yuanzhi Liu
Chunyu Wang
Kongming Liang
Zhanyu Ma
Jun Guo
258
0
0
21 May 2025
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and PerspectivesIEEE Geoscience and Remote Sensing Magazine (GRSM), 2025
Xingxing Weng
Chao Pang
Gui-Song Xia
VLM
312
9
0
20 May 2025
KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language Models
KERL: Knowledge-Enhanced Personalized Recipe Recommendation using Large Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Fnu Mohbat
Mohammed J Zaki
153
1
0
20 May 2025
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
Lihong Chen
Hossein Hassani
Soodeh Nikan
VLM
316
4
0
19 May 2025
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
Yong Ren
Chenxing Li
Le Xu
Hao Gu
Duzhen Zhang
Yujie Chen
Manjie Xu
Ruibo Fu
Shan Yang
Dong Yu
LRM
429
1
0
19 May 2025
Content Generation Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges
Content Generation Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges
Yuan Zhang
Xinfeng Zhang
Xiaoming Qi Xinyu Wu
Xinyu Wu
Feng Chen
Guanyu Yang
Huazhu Fu
MedImLM&MAAI4CE
438
1
0
16 May 2025
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision
Alexey Magay
Dhurba Tripathi
Yu Hao
Yi Fang
196
1
0
16 May 2025
Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts
Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts
Peixuan Ge
Tongkun Su
Faqin Lv
Baoliang Zhao
Peng Zhang
...
Liang Yao
Yu Sun
Zenan Wang
Pak Kin Wong
Ying Hu
MedIm
219
1
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLMVLM
334
5
0
13 May 2025
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models
Shucheng Huang
Freda Shi
Chen Sun
Jiaming Zhong
Minghao Ning
Yufeng Yang
Yukun Lu
Hong Wang
A. Khajepour
423
0
0
11 May 2025
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios
Ning Cheng
Jinan Xu
Jialing Chen
Wenjuan Han
LRM
271
0
0
07 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
Haoyang Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
170
2
0
04 May 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
461
12
0
29 Apr 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLMVLM
470
3
0
28 Apr 2025
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation
Ling You
Hao Wu
Xinni Xie
Xiangyi Wei
Bangyan Li
Shaohui Lin
Yang Li
Changbo Wang
VGen
983
5
0
24 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
452
3
0
22 Apr 2025
Previous
12345...464748
Next