CIDEr: Consensus-based Image Description Evaluation

Computer Vision and Pattern Recognition (CVPR), 2014
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,353 papers shown
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving
Yuxin Lin
Mengshi Qi
Liang Liu
Huadong Ma
CLL
291
4
0
02 Feb 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
420
0
0
28 Jan 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Computers & Electrical Engineering (Comput. Electr. Eng.), 2025
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
369
14
0
28 Jan 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ziyang Chen
Mingxiao Li
Zhongfu Chen
Nan Du
Xiaolong Li
Yuexian Zou
365
3
0
19 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
European Conference on Computer Vision (ECCV), 2023
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
802
355
0
17 Jan 2025
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding
IEEE Transactions on Multimedia (TMM), 2025
Haomiao Xiong
Yunzhi Zhuge
Jiawen Zhu
Lu Zhang
Huchuan Lu
238
11
0
14 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Computer Vision and Pattern Recognition (CVPR), 2025
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD VLM
529
9
0
14 Jan 2025
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2025
Ji Soo Lee
Jongha Kim
Jeehye Na
Jinyoung Park
H. Kim
VGen
135
7
0
12 Jan 2025
Efficient Architectures for High Resolution Vision-Language Models
International Conference on Computational Linguistics (COLING), 2025
Miguel Carvalho
Bruno Martins
MLLM VLM
199
1
0
05 Jan 2025
Classifier-Guided Captioning Across Modalities
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
223
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
European Conference on Computer Vision (ECCV), 2024
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM VLM
287
2
0
03 Jan 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
395
4
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Information Fusion (Inf. Fusion), 2024
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw LM&MA LRM
449
82
0
31 Dec 2024
Multi-Agent Planning Using Visual Language Models
European Conference on Artificial Intelligence (ECAI), 2024
Michele Brienza
F. Argenziano
Vincenzo Suriani
D. Bloisi
Daniele Nardi
LM&Ro LLMAG
265
6
0
31 Dec 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Computer Vision and Pattern Recognition (CVPR), 2024
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
422
39
0
31 Dec 2024
From Hallucinations to Facts: Enhancing Language Models with Curated Knowledge Graphs
Ratnesh Kumar Joshi
Sagnik Sengupta
Asif Ekbal
HILM KELM
228
2
0
24 Dec 2024
SCBench: A Sports Commentary Benchmark for Video LLMs
Kuangzhi Ge
Lawrence Yunliang Chen
Kevin Zhang
Yulin Luo
Tianyu Shi
Liaoyuan Fan
Xiang Li
Guanqun Wang
Shanghang Zhang
230
3
0
23 Dec 2024
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye
Honglin Lin
Leyan Ou
Dairong Chen
Zihao Wang
Bin Wang
Weijia Li
500
16
0
22 Dec 2024
A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation
International Conference on Computational Linguistics (COLING), 2024
Shijie Zhou
Ruiyi Zhang
Jiuxiang Gu
Changyou Chen
VLM
282
2
0
20 Dec 2024
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o
AAAI Conference on Artificial Intelligence (AAAI), 2024
Tony Cheng Tong
Sirui He
Z. Shao
Dit-Yan Yeung
276
17
0
18 Dec 2024
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yunbin Tu
Liang-Sheng Li
Li Su
Qingming Huang
298
1
0
18 Dec 2024
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zhuyang Xie
Yan Yang
Yankai Yu
Jie Wang
Yongquan Jiang
Xiao-Jun Wu
406
2
0
16 Dec 2024
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
Neural Information Processing Systems (NeurIPS), 2024
Dong Hoon Lee
Seunghoon Hong
232
10
0
13 Dec 2024
Automated Image Captioning with CNNs and Transformers
Joshua Adrian Cahyono
Jeremy Nathan Jusuf
VLM ViT
120
1
0
13 Dec 2024
NowYouSee Me: Context-Aware Automatic Audio Description
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Seon-Ho Lee
Jue Wang
D. Fan
Zhikang Zhang
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
326
2
0
13 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
445
16
0
12 Dec 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong
Chengyao Wang
Yuqi Liu
Senqiao Yang
Longxiang Tang
...
Shaozuo Yu
Sitong Wu
Eric Lo
Shu Liu
Jiaya Jia
AuLLM
287
18
0
12 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
490
6
0
12 Dec 2024
CoMA: Compositional Human Motion Generation with Multi-modal Agents
Shanlin Sun
Gabriel De Araujo
Jiaqi Xu
S. Kevin Zhou
Hanwen Zhang
Ziheng Huang
Chenyu You
Xiaohui Xie
427
13
0
10 Dec 2024
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
ACM Multimedia (MM), 2024
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM LRM
323
11
0
08 Dec 2024
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Po-Hsuan Huang
Jeng-Lin Li
Chin-Po Chen
Ming-Ching Chang
Wei-Chao Chen
LRM
297
4
0
04 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
658
6
0
04 Dec 2024
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
Hao Wu
Zhihang Zhong
Xiao Sun
DiffM
305
1
0
02 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Computer Vision and Pattern Recognition (CVPR), 2024
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
451
25
0
02 Dec 2024
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Computer Vision and Pattern Recognition (CVPR), 2024
Hongyan Zhi
Peihao Chen
Junyan Li
Shuailei Ma
Xinyu Sun
Tianhang Xiang
Yinjie Lei
Mingkui Tan
Chuang Gan
432
25
0
02 Dec 2024
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou
Yuxin Chen
Haokun Lin
Shuyu Yang
Li Zhu
Chen Ma
Mingyu Ding
Ying Shan
ObjD
553
4
0
26 Nov 2024
Diagram-Driven Course Questions Generation
Xinyu Zhang
L. Zhang
Yanrui Wu
Muye Huang
Wenjun Wu
Bo Li
Shaowei Wang
Jun Liu
429
0
0
26 Nov 2024
TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching
Yuan-Ming Li
An-Lan Wang
Kun-Yu Lin
Yu-Ming Tang
Ling-an Zeng
Jian-Fang Hu
Wei-Shi Zheng
542
6
0
26 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
406
7
0
25 Nov 2024
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
Computer Vision and Pattern Recognition (CVPR), 2024
Hongxu Chen
Runshi Li
Bowei Zhu
Zhen Wang
Long Chen
MoMe
432
2
0
21 Nov 2024
LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement
Siwen Jiao
Yangyi Fang
Baoyun Peng
Wangqun Chen
Bharadwaj Veeravalli
470
11
0
20 Nov 2024
The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Longju Bai
Angana Borah
Oana Ignat
Amélie Reymond
VLM
321
6
0
18 Nov 2024
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Computer Vision and Pattern Recognition (CVPR), 2024
Hongrui Jia
Chaoya Jiang
Haiyang Xu
Wei Ye
Mengfan Dong
Ming Yan
Ji Zhang
Fei Huang
Shikun Zhang
MLLM
392
7
0
17 Nov 2024
Unstructured Text Enhanced Open-domain Dialogue System: A Systematic Survey
Longxuan Ma
Mingda Li
Weinan Zhang
Jiapeng Li
Ting Liu
349
19
0
14 Nov 2024
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Sagnik Majumder
Tushar Nagarajan
Ziad Al-Halah
Reina Pradhan
Kristen Grauman
424
0
0
13 Nov 2024
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
270
0
0
12 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
255
1
0
11 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen VLM
782
5
0
11 Nov 2024
EVQAScore: A Fine-grained Metric for Video Question Answering Data Quality Evaluation
Hao Liang
Zirong Chen
Feiyu Xiong
Wentao Zhang
312
0
0
11 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
391
2
0
09 Nov 2024