ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1411.5726
  4. Cited By
CIDEr: Consensus-based Image Description Evaluation
v1v2 (latest)

CIDEr: Consensus-based Image Description Evaluation

Computer Vision and Pattern Recognition (CVPR), 2014
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
ArXiv (abs)PDFHTML

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,353 papers shown
Hierarchical Visual Feature Aggregation for OCR-Free Document
  Understanding
Hierarchical Visual Feature Aggregation for OCR-Free Document UnderstandingNeural Information Processing Systems (NeurIPS), 2024
Jaeyoo Park
Jin Young Choi
Jeonghyung Park
Bohyung Han
VLM
141
8
0
08 Nov 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with
  Captions in 28 Languages
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 LanguagesConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Youssef Mohamed
Runjia Li
Ibrahim Said Ahmad
Kilichbek Haydarov
Juil Sock
Kenneth Church
Mohamed Elhoseiny
VLM
193
15
0
06 Nov 2024
DDFAV: Remote Sensing Large Vision Language Models Dataset and
  Evaluation Benchmark
DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
Haodong Li
Haicheng Qu
Xiaofeng Zhang
182
8
0
05 Nov 2024
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing
Xingwu Sun
Cheng Fei
Charles Zhang
Fei Jin
Qian Niu
...
Pohsun Feng
Ziqian Bi
Ming Liu
Yujiao Shi
Y. Zhang
291
3
0
05 Nov 2024
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability
  Vision-Language Attack
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language AttackIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yang Liu
Sensen Gao
Qing Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang Fellow
Xiaochun Cao
AAML
232
8
0
04 Nov 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning
  Through Retrieval and Understanding Modalities
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
271
2
0
04 Nov 2024
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models
Georgia Gabriela Sampaio
Ruixiang Zhang
Shuangfei Zhai
Jiatao Gu
J. Susskind
Navdeep Jaitly
Yizhe Zhang
DiffMCLIP
266
1
0
02 Nov 2024
Designing a Robust Radiology Report Generation System
Designing a Robust Radiology Report Generation System
Sonit Singh
MedIm
235
1
0
02 Nov 2024
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
249
4
0
01 Nov 2024
Generative Emotion Cause Explanation in Multimodal Conversations
Generative Emotion Cause Explanation in Multimodal ConversationsInternational Conference on Multimedia Retrieval (ICMR), 2024
Lin Wang
Xiaocui Yang
Shi Feng
Daling Wang
Yifei Zhang
Zhitao Zhang
467
1
0
01 Nov 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream
  Generalization of CLIP
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIPNeural Information Processing Systems (NeurIPS), 2024
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
262
4
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous
  Driving
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLMMLLMLRM
308
77
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of
  Prefix-tuning for Large Multi-modal Models
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal ModelsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
278
5
0
29 Oct 2024
MotionGPT-2: A General-Purpose Motion-Language Model for Motion
  Generation and Understanding
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding
Yuan Wang
Di Huang
Yaqi Zhang
Wanli Ouyang
J. Jiao
Xuetao Feng
Yan Zhou
Pengfei Wan
Weizhen He
Dan Xu
VGen
236
37
0
29 Oct 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth
  Exploration
What Factors Affect Multi-Modal In-Context Learning? An In-Depth ExplorationNeural Information Processing Systems (NeurIPS), 2024
L. Qin
Qiguang Chen
Hao Fei
Zhi Chen
Min Li
Wanxiang Che
207
26
0
27 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity
  Tracking Using Wearable Sensors
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable SensorsProceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
Wenqiang Chen
Jiaxuan Cheng
Leyao Wang
Wei Zhao
Wojciech Matusik
267
14
0
26 Oct 2024
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language ModelsInternational Conference on Learning Representations (ICLR), 2024
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLMVLM
445
15
0
23 Oct 2024
Image-aware Evaluation of Generated Medical Reports
Image-aware Evaluation of Generated Medical ReportsNeural Information Processing Systems (NeurIPS), 2024
Gefen Dawidowicz
Elad Hirsch
A. Tal
233
2
0
22 Oct 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
189
0
0
22 Oct 2024
MotionGlot: A Multi-Embodied Motion Generation Model
MotionGlot: A Multi-Embodied Motion Generation ModelIEEE International Conference on Robotics and Automation (ICRA), 2024
Sudarshan Harithas
Srinath Sridhar
396
3
0
22 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5%
  Parameters and 90% Performance
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
402
87
0
21 Oct 2024
EVA: An Embodied World Model for Future Video Anticipation
EVA: An Embodied World Model for Future Video Anticipation
Yatian Wang
Hengyuan Zhang
Chun-Kai Fan
Xingqun Qi
Rongyu Zhang
...
Chi-Min Chan
Wei Xue
Wenhan Luo
Shanghang Zhang
Wenhan Luo
VGen
235
18
0
20 Oct 2024
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning
Shiyu Hu
Xuchen Li
Xuzhao Li
Jing Zhang
Yipei Wang
Xin Zhao
Kang Hao Cheong
VLM
298
1
0
20 Oct 2024
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based SamplingInternational Conference on Learning Representations (ICLR), 2024
Minhyuk Seo
Hyunseo Koh
Jonghyun Choi
381
9
0
19 Oct 2024
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense
  Concepts about Actions
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
LM&Ro
199
1
0
17 Oct 2024
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data
  Generation
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
Mithun Manivannan
Vignesh Nethrapalli
Mark Cartwright
161
2
0
15 Oct 2024
Efficient and Effective Universal Adversarial Attack against
  Vision-Language Pre-training Models
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
Fan Yang
Yihao Huang
Kaidi Wang
Ling Shi
G. Pu
Yang Liu
Jian Shu
AAMLVLM
273
2
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
When Does Perceptual Alignment Benefit Vision Representations?Neural Information Processing Systems (NeurIPS), 2024
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
278
43
1
14 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent
  Approach
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent ApproachNeural Information Processing Systems (NeurIPS), 2024
Rory Young
Nicolas Pugeault
AAML
360
20
0
14 Oct 2024
ChangeMinds: Multi-task Framework for Detecting and Describing Changes
  in Remote Sensing
ChangeMinds: Multi-task Framework for Detecting and Describing Changes in Remote Sensing
Yuduo Wang
Weikang Yu
Michael K Kopp
Pedram Ghamisi
312
5
0
13 Oct 2024
ECIS-VQG: Generation of Entity-centric Information-seeking Questions
  from Videos
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from VideosConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Arpan Phukan
Manish Gupta
Asif Ekbal
VGen
197
1
0
13 Oct 2024
BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
Peijia Qin
Ruiyi Zhang
Pengtao Xie
221
4
0
13 Oct 2024
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City
  Environment
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Chen Gao
Baining Zhao
Weichen Zhang
Jinzhu Mao
Jun Zhang
...
Jianjie Fang
Zile Zhou
Jinqiang Cui
Xinyu Chen
Yong Li
LM&Ro
249
26
0
12 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and
  CLAP-Refine through LLMs
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMsIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
275
12
0
12 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio CaptioningIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
311
15
0
12 Oct 2024
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video
  Paragraph Captioning
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning
Eileen Wang
Caren Han
Josiah Poon
212
1
0
12 Oct 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of
  Transferable Generative AI Technologies
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI TechnologiesNorth American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
227
5
0
11 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and TrainingInternational Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
287
11
0
09 Oct 2024
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired
  People
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
Jun Yu
Yifan Zhang
Badrinadh Aila
V. Namboodiri
300
3
0
08 Oct 2024
The Mystery of Compositional Generalization in Graph-based Generative
  Commonsense Reasoning
The Mystery of Compositional Generalization in Graph-based Generative Commonsense ReasoningConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiyan Fu
Anette Frank
LRM
448
1
0
08 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image
  Captioner using Audiovisual Distribution Alignment
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
202
0
0
08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
TRACE: Temporal Grounding Video LLM via Causal Event ModelingInternational Conference on Learning Representations (ICLR), 2024
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
282
49
0
08 Oct 2024
R-Bench: Are your Large Multimodal Model Robust to Real-world
  Corruptions?
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
Chunyi Li
Junxuan Zhang
Zicheng Zhang
H. Wu
Yuan Tian
...
Guo Lu
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
AAML
181
14
0
07 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for
  Semi-supervised Multi-modal Fake News Detection
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News DetectionAsian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
178
5
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New BenchmarkInternational Conference on Learning Representations (ICLR), 2024
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
649
96
0
04 Oct 2024
Self-eXplainable AI for Medical Image Analysis: A Survey and New
  Outlooks
Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks
Junlin Hou
Sicen Liu
Yequan Bie
Hongmei Wang
Andong Tan
Luyang Luo
Hao Chen
XAI
366
26
0
03 Oct 2024
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among PromptsInternational Conference on Learning Representations (ICLR), 2024
Minh Le
Chau Nguyen
Huy Nguyen
Quyen Tran
Trung Le
Nhat Ho
694
12
0
03 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human PreferencesInternational Conference on Learning Representations (ICLR), 2024
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
555
17
0
03 Oct 2024
Backdooring Vision-Language Models with Out-Of-Distribution Data
Backdooring Vision-Language Models with Out-Of-Distribution DataInternational Conference on Learning Representations (ICLR), 2024
Weimin Lyu
Jiachen Yao
Saumya Gupta
Lu Pang
Tao Sun
Lingjie Yi
Lijie Hu
Haibin Ling
Chao Chen
VLMAAML
373
15
0
02 Oct 2024
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report
  Generation on CheXpert Plus Dataset
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus DatasetComputer Vision and Pattern Recognition (CVPR), 2024
Xiao Wang
Fuling Wang
Yuehang Li
Qingchuan Ma
Shiao Wang
Bo Jiang
Chuanfu Li
Jin Tang
339
15
0
01 Oct 2024
Previous
123...789...464748
Next