1411.5726
CIDEr: Consensus-based Image Description Evaluation
Computer Vision and Pattern Recognition (CVPR), 2014
20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
Papers citing
"CIDEr: Consensus-based Image Description Evaluation"
50 / 2,353 papers shown
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
Neural Information Processing Systems (NeurIPS), 2024
Jaeyoo Park
Jin Young Choi
Jeonghyung Park
Bohyung Han
VLM
141
8
0
08 Nov 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Youssef Mohamed
Runjia Li
Ibrahim Said Ahmad
Kilichbek Haydarov
Juil Sock
Kenneth Church
Mohamed Elhoseiny
VLM
193
15
0
06 Nov 2024
DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
Haodong Li
Haicheng Qu
Xiaofeng Zhang
182
8
0
05 Nov 2024
From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing
Xingwu Sun
Cheng Fei
Charles Zhang
Fei Jin
Qian Niu
...
Pohsun Feng
Ziqian Bi
Ming Liu
Yujiao Shi
Y. Zhang
291
3
0
05 Nov 2024
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yang Liu
Sensen Gao
Qing Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang
Xiaochun Cao
AAML
232
8
0
04 Nov 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
271
2
0
04 Nov 2024
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models
Georgia Gabriela Sampaio
Ruixiang Zhang
Shuangfei Zhai
Jiatao Gu
J. Susskind
Navdeep Jaitly
Yizhe Zhang
DiffM
CLIP
266
1
0
02 Nov 2024
Designing a Robust Radiology Report Generation System
Sonit Singh
MedIm
235
1
0
02 Nov 2024
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
249
4
0
01 Nov 2024
Generative Emotion Cause Explanation in Multimodal Conversations
International Conference on Multimedia Retrieval (ICMR), 2024
Lin Wang
Xiaocui Yang
Shi Feng
Daling Wang
Yifei Zhang
Zhitao Zhang
467
1
0
01 Nov 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Neural Information Processing Systems (NeurIPS), 2024
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
262
4
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM
MLLM
LRM
308
77
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
278
5
0
29 Oct 2024
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding
Yuan Wang
Di Huang
Yaqi Zhang
Wanli Ouyang
J. Jiao
Xuetao Feng
Yan Zhou
Pengfei Wan
Weizhen He
Dan Xu
VGen
236
37
0
29 Oct 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
Neural Information Processing Systems (NeurIPS), 2024
L. Qin
Qiguang Chen
Hao Fei
Zhi Chen
Min Li
Wanxiang Che
207
26
0
27 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
Wenqiang Chen
Jiaxuan Cheng
Leyao Wang
Wei Zhao
Wojciech Matusik
267
14
0
26 Oct 2024
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
International Conference on Learning Representations (ICLR), 2024
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM
VLM
445
15
0
23 Oct 2024
Image-aware Evaluation of Generated Medical Reports
Neural Information Processing Systems (NeurIPS), 2024
Gefen Dawidowicz
Elad Hirsch
A. Tal
233
2
0
22 Oct 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
189
0
0
22 Oct 2024
MotionGlot: A Multi-Embodied Motion Generation Model
IEEE International Conference on Robotics and Automation (ICRA), 2024
Sudarshan Harithas
Srinath Sridhar
396
3
0
22 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
402
87
0
21 Oct 2024
EVA: An Embodied World Model for Future Video Anticipation
Yatian Wang
Hengyuan Zhang
Chun-Kai Fan
Xingqun Qi
Rongyu Zhang
...
Chi-Min Chan
Wei Xue
Wenhan Luo
Shanghang Zhang
Wenhan Luo
VGen
235
18
0
20 Oct 2024
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning
Shiyu Hu
Xuchen Li
Xuzhao Li
Jing Zhang
Yipei Wang
Xin Zhao
Kang Hao Cheong
VLM
298
1
0
20 Oct 2024
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
International Conference on Learning Representations (ICLR), 2024
Minhyuk Seo
Hyunseo Koh
Jonghyun Choi
381
9
0
19 Oct 2024
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
LM&Ro
199
1
0
17 Oct 2024
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
Mithun Manivannan
Vignesh Nethrapalli
Mark Cartwright
161
2
0
15 Oct 2024
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
Fan Yang
Yihao Huang
Kaidi Wang
Ling Shi
G. Pu
Yang Liu
Jian Shu
AAML
VLM
273
2
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
Neural Information Processing Systems (NeurIPS), 2024
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
278
43
1
14 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Neural Information Processing Systems (NeurIPS), 2024
Rory Young
Nicolas Pugeault
AAML
360
20
0
14 Oct 2024
ChangeMinds: Multi-task Framework for Detecting and Describing Changes in Remote Sensing
Yuduo Wang
Weikang Yu
Michael K Kopp
Pedram Ghamisi
312
5
0
13 Oct 2024
ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Arpan Phukan
Manish Gupta
Asif Ekbal
VGen
197
1
0
13 Oct 2024
BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
Peijia Qin
Ruiyi Zhang
Pengtao Xie
221
4
0
13 Oct 2024
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Chen Gao
Baining Zhao
Weichen Zhang
Jinzhu Mao
Jun Zhang
...
Jianjie Fang
Zile Zhou
Jinqiang Cui
Xinyu Chen
Yong Li
LM&Ro
249
26
0
12 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
275
12
0
12 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
311
15
0
12 Oct 2024
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning
Eileen Wang
Caren Han
Josiah Poon
212
1
0
12 Oct 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
227
5
0
11 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
287
11
0
09 Oct 2024
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
Jun Yu
Yifan Zhang
Badrinadh Aila
V. Namboodiri
300
3
0
08 Oct 2024
The Mystery of Compositional Generalization in Graph-based Generative Commonsense Reasoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiyan Fu
Anette Frank
LRM
448
1
0
08 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
202
0
0
08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
International Conference on Learning Representations (ICLR), 2024
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
282
49
0
08 Oct 2024
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
Chunyi Li
Junxuan Zhang
Zicheng Zhang
H. Wu
Yuan Tian
...
Guo Lu
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
AAML
181
14
0
07 Oct 2024
CoVLM: Leveraging Consensus from Vision-Language Models for Semi-supervised Multi-modal Fake News Detection
Asian Conference on Computer Vision (ACCV), 2024
Devank
Jayateja Kalla
Soma Biswas
178
5
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
International Conference on Learning Representations (ICLR), 2024
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
649
96
0
04 Oct 2024
Self-eXplainable AI for Medical Image Analysis: A Survey and New Outlooks
Junlin Hou
Sicen Liu
Yequan Bie
Hongmei Wang
Andong Tan
Luyang Luo
Hao Chen
XAI
366
26
0
03 Oct 2024
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
International Conference on Learning Representations (ICLR), 2024
Minh Le
Chau Nguyen
Huy Nguyen
Quyen Tran
Trung Le
Nhat Ho
694
12
0
03 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
International Conference on Learning Representations (ICLR), 2024
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
555
17
0
03 Oct 2024
Backdooring Vision-Language Models with Out-Of-Distribution Data
International Conference on Learning Representations (ICLR), 2024
Weimin Lyu
Jiachen Yao
Saumya Gupta
Lu Pang
Tao Sun
Lingjie Yi
Lijie Hu
Haibin Ling
Chao Chen
VLM
AAML
373
15
0
02 Oct 2024
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Computer Vision and Pattern Recognition (CVPR), 2024
Xiao Wang
Fuling Wang
Yuehang Li
Qingchuan Ma
Shiao Wang
Bo Jiang
Chuanfu Li
Jin Tang
339
15
0
01 Oct 2024