Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.09734
Cited By
ClipCap: CLIP Prefix for Image Captioning
18 November 2021
Ron Mokady
Amir Hertz
Amit H. Bermano
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ClipCap: CLIP Prefix for Image Captioning"
50 / 86 papers shown
Title
Mitigating Image Captioning Hallucinations in Vision-Language Models
Fei Zhao
C. Zhang
Runlin Zhang
Tianyang Wang
Xi Li
VLM
34
0
0
06 May 2025
CAMU: Context Augmentation for Meme Understanding
Girish A. Koushik
Diptesh Kanojia
Helen Treharne
Aditya Joshi
VLM
91
0
0
24 Apr 2025
Class-Conditional Distribution Balancing for Group Robust Classification
Miaoyun Zhao
Qiang Zhang
C. Li
60
1
0
24 Apr 2025
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
Zehong Ma
Hao Chen
Wei Zeng
Limin Su
Shiliang Zhang
AI4TS
32
0
0
10 Apr 2025
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
...
Jinghua Yan
Y. Bai
P. Sadayappan
Xia Hu
Bo Yuan
VLM
53
0
0
24 Mar 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Z. Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
C. L. P. Chen
Peng Li
Yang Liu
LRM
79
0
0
13 Mar 2025
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
VLM
OffRL
60
0
0
11 Mar 2025
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li
Jiashu Qu
Yuxiao Zhou
Yuehan Qin
Tiankai Yang
Yue Zhao
78
1
0
08 Mar 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
66
1
0
25 Feb 2025
LaVCa: LLM-assisted Visual Cortex Captioning
Takuya Matsuyama
Shinji Nishimoto
Yu Takagi
48
0
0
20 Feb 2025
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
92
0
0
20 Jan 2025
Altogether: Image Captioning via Re-aligning Alt-text
Hu Xu
Po-Yao (Bernie) Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
...
Luke Zettlemoyer
Wen-tau Yih
Shang-Wen Li
Saining Xie
Christoph Feichtenhofer
DiffM
36
6
0
31 Dec 2024
Prompt-enhanced Network for Hateful Meme Classification
Junxi Liu
Yanyan Feng
Jiehai Chen
Yun Xue
Fenghuan Li
VLM
53
0
0
12 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
25
0
0
09 Nov 2024
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
Hoin Jung
T. Jang
Xiaoqian Wang
VLM
21
2
0
10 Oct 2024
Decoding the Echoes of Vision from fMRI: Memory Disentangling for Past Semantic Information
Runze Xia
Congchi Yin
Piji Li
16
0
0
30 Sep 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
59
1
0
04 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
World Models with Hints of Large Language Models for Goal Achieving
Zeyuan Liu
Ziyu Huan
Xiyao Wang
Jiafei Lyu
Jian Tao
Xiu Li
Furong Huang
Huazhe Xu
LM&Ro
LRM
AI4CE
29
1
0
11 Jun 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
43
3
0
24 May 2024
Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM
Xiaoyu Chen
Changde Du
Che Liu
Yizhe Wang
Huiguang He
24
2
0
13 May 2024
GazeHTA: End-to-end Gaze Target Detection with Head-Target Association
Zhi-Yi Lin
Jouh Yeong Chew
J. C. V. Gemert
Xucong Zhang
36
1
0
16 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
31
9
0
12 Apr 2024
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu
M. Jung
Jesse Clark
83
1
0
12 Apr 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Gaowen Liu
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAG
LM&Ro
AI4CE
LM&MA
66
49
0
02 Apr 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
45
4
0
19 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
20
13
0
06 Mar 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
34
1
0
06 Feb 2024
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
LaViP:Language-Grounded Visual Prompts
Nilakshan Kunananthaseelan
Jing Zhang
Mehrtash Harandi
VLM
8
0
0
18 Dec 2023
MATK: The Meme Analytical Tool Kit
Ming Shan Hee
Aditi Kumaresan
N. Hoang
Nirmalendu Prakash
Rui Cao
Roy Ka-Wei Lee
VLM
17
2
0
11 Dec 2023
Auto-Vocabulary Semantic Segmentation
Osman Ülger
Maksymilian Kulicki
Yuki M. Asano
Martin R. Oswald
VLM
39
2
0
07 Dec 2023
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
15
3
0
06 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Hongyuan Zhu
Jiayuan Fan
Tao Chen
MLLM
24
76
0
30 Nov 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
16
28
0
29 Nov 2023
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
VLM
CoGe
30
21
0
12 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
19
36
0
10 Oct 2023
Text Embeddings Reveal (Almost) As Much As Text
John X. Morris
Volodymyr Kuleshov
Vitaly Shmatikov
Alexander M. Rush
RALM
24
89
0
10 Oct 2023
Sentence-level Prompts Benefit Composed Image Retrieval
Yang Bai
Xinxing Xu
Yong-Jin Liu
Salman Khan
Fahad Khan
Wangmeng Zuo
Rick Siow Mong Goh
Chun-Mei Feng
20
26
0
09 Oct 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
26
2
0
27 Sep 2023
Jointly Training Large Autoregressive Multimodal Models
Emanuele Aiello
L. Yu
Yixin Nie
Armen Aghajanyan
Barlas Oğuz
11
29
0
27 Sep 2023
Tackling VQA with Pretrained Foundation Models without Further Training
Alvin De Jun Tan
Bingquan Shen
MLLM
13
1
0
27 Sep 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
Chen Jiang
Hong Liu
Xuzheng Yu
Qing Wang
Yuan-Chia Cheng
...
Zhongyi Liu
Qingpei Guo
Wei Chu
Ming Yang
Yuan Qi
16
10
0
20 Sep 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
22
13
0
25 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
51
18
0
23 Aug 2023
ViCo: Engaging Video Comment Generation with Human Preference Rewards
Yuchong Sun
Bei Liu
Xu Chen
Ruihua Song
Jianlong Fu
VGen
20
2
0
22 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
23
5
0
02 Aug 2023
Divert More Attention to Vision-Language Object Tracking
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
22
3
0
19 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
38
0
0
10 Jul 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
12
10
0
14 Jun 2023
1
2
Next