arXiv 2111.14447
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
29 November 2021
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
Papers citing
"ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic"
50 / 127 papers shown
Title
Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning
Yassir Benhammou
Alessandro Tiberio
Gabriel Trautmann
Suman Kalyan
MLLM
VLM
29
0
0
21 Apr 2025
An Image is Worth K Topics: A Visual Structural Topic Model with Pretrained Image Embeddings
Matías Piqueras
Alexandra Segerberg
Matteo Magnani
Måns Magnusson
Nataša Sladoje
28
0
0
14 Apr 2025
Concept Lancet: Image Editing with Compositional Representation Transplant
Jinqi Luo
Tianjiao Ding
Kwan Ho Ryan Chan
Hancheng Min
Chris Callison-Burch
René Vidal
DiffM
KELM
59
0
0
03 Apr 2025
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning
Mingkai Tian
Guorong Li
Yuankai Qi
Amin Beheshti
J. Shi
Anton van den Hengel
Qingming Huang
VGen
32
0
0
31 Mar 2025
Unicorn: Text-Only Data Synthesis for Vision Language Model Training
Xiaomin Yu
Pengxiang Ding
Wenjie Zhang
Siteng Huang
Songyang Gao
Chengwei Qin
Kejian Wu
Zhaoxin Fan
Ziyue Qiao
Donglin Wang
MLLM
SyDa
67
0
0
28 Mar 2025
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
Lee Chae-Yeon
Oh Hyun-Bin
Han EunGi
Kim Sung-Bin
Suekyeong Nam
Tae-Hyun Oh
EGVM
3DH
70
0
1
26 Mar 2025
Unlocking Open-Set Language Accessibility in Vision Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
42
0
0
14 Mar 2025
Fine-Grained Video Captioning through Scene Graph Consolidation
Sanghyeok Chu
Seonguk Seo
Bohyung Han
44
1
0
23 Feb 2025
Visual Zero-Shot E-Commerce Product Attribute Value Extraction
Jiaying Gong
Ming Cheng
Hongda Shen
Pierre-Yves Vandenbussche
Janet Jenq
Hoda Eldardiry
39
0
0
21 Feb 2025
ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis
Xueshen Li
Xinlong Hou
Ziyi Huang
Yu Gan
LM&MA
MedIm
44
0
0
15 Feb 2025
Classifier-Guided Captioning Across Modalities
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
23
0
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
38
0
0
03 Jan 2025
ContRail: A Framework for Realistic Railway Image Synthesis using ControlNet
Andrei-Robert Alexandrescu
Razvan-Gabriel Petec
Alexandru Manole
Laura-Silvia Diosan
DiffM
62
0
0
09 Dec 2024
ARLang: An Outdoor Augmented Reality Application for Portuguese Vocabulary Learning
Arthur Caetano
Alyssa Lawson
Yimeng Liu
Misha Sra
16
7
0
07 Nov 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
53
25
0
04 Oct 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Joshua Forster Feinglass
Yezhou Yang
18
0
0
30 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP
VLM
16
0
0
26 Sep 2024
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation
Cheng Charles Ma
Kevin Hyekang Joo
Alexandria K. Vail
Sunreeta Bhattacharya
Álvaro Fernández García
Kailana Baker-Matsuoka
Sheryl Mathew
Lori L. Holt
Fernando De la Torre
34
3
0
13 Sep 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
24
1
0
26 Aug 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
25
6
0
29 Jul 2024
MemBench: Memorized Image Trigger Prompt Dataset for Diffusion Models
Chunsan Hong
Tae-Hyun Oh
Minhyuk Sung
VLM
EGVM
19
0
0
24 Jul 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP
VLM
19
0
0
15 Jul 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper
Hadar Averbuch-Elor
VLM
18
6
0
11 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
21
2
0
08 Jul 2024
Adversaries Can Misuse Combinations of Safe Models
Erik Jones
Anca Dragan
Jacob Steinhardt
27
3
0
20 Jun 2024
Multi-modal Transfer Learning between Biological Foundation Models
Juan Jose Garau-Luis
Patrick Bordes
Liam Gonzalez
Masa Roller
Bernardo P. de Almeida
...
Stefan Laurent
Jan Grzegorzewski
Maren Lang
Thomas Pierrot
Guillaume Richard
AI4CE
20
1
0
20 Jun 2024
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota
Ryo Hachiuma
Chao-Han Huck Yang
Yuta Nakashima
VLM
20
3
0
20 Jun 2024
Composed Image Retrieval for Remote Sensing
Bill Psomas
Ioannis Kakogeorgiou
Nikos Efthymiadis
Giorgos Tolias
Ondřej Chum
Yannis Avrithis
Konstantinos Karantzalos
27
4
0
24 May 2024
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities
Hao Zhou
Chengming Hu
Ye Yuan
Yufei Cui
Yili Jin
...
Di Wu
Xue Liu
Charlie Zhang
Xianbin Wang
Jiangchuan Liu
22
11
0
17 May 2024
T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
Yiitan Yuan
Zhuo Chen
Xubo Liu
Haohe Liu
Xuenan Xu
Dongya Jia
Yuanzhe Chen
Mark D. Plumbley
Wenwu Wang
CLIP
VLM
27
9
0
27 Apr 2024
The Solution for the CVPR2024 NICE Image Captioning Challenge
Longfei Huang
Shupeng Zhong
Xiangyu Wu
Ruoxuan Li
19
0
0
19 Apr 2024
Bridging Vision and Language Spaces with Assignment Prediction
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
VLM
21
6
0
15 Apr 2024
Segment Any 3D Object with Language
Seungjun Lee
Yuyang Zhao
Gim Hee Lee
20
1
0
02 Apr 2024
Semantic Map-based Generation of Navigation Instructions
Chengzu Li
Chao Zhang
Simone Teufel
R. Doddipatla
Svetlana Stoyanchev
18
1
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
19
0
0
28 Mar 2024
TAG: Guidance-free Open-Vocabulary Semantic Segmentation
Yasufumi Kawano
Yoshimitsu Aoki
VLM
25
2
0
17 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
18
13
0
06 Mar 2024
ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies
Oren Sultan
Yonatan Bitton
Ron Yosef
Dafna Shahaf
16
8
0
02 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
37
7
0
29 Feb 2024
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks
Yang Liu
Xiaomin Yu
Gongyu Zhang
Christos Bergeles
Prokar Dasgupta
Alejandro Granados
Sebastien Ourselin
27
0
0
27 Feb 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
Yuhui Zhang
Elaine Sui
Serena Yeung-Levy
21
9
0
16 Jan 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA
ELM
18
9
0
13 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
33
3
0
04 Jan 2024
A Vision Check-up for Language Models
Pratyusha Sharma
Tamar Rott Shaham
Manel Baradad
Stephanie Fu
Adrian Rodriguez-Munoz
Shivam Duggal
Phillip Isola
Antonio Torralba
VLM
LRM
75
8
0
03 Jan 2024
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
27
2
0
14 Dec 2023
CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data
Jiangming Shi
Shanshan Zheng
Xiangbo Yin
Yang Lu
Yuan Xie
Yanyun Qu
VLM
FedML
22
10
0
14 Dec 2023
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti
Camilla Longden
Agnieszka Slowik
Samuel Wills
Martin Grayson
C. Morrison
VLM
4
7
0
29 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
8
10
0
14 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
14
2
0
08 Nov 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
13
0
0
25 Oct 2023