ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.14447
  4. Cited By
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic
  Arithmetic

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

29 November 2021
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
    VLM
ArXivPDFHTML

Papers citing "ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic"

50 / 127 papers shown
Title
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper
Hadar Averbuch-Elor
20
10
0
25 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
11
1
0
15 Oct 2023
BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex
  Selectivity
BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Brian Karrer
13
15
0
06 Oct 2023
Understanding prompt engineering may not require rethinking
  generalization
Understanding prompt engineering may not require rethinking generalization
Victor Akinwande
Yiding Jiang
Dylan Sam
J. Zico Kolter
VLM
VPVLM
112
7
0
06 Oct 2023
MindGPT: Interpreting What You See with Non-invasive Brain Recordings
MindGPT: Interpreting What You See with Non-invasive Brain Recordings
Jiaxuan Chen
Yu Qi
Yueming Wang
Gang Pan
22
5
0
27 Sep 2023
A Survey on Image-text Multimodal Models
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
21
5
0
23 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for
  Text-Video Retrieval
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
15
3
0
16 Sep 2023
Zero-Shot Audio Captioning via Audibility Guidance
Zero-Shot Audio Captioning via Audibility Guidance
Tal Shaharabany
Ariel Shaulov
Lior Wolf
11
4
0
07 Sep 2023
DeViL: Decoding Vision features into Language
DeViL: Decoding Vision features into Language
Meghal Dani
Isabel Rio-Torto
Stephan Alaniz
Zeynep Akata
VLM
19
7
0
04 Sep 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual
  Captioning
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
22
13
0
25 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Bill Xu
Xiangnan He
VLM
CLIP
8
13
0
23 Aug 2023
Label-Free Event-based Object Recognition via Joint Learning with Image
  Reconstruction from Events
Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events
Hoonhee Cho
Hyeonseong Kim
Yujeong Chae
Kuk-Jin Yoon
VLM
17
24
0
18 Aug 2023
Text-Only Training for Visual Storytelling
Text-Only Training for Visual Storytelling
Yuechen Wang
Wen-gang Zhou
Zhenbo Lu
Houqiang Li
DiffM
16
2
0
17 Aug 2023
Improving Generalization of Image Captioning with Unsupervised Prompt
  Learning
Improving Generalization of Image Captioning with Unsupervised Prompt Learning
Hongchen Wei
Zhenzhong Chen
VLM
23
3
0
05 Aug 2023
Guiding Image Captioning Models Toward More Specific Captions
Guiding Image Captioning Models Toward More Specific Captions
Simon Kornblith
Lala Li
Zirui Wang
Thao Nguyen
11
15
0
31 Jul 2023
Transferable Decoding with Visual Entities for Zero-Shot Image
  Captioning
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
8
33
0
31 Jul 2023
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars
  Using 2D Diffusion
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion
Shuo Huang
Zongxin Yang
Liangting Li
Yi Yang
Jia Jia
DiffM
9
27
0
13 Jul 2023
ExFaceGAN: Exploring Identity Directions in GAN's Learned Latent Space
  for Synthetic Identity Generation
ExFaceGAN: Exploring Identity Directions in GAN's Learned Latent Space for Synthetic Identity Generation
Fadi Boutros
Marcel Klemt
Meiling Fang
Arjan Kuijper
Naser Damer
CVBM
6
17
0
11 Jul 2023
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Yongrae Jo
Seongyun Lee
Aiden Seung Joon Lee
Hyunji Lee
Hanseok Oh
Minjoon Seo
16
1
0
05 Jul 2023
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple
  Oracles
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Haoqin Tu
Bowen Yang
Xianfeng Zhao
11
6
0
29 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
66
65
0
19 Jun 2023
AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt
  Encoder
AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder
Tal Shaharabany
Aviad Dahan
Raja Giryes
Lior Wolf
MedIm
VLM
6
65
0
10 Jun 2023
Prompt Algebra for Task Composition
Prompt Algebra for Task Composition
Pramuditha Perera
Matthew Trager
L. Zancato
Alessandro Achille
Stefano Soatto
VLM
11
8
0
01 Jun 2023
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented
  Language Model Prompting
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting
R. Ramos
Bruno Martins
Desmond Elliott
VLM
11
16
0
31 May 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
DisCLIP: Open-Vocabulary Referring Expression Generation
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
20
7
0
30 May 2023
Image Captioning with Multi-Context Synthetic Data
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
17
7
0
29 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative
  AI
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Yuwei Fang
Mahmoud Khademi
Chenguang Zhu
Ziyi Yang
Reid Pryzant
...
Yao Qian
Takuya Yoshioka
Lu Yuan
Michael Zeng
Xuedong Huang
12
2
0
23 May 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for
  Audio-to-Image Generation
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
14
20
0
22 May 2023
ReSeTOX: Re-learning attention weights for toxicity mitigation in
  machine translation
ReSeTOX: Re-learning attention weights for toxicity mitigation in machine translation
Javier García Gilabert
Carlos Escolano
Marta R. Costa-jussá
CLL
MU
6
2
0
19 May 2023
Text-To-Concept (and Back) via Cross-Model Alignment
Text-To-Concept (and Back) via Cross-Model Alignment
Mazda Moayeri
Keivan Rezaei
Maziar Sanjabi
S. Feizi
CLIP
20
39
0
10 May 2023
From Association to Generation: Text-only Captioning by Unsupervised
  Cross-modal Mapping
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIP
VLM
9
6
0
26 Apr 2023
HOICLIP: Efficient Knowledge Transfer for HOI Detection with
  Vision-Language Models
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
Sha Ning
Longtian Qiu
Yongfei Liu
Xuming He
VLM
10
41
0
28 Mar 2023
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
Jongheon Jeong
Yang Zou
Taewan Kim
Dongqing Zhang
Avinash Ravichandran
O. Dabeer
VLM
64
92
0
26 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
37
12
0
23 Mar 2023
Zero-guidance Segmentation Using Zero Segment Labels
Zero-guidance Segmentation Using Zero Segment Labels
Pitchaporn Rewatbowornwong
Nattanat Chatthee
E. Chuangsuwanich
Supasorn Suwajanakorn
VLM
15
11
0
23 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
20
11
0
21 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and
  Multilingual Natural Language Generation
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David A. Clifton
21
5
0
11 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only
  Training
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
40
81
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
26
21
0
04 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based
  Polishing
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL
DiffM
6
32
0
04 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for
  Embodied Agents
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
11
41
0
01 Mar 2023
Meta Learning to Bridge Vision and Language Models for Multimodal
  Few-Shot Learning
Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
Ivona Najdenkoska
Xiantong Zhen
M. Worring
VLM
8
18
0
28 Feb 2023
Language Is Not All You Need: Aligning Perception with Language Models
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLM
LRM
MLLM
8
403
0
27 Feb 2023
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP
  for Generic Natural Visual Stimulus Decoding
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding
Yulong Liu
Yongqiang Ma
Wei Zhou
Guibo Zhu
Nanning Zheng
VLM
17
34
0
25 Feb 2023
Teaching CLIP to Count to Ten
Teaching CLIP to Count to Ten
Roni Paiss
Ariel Ephrat
Omer Tov
Shiran Zada
Inbar Mosseri
Michal Irani
Tali Dekel
VLM
CLIP
14
88
0
23 Feb 2023
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
Zachary Novack
Julian McAuley
Zachary Chase Lipton
Saurabh Garg
VLM
8
78
0
06 Feb 2023
Noise-aware Learning from Web-crawled Image-Text Data for Image
  Captioning
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
6
18
0
27 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
16
124
0
13 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph
  Riddles
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
17
11
0
29 Nov 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
8
1,670
0
17 Nov 2022
Previous
123
Next