1908.09317
Cited By
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
IEEE International Conference on Computer Vision (ICCV), 2019
25 August 2019
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
Papers citing "Towards Unsupervised Image Captioning with Shared Multimodal Embeddings"
50 / 61 papers shown
Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition
Ranjan Sapkota
Manoj Karkee
ObjD
MU
327
17
0
06 Oct 2025
Defeating Cerberus: Concept-Guided Privacy-Leakage Mitigation in Multimodal Language Models
Boyang Zhang
Istemi Ekin Akkus
Ruichuan Chen
Alice Dethise
Klaus Satzke
Ivica Rimac
Yang Zhang
PILM
215
0
0
29 Sep 2025
Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition
Matthew Nolan
Lina Yao
Robert Davidson
172
0
0
10 Sep 2025
AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
L. D. M. S. Sai Teja
Ashok Urlana
Pruthwik Mishra
150
0
0
09 Aug 2025
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey
Yayun Qi
Hongxi Li
Yiqi Song
Xinxiao Wu
Jiebo Luo
LRM
VLM
164
5
0
11 Dec 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
499
0
0
09 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
European Conference on Computer Vision (ECCV), 2024
Xianyu Chen
Ming Jiang
Qi Zhao
254
9
0
05 Aug 2024
Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning
Xinwei Liu
Yang Liu
Yuan Xun
Yaning Tan
Simeng Qin
391
16
0
23 Jul 2024
MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks
Elad Hirsch
Gefen Dawidowicz
A. Tal
MedIm
279
8
0
04 Jul 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
303
3
0
28 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
336
0
0
26 Mar 2024
MedCycle: Unpaired Medical Report Generation via Cycle-Consistency
Elad Hirsch
Gefen Dawidowicz
A. Tal
MedIm
270
6
0
20 Mar 2024
Text-to-Image Cross-Modal Generation: A Systematic Review
Maciej Żelaszczyk
Jacek Mańdziuk
343
7
0
21 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
249
15
0
04 Jan 2024
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
292
21
0
14 Dec 2023
FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and Design
Zhen Huang
Yihao Li
Dong Pei
Jiapeng Zhou
Xuliang Ning
Jianlin Han
Xiaoguang Han
Xuejun Chen
245
3
0
13 Nov 2023
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
Neural Information Processing Systems (NeurIPS), 2023
Devleena Das
Sonia Chernova
Been Kim
LRM
LLMAG
518
31
0
21 Sep 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
253
20
0
25 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
ACM Multimedia (ACM MM), 2023
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
239
26
0
23 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
197
73
0
31 Jul 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
ACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
224
2
0
27 Jul 2023
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Natural Language Processing and Chinese Computing (NLPCC), 2023
Haoqin Tu
Bowen Yang
Xianfeng Zhao
222
7
0
29 Jun 2023
Image Captioning with Multi-Context Synthetic Data
AAAI Conference on Artificial Intelligence (AAAI), 2023
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
277
21
0
29 May 2023
Text-based Person Search without Parallel Image-Text Data
ACM Multimedia (ACM MM), 2023
Yang Bai
Wenwen Qiang
Min Cao
Cheng Chen
Ziqiang Cao
Liqiang Nie
Min Zhang
357
29
0
22 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
220
19
0
03 May 2023
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIP
VLM
330
19
0
26 Apr 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David Clifton
321
14
0
11 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
The Visual Computer (TVC), 2023
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
337
38
0
07 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
International Conference on Learning Representations (ICLR), 2023
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
248
126
0
06 Mar 2023
KENGIC: KEyword-driven and N-Gram Graph based Image Captioning
International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022
Brandon Birmingham
A. Muscat
124
1
0
07 Feb 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
IEEE Access (IEEE Access), 2023
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL
VLM
174
10
0
26 Jan 2023
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
255
28
0
08 Dec 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
262
30
0
22 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
154
12
0
14 Nov 2022
Text-Only Training for Image Captioning using Noise-Injected CLIP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
David Nukrai
Ron Mokady
Amir Globerson
VLM
CLIP
412
130
0
01 Nov 2022
Language-free Training for Zero-shot Video Grounding
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Dahye Kim
Jungin Park
Jiyoung Lee
S. Park
Kwanghoon Sohn
233
32
0
24 Oct 2022
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Minjoon Jung
Seongho Choi
Joo-Kyung Kim
Jin-Hwa Kim
Byoung-Tak Zhang
259
11
0
23 Oct 2022
Data Poisoning Attacks Against Multimodal Encoders
International Conference on Machine Learning (ICML), 2022
Ziqing Yang
Xinlei He
Zheng Li
Michael Backes
Mathias Humbert
Pascal Berrang
Yang Zhang
AAML
440
70
0
30 Sep 2022
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
272
4
0
29 Sep 2022
Prompt-based Learning for Unpaired Image Captioning
IEEE transactions on multimedia (IEEE TMM), 2022
Peipei Zhu
Tianlin Li
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
Chen Chen
VLM
251
47
0
26 May 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
Jinho Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
339
38
0
25 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
368
116
0
05 May 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Computer Vision and Pattern Recognition (CVPR), 2022
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
430
37
0
25 Apr 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
393
66
0
16 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
IEEE transactions on multimedia (IEEE TMM), 2022
Peipei Zhu
Tianlin Li
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
Chen Chen
246
16
0
07 Mar 2022
Unsupervised Temporal Video Grounding with Deep Semantic Clustering
AAAI Conference on Artificial Intelligence (AAAI), 2022
Daizong Liu
Xiaoye Qu
Yinzhen Wang
Xing Di
Kai Zou
Yu Cheng
Zichuan Xu
Pan Zhou
272
52
0
14 Jan 2022
Object-Centric Unsupervised Image Captioning
Zihang Meng
David Yang
Xuefei Cao
Ashish Shah
Ser-Nam Lim
OCL
VLM
206
15
0
02 Dec 2021
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
222
57
0
29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Computer Vision and Pattern Recognition (CVPR), 2021
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
440
246
0
29 Nov 2021
Multimodal End-to-End Group Emotion Recognition using Cross-Modal Attention
Lev Evtodienko
138
7
0
10 Nov 2021