2401.08567
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
16 January 2024
Yuhui Zhang
Elaine Sui
Serena Yeung-Levy
Papers citing "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data"
12 / 12 papers shown
Title
Learning to Match Unpaired Data with Minimum Entropy Coupling
Mustapha Bounoua
Giulio Franzese
Pietro Michiardi
31
0
0
11 Mar 2025
Fine-Grained Video Captioning through Scene Graph Consolidation
Sanghyeok Chu
Seonguk Seo
Bohyung Han
46
1
0
23 Feb 2025
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning
Jianxiong Li
Zhihao Wang
Jinliang Zheng
Xiaoai Zhou
Guanming Wang
...
Yu Liu
Jingjing Liu
Ya-Qin Zhang
Junzhi Yu
Xianyuan Zhan
25
0
0
02 Oct 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Ju Liu
VLM
64
1
0
14 Sep 2024
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar
Pekka Marttinen
MedIm
VLM
23
9
0
15 Mar 2024
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
40
81
0
06 Mar 2023
Text-Only Training for Image Captioning using Noise-Injected CLIP
David Nukrai
Ron Mokady
Amir Globerson
VLM
CLIP
41
69
0
01 Nov 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
J. Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
112
35
0
25 May 2022
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
191
92
0
26 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
242
554
0
28 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
180
342
0
13 Jul 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021