2206.08916
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
International Conference on Learning Representations (ICLR), 2022
17 June 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
Papers citing
"Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks"
50 / 352 papers shown
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
420
34
0
20 Mar 2023
Generative Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Jia-Qing Chen
Jiachen Lu
Xiatian Zhu
Li Zhang
GAN
ISeg
VLM
212
60
0
20 Mar 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
IEEE International Conference on Computer Vision (ICCV), 2023
Seungju Han
Jack Hessel
Nouha Dziri
Yejin Choi
Youngjae Yu
VGen
200
21
0
17 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
IEEE International Conference on Computer Vision (ICCV), 2023
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
233
2
0
13 Mar 2023
Universal Instance Perception as Object Discovery and Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
B. Yan
Yi Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
VOS
VLM
LRM
374
235
0
12 Mar 2023
UniHCP: A Unified Model for Human-Centric Perceptions
Computer Vision and Pattern Recognition (CVPR), 2023
Yuanzheng Ci
Yizhou Wang
Meilin Chen
Weizhen He
Mengwei He
Feng Zhu
Rui Zhao
F. Yu
Donglian Qi
Wanli Ouyang
536
79
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
325
33
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
179
64
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
450
19
0
03 Mar 2023
StraIT: Non-autoregressive Generation with Stratified Image Transformer
Shengju Qian
Huiwen Chang
Yuanzhen Li
Zizhao Zhang
Jiaya Jia
Han Zhang
221
13
0
01 Mar 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Percy Liang
LM&Ro
SSL
280
189
0
24 Feb 2023
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yang Chen
Hexiang Hu
Yi Luan
Haitian Sun
Soravit Changpinyo
Alan Ritter
Ming-Wei Chang
614
150
0
23 Feb 2023
Backdoor Attacks to Pre-trained Unified Foundation Models
Zenghui Yuan
Yixin Liu
Kai Zhang
Pan Zhou
Lichao Sun
AAML
215
12
0
18 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
220
8
0
16 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
307
181
0
14 Feb 2023
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Thomas Carta
Clément Romac
Thomas Wolf
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LM&Ro
LLMAG
391
238
0
06 Feb 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
275
53
0
12 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
IEEE International Conference on Computer Vision (ICCV), 2023
Jia Ning
Chen Li
Zheng Zhang
Zigang Geng
Jingdong Sun
Kun He
Han Hu
330
60
0
05 Jan 2023
Do DALL-E and Flamingo Understand Each Other?
IEEE International Conference on Computer Vision (ICCV), 2022
Hang Li
Jindong Gu
Rajat Koner
Sahand Sharifzadeh
Volker Tresp
MLLM
226
14
0
23 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM
MLLM
ObjD
287
326
0
21 Dec 2022
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
369
132
0
21 Dec 2022
Universal Object Detection with Large Vision Model
International Journal of Computer Vision (IJCV), 2022
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM
ObjD
281
9
0
19 Dec 2022
Transferring General Multimodal Pretrained Models to Text Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
131
5
0
19 Dec 2022
Egocentric Video Task Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
265
18
0
13 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
147
19
0
08 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA
Pattern Recognition (Pattern Recogn.), 2022
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
262
97
0
07 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Computer Vision and Pattern Recognition (CVPR), 2022
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Cha Zhang
Mohit Bansal
VLM
346
152
0
05 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM
MLLM
336
335
0
05 Dec 2022
Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Zhuowan Li
Cihang Xie
Benjamin Van Durme
Yaoyao Liu
VLM
SSL
178
2
0
01 Dec 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
258
5
0
28 Nov 2022
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yatai Ji
Rong-Cheng Tu
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
260
17
0
24 Nov 2022
Unifying Vision-Language Representation Space with Single-tower Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2022
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
249
26
0
21 Nov 2022
Visual Programming: Compositional visual reasoning without training
Computer Vision and Pattern Recognition (CVPR), 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
439
571
0
18 Nov 2022
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Computer Vision and Pattern Recognition (CVPR), 2022
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Jiaming Song
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
169
67
0
17 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
IEEE International Conference on Computer Vision (ICCV), 2022
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
335
36
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
412
127
0
15 Nov 2022
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
216
24
0
21 Oct 2022
A Survey of Computer Vision Technologies In Urban and Controlled-environment Agriculture
ACM Computing Surveys (ACM CSUR), 2022
Jiayun Luo
Boyang Albert Li
Cyril Leung
374
23
0
20 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
369
53
0
13 Oct 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos
IEEE International Conference on Computer Vision (ICCV), 2022
Ting Chen
Lala Li
Saurabh Saxena
Geoffrey E. Hinton
David J. Fleet
VGen
MLLM
442
131
0
12 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
390
475
0
06 Oct 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
International Conference on Learning Representations (ICLR), 2022
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
718
908
0
14 Sep 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
234
94
0
30 Jul 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2022
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLM
CLL
384
83
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Neural Information Processing Systems (NeurIPS), 2022
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
512
496
0
17 Jun 2022
A Unified Sequence Interface for Vision Tasks
Neural Information Processing Systems (NeurIPS), 2022
Ting Chen
Saurabh Saxena
Lala Li
Nayeon Lee
David J. Fleet
Geoffrey E. Hinton
VLM
MLLM
208
171
0
15 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
296
354
0
12 Jun 2022
Transformers in Time-series Analysis: A Tutorial
Circuits, systems, and signal processing (CSSP), 2022
Sabeen Ahmed
Ian E. Nielsen
Aakash Tripathi
Shamoon Siddiqui
Ghulam Rasool
R. Ramachandran
AI4TS
318
247
0
28 Apr 2022
A Survey on Unsupervised Anomaly Detection Algorithms for Industrial Images
IEEE Access (IEEE Access), 2022
Yajie Cui
Zhaoxiang Liu
Kai Wang
OOD
DRL
472
79
0
24 Apr 2022
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
195
83
0
25 Nov 2021