ResearchTrend.AI
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

17 June 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
    ObjD
    VLM
    MLLM
arXiv: 2206.08916

Papers citing "Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks"

27 / 327 papers shown
Title
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
13
55
0
17 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
14
23
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
28
101
0
15 Nov 2022
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
39
21
0
21 Oct 2022
A Survey of Computer Vision Technologies In Urban and Controlled-environment Agriculture
Jiayun Luo
Boyang Albert Li
Cyril Leung
43
10
0
20 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
30
50
0
13 Oct 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos
Ting Chen
Lala Li
Saurabh Saxena
Geoffrey E. Hinton
David J. Fleet
VGen
MLLM
17
102
0
12 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
15
332
0
06 Oct 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Xiao Wang
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
18
529
0
14 Sep 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
39
70
0
30 Jul 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLM
CLL
17
71
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
33
343
0
17 Jun 2022
A Unified Sequence Interface for Vision Tasks
Ting Chen
Saurabh Saxena
Lala Li
Tsung-Yi Lin
David J. Fleet
Geoffrey E. Hinton
VLM
MLLM
11
147
0
15 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Jenq-Neng Hwang
Jianfeng Gao
ObjD
VLM
9
288
0
12 Jun 2022
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov
André Susano Pinto
Lucas Beyer
Xiaohua Zhai
Jeremiah Harmsen
N. Houlsby
103
67
0
20 May 2022
Transformers in Time-series Analysis: A Tutorial
Sabeen Ahmed
Ian E. Nielsen
Aakash Tripathi
Shamoon Siddiqui
Ghulam Rasool
R. Ramachandran
AI4TS
12
69
0
28 Apr 2022
A Survey on Unsupervised Anomaly Detection Algorithms for Industrial Images
Yajie Cui
Zhaoxiang Liu
Shiguo Lian
OOD
DRL
15
41
0
24 Apr 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,010
0
28 Jan 2022
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
33
73
0
25 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yongjian Wu
Yue Gao
Rongrong Ji
ObjD
25
19
0
17 Oct 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
341
0
22 Sep 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
166
676
0
22 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
518
0
04 Feb 2021
Inductive Biases for Deep Learning of Higher-Level Cognition
Anirudh Goyal
Yoshua Bengio
AI4CE
13
343
0
30 Nov 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018