Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

International Conference on Learning Representations (ICLR), 2022
17 June 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD, VLM, MLLM

Papers citing "Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks"

50 / 352 papers shown
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM, VLM
420
34
0
20 Mar 2023
Generative Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Jia-Qing Chen
Jiachen Lu
Xiatian Zhu
Li Zhang
GAN, ISeg, VLM
212
60
0
20 Mar 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
IEEE International Conference on Computer Vision (ICCV), 2023
Seungju Han
Jack Hessel
Nouha Dziri
Yejin Choi
Youngjae Yu
VGen
200
21
0
17 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
IEEE International Conference on Computer Vision (ICCV), 2023
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
233
2
0
13 Mar 2023
Universal Instance Perception as Object Discovery and Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
B. Yan
Yi Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
VOS, VLM, LRM
374
235
0
12 Mar 2023
UniHCP: A Unified Model for Human-Centric Perceptions
Computer Vision and Pattern Recognition (CVPR), 2023
Yuanzheng Ci
Yizhou Wang
Meilin Chen
Weizhen He
Mengwei He
Feng Zhu
Rui Zhao
F. Yu
Donglian Qi
Wanli Ouyang
536
79
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM, MLLM
325
33
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
179
64
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
450
19
0
03 Mar 2023
StraIT: Non-autoregressive Generation with Stratified Image Transformer
Shengju Qian
Huiwen Chang
Yuanzhen Li
Zizhao Zhang
Jiaya Jia
Han Zhang
221
13
0
01 Mar 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Abigail Z. Jacobs
LM&Ro, SSL
280
189
0
24 Feb 2023
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yang Chen
Hexiang Hu
Yi Luan
Haitian Sun
Soravit Changpinyo
Alan Ritter
Ming-Wei Chang
614
150
0
23 Feb 2023
Backdoor Attacks to Pre-trained Unified Foundation Models
Zenghui Yuan
Yixin Liu
Kai Zhang
Pan Zhou
Lichao Sun
AAML
215
12
0
18 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
220
8
0
16 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
307
181
0
14 Feb 2023
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Thomas Carta
Clément Romac
Thomas Wolf
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LM&Ro, LLMAG
391
238
0
06 Feb 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM, VLM
275
53
0
12 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
IEEE International Conference on Computer Vision (ICCV), 2023
Jia Ning
Chen Li
Zheng Zhang
Zigang Geng
Jingdong Sun
Kun He
Han Hu
330
60
0
05 Jan 2023
Do DALL-E and Flamingo Understand Each Other?
IEEE International Conference on Computer Vision (ICCV), 2022
Hang Li
Jindong Gu
Rajat Koner
Sahand Sharifzadeh
Volker Tresp
MLLM
226
14
0
23 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM, MLLM, ObjD
287
326
0
21 Dec 2022
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
369
132
0
21 Dec 2022
Universal Object Detection with Large Vision Model
International Journal of Computer Vision (IJCV), 2022
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM, ObjD
281
9
0
19 Dec 2022
Transferring General Multimodal Pretrained Models to Text Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
131
5
0
19 Dec 2022
Egocentric Video Task Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
265
18
0
13 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
147
19
0
08 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA
Pattern Recognition (Pattern Recogn.), 2022
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
262
97
0
07 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Computer Vision and Pattern Recognition (CVPR), 2022
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Joey Tianyi Zhou
VLM
346
152
0
05 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM, MLLM
336
335
0
05 Dec 2022
Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Zhuowan Li
Cihang Xie
Benjamin Van Durme
Yaoyao Liu
VLM, SSL
178
2
0
01 Dec 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
258
5
0
28 Nov 2022
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yatai Ji
Rong-Cheng Tu
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
260
17
0
24 Nov 2022
Unifying Vision-Language Representation Space with Single-tower Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2022
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
249
26
0
21 Nov 2022
Visual Programming: Compositional visual reasoning without training
Computer Vision and Pattern Recognition (CVPR), 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM, VLM, LRM
439
571
0
18 Nov 2022
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Computer Vision and Pattern Recognition (CVPR), 2022
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Jiaming Song
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
169
67
0
17 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
IEEE International Conference on Computer Vision (ICCV), 2022
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
335
36
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
412
127
0
15 Nov 2022
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
216
24
0
21 Oct 2022
A Survey of Computer Vision Technologies In Urban and Controlled-environment Agriculture
ACM Computing Surveys (ACM CSUR), 2022
Jiayun Luo
Boyang Albert Li
Cyril Leung
374
23
0
20 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
369
53
0
13 Oct 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos
IEEE International Conference on Computer Vision (ICCV), 2022
Ting-Li Chen
Lala Li
Saurabh Saxena
Geoffrey E. Hinton
David J. Fleet
VGen, MLLM
442
131
0
12 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
390
475
0
06 Oct 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
International Conference on Learning Representations (ICLR), 2022
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM, VLM
718
908
0
14 Sep 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
234
94
0
30 Jul 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2022
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLM, CLL
384
83
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Neural Information Processing Systems (NeurIPS), 2022
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
512
496
0
17 Jun 2022
A Unified Sequence Interface for Vision Tasks
Neural Information Processing Systems (NeurIPS), 2022
Ting-Li Chen
Saurabh Saxena
Lala Li
Nayeon Lee
David J. Fleet
Geoffrey E. Hinton
VLM, MLLM
208
171
0
15 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD, VLM
296
354
0
12 Jun 2022
Transformers in Time-series Analysis: A Tutorial
Circuits, Systems, and Signal Processing (CSSP), 2022
Sabeen Ahmed
Ian E. Nielsen
Aakash Tripathi
Shamoon Siddiqui
Ghulam Rasool
R. Ramachandran
AI4TS
318
247
0
28 Apr 2022
A Survey on Unsupervised Anomaly Detection Algorithms for Industrial Images
IEEE Access (IEEE Access), 2022
Yajie Cui
Zhaoxiang Liu
Kai Wang
OODDRL
472
79
0
24 Apr 2022
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
195
83
0
25 Nov 2021