2206.08916
Cited By
v1
v2 (latest)
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
International Conference on Learning Representations (ICLR), 2022
17 June 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
Papers citing
"Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks"
50 / 352 papers shown
Single-Model and Any-Modality for Video Object Tracking
Computer Vision and Pattern Recognition (CVPR), 2023
Zongwei Wu
Jilai Zheng
Xiangxuan Ren
Florin-Alexandru Vasluianu
Chao Ma
D. Paudel
Luc Van Gool
Radu Timofte
316
90
0
27 Nov 2023
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models
Xiaoyu Yang
Lijian Xu
Hao Sun
Jiaming Song
Shaoting Zhang
ObjD
401
10
0
21 Nov 2023
An Embodied Generalist Agent in 3D World
Jiangyong Huang
Silong Yong
Xiaojian Ma
Xiongkun Linghu
Puhao Li
Yan Wang
Qing Li
Song-Chun Zhu
Baoxiong Jia
Siyuan Huang
LM&Ro
297
288
0
18 Nov 2023
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
204
9
0
15 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
295
17
0
14 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Computer Vision and Pattern Recognition (CVPR), 2023
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
179
55
0
11 Nov 2023
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Zhang Li
Biao Yang
Qiang Liu
Zhiyin Ma
Shuo Zhang
Jingxu Yang
Yabo Sun
Yuliang Liu
Xiang Bai
MLLM
465
377
0
11 Nov 2023
Analyzing Modular Approaches for Visual Question Decomposition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Apoorv Khandelwal
Ellie Pavlick
Chen Sun
242
5
0
10 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
357
368
0
10 Nov 2023
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Yash Jain
Harkirat Singh Behl
Z. Kira
Vibhav Vineet
147
26
0
08 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
171
4
0
08 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Computer Vision and Pattern Recognition (CVPR), 2023
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
455
589
0
07 Nov 2023
RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches
International Conference on Learning Representations (ICLR), 2023
Jiayuan Gu
Sean Kirmani
Paul Wohlhart
Yao Lu
Montse Gonzalez Arenas
...
Hao Su
Karol Hausman
Chelsea Finn
Q. Vuong
Ted Xiao
207
111
0
03 Nov 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
162
45
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Information Fusion (Inf. Fusion), 2023
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
381
70
0
01 Nov 2023
Object-centric Video Representation for Long-term Action Anticipation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Ce Zhang
Changcheng Fu
Shijie Wang
Nakul Agarwal
Kwonjoon Lee
Chiho Choi
Chen Sun
251
29
0
31 Oct 2023
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yichi Zhang
Jiayi Pan
Yuchen Zhou
Rui Pan
Joyce Chai
VLM
208
26
0
31 Oct 2023
Exploring Question Decomposition for Zero-Shot VQA
Neural Information Processing Systems (NeurIPS), 2023
Zaid Khan
B. Vijaykumar
S. Schulter
Manmohan Chandraker
Yun Fu
ReLM
186
18
0
25 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
143
0
0
25 Oct 2023
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement
Mohammadreza Salehi
Mehrdad Farajtabar
Maxwell Horton
Fartash Faghri
Hadi Pouransari
Raviteja Vemulapalli
Oncel Tuzel
Ali Farhadi
Mohammad Rastegari
Sachin Mehta
CLIP
VLM
187
3
0
21 Oct 2023
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes
Chengxu Zhuang
Evelina Fedorenko
Jacob Andreas
269
17
0
20 Oct 2023
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
Jiachen Li
Qiaozi Gao
Michael Johnston
Xiaofeng Gao
Xuehai He
Suhaila Shakiah
Hangjie Shi
R. Ghanadan
William Y. Wang
LM&Ro
332
17
0
14 Oct 2023
PolyTask: Learning Unified Policies through Behavior Distillation
Siddhant Haldar
Lerrel Pinto
216
10
0
12 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
International Conference on Learning Representations (ICLR), 2023
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD
MLLM
VLM
403
450
0
11 Oct 2023
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
132
5
0
08 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Neural Information Processing Systems (NeurIPS), 2023
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML
VLM
CoGe
342
64
0
07 Oct 2023
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Computer Vision and Pattern Recognition (CVPR), 2023
Shiyu Xuan
Qingpei Guo
Ming Yang
Shiliang Zhang
MLLM
ObjD
221
52
0
01 Oct 2023
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
International Conference on Learning Representations (ICLR), 2023
Yulu Gan
Sungwoo Park
Alexander Schubert
Anthony Philippakis
Ahmed Alaa
VLM
268
30
0
30 Sep 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
IEEE International Conference on Computer Vision (ICCV), 2023
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
257
4
0
21 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
International Conference on Learning Representations (ICLR), 2023
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
282
271
0
20 Sep 2023
RMT: Retentive Networks Meet Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Qihang Fan
Huaibo Huang
Mingrui Chen
Hongmin Liu
Ran He
ViT
541
167
0
20 Sep 2023
Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals
Ran Liu
Ellen L. Zippi
Hadi Pouransari
Chris Sandino
Jingping Nie
Hanlin Goh
Erdrin Azemi
Ali Moin
299
16
0
12 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffM
VLM
269
156
0
07 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
497
1,548
0
24 Aug 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
IEEE International Conference on Computer Vision (ICCV), 2023
Runhu Huang
Jianhua Han
Guansong Lu
Xiaodan Liang
Yihan Zeng
Wei Zhang
Hang Xu
DiffM
149
8
0
18 Aug 2023
FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving
IEEE Transactions on Intelligent Vehicles (TIV), 2023
Zhonghua Yi
Haowen Shi
Kailun Yang
Zhijie Xu
Yaozu Ye
Ze Wang
Huajian Ni
Kaiwei Wang
3DPC
190
12
0
14 Aug 2023
Learning to Model the World with Language
International Conference on Machine Learning (ICML), 2023
Jessy Lin
Yuqing Du
Olivia Watkins
Danijar Hafner
Pieter Abbeel
Dan Klein
Anca Dragan
LM&Ro
SyDa
267
71
0
31 Jul 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
283
54
0
30 Jul 2023
Towards Generalist Biomedical AI
Tao Tu
Shekoofeh Azizi
Danny Driess
M. Schaekermann
Mohamed Amin
...
Yossi Matias
K. Singhal
Peter R. Florence
Alan Karthikesalingam
Vivek Natarajan
LM&MA
MedIm
AI4MH
242
401
0
26 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Neural Information Processing Systems (NeurIPS), 2023
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
222
48
0
24 Jul 2023
UniFormaly: Towards Task-Agnostic Unified Framework for Visual Anomaly Detection
Pattern Recognition (Pattern Recogn.), 2023
Yujin Lee
Harin Lim
Seoyoon Jang
H. Yoon
215
11
0
24 Jul 2023
Multimodal LLMs for health grounded in individual-specific data
Anastasiya Belyaeva
J. Cosentino
F. Hormozdiari
Krish Eswaran
S. Shetty
Greg C. Corrado
Andrew Carroll
Cory Y. McLean
N. Furlotte
LM&MA
225
78
0
18 Jul 2023
PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese
International Conference on Multimedia Analysis and Pattern Recognition (ICMAPR), 2023
Nghia Hieu Nguyen
Kiet Van Nguyen
196
2
0
17 Jul 2023
Leveraging Vision-Language Foundation Models for Fine-Grained Downstream Tasks
Denis Coquenet
Clément Rambour
Emanuele Dalsasso
Nicolas Thome
MLLM
CLIP
VLM
131
3
0
13 Jul 2023
Objaverse-XL: A Universe of 10M+ 3D Objects
Neural Information Processing Systems (NeurIPS), 2023
Matt Deitke
Ruoshi Liu
Matthew Wallingford
Huong Ngo
Oscar Michel
...
Carl Vondrick
Georgia Gkioxari
Kiana Ehsani
Ludwig Schmidt
Ali Farhadi
256
630
0
11 Jul 2023
Emu: Generative Pretraining in Multimodality
International Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
337
155
0
11 Jul 2023
Building Cooperative Embodied Agents Modularly with Large Language Models
International Conference on Learning Representations (ICLR), 2023
Hongxin Zhang
Weihua Du
Jiaming Shan
Qinhong Zhou
Yilun Du
J. Tenenbaum
Tianmin Shu
Chuang Gan
LLMAG
LM&Ro
522
259
0
05 Jul 2023
Visual Instruction Tuning with Polite Flamingo
AAAI Conference on Artificial Intelligence (AAAI), 2023
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
344
52
0
03 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Neural Information Processing Systems (NeurIPS), 2023
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Jiaming Song
328
162
0
03 Jul 2023
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
Z. Chen
Mingyu Ding
Songlin Yang
Wei Zhan
Masayoshi Tomizuka
Erik Learned-Miller
Chuang Gan
MoE
123
8
0
29 Jun 2023