2206.08916
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
International Conference on Learning Representations (ICLR), 2022
17 June 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
Papers citing "Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks"
50 / 352 papers shown
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
Chengxu Zhuang
Evelina Fedorenko
Jacob Andreas
188
5
0
21 Mar 2024
What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models
Junho Kim
Yeonju Kim
Yong Man Ro
LRM
MLLM
211
9
0
20 Mar 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue
Jie Cheng
Longteng Guo
Xingyuan Dai
Zijia Zhao
Xingjian He
Gang Xiong
Yisheng Lv
Jing Liu
216
13
0
20 Mar 2024
A Versatile Framework for Multi-scene Person Re-identification
Wei-Shi Zheng
Junkai Yan
Yi-Xing Peng
VLM
327
17
0
17 Mar 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
International Conference on Machine Learning (ICML), 2024
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro
VGen
PINN
272
219
0
14 Mar 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
European Conference on Computer Vision (ECCV), 2024
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Jiaming Song
Bernt Schiele
Liwei Wang
VLM
280
22
0
14 Mar 2024
Explore In-Context Segmentation via Latent Diffusion Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
383
13
0
14 Mar 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Computer Vision and Pattern Recognition (CVPR), 2024
Han Qiu
Jiaxing Huang
Shiyang Feng
Lewei Lu
Xiaoqin Zhang
Shijian Lu
217
5
0
12 Mar 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Neural Information Processing Systems (NeurIPS), 2024
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yueping Jiang
MLLM
294
26
0
12 Mar 2024
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Computer Vision and Pattern Recognition (CVPR), 2024
Jiawen Zhu
Guansong Pang
VLM
418
83
0
11 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
412
15
0
05 Mar 2024
NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Abdullah Nazhat Abdullah
Tarkan Aydin
424
0
0
04 Mar 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi
Qi Dong
Luis Goncalves
Zhuowen Tu
Stefano Soatto
VLM
335
4
0
04 Mar 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM
VLM
376
75
0
26 Feb 2024
Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions
Tzuf Paz-Argaman
Sayali Kulkarni
John Palowitch
Jason Baldridge
Reut Tsarfaty
158
4
0
26 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLM
MLLM
206
14
0
13 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
286
96
0
08 Feb 2024
Data-efficient Large Vision Models through Sequential Autoregression
Jianyuan Guo
Zhiwei Hao
Chengcheng Wang
Yehui Tang
Han Wu
Han Hu
Kai Han
Chang Xu
VLM
248
12
0
07 Feb 2024
Large Language Models for Time Series: A Survey
Xiyuan Zhang
Ranak Roy Chowdhury
Rajesh K. Gupta
Jingbo Shang
AI4TS
527
128
0
02 Feb 2024
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
189
12
0
31 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
443
9
0
19 Jan 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li
Haobo Yuan
Wei Li
Henghui Ding
Size Wu
Wenwei Zhang
Yining Li
Kai Chen
Chen Change Loy
VLM
MLLM
ViT
311
106
0
18 Jan 2024
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke
Bert De Brabandere
DiffM
347
15
0
18 Jan 2024
AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents
Yuanzhi Liang
Linchao Zhu
Yi Yang
LLMAG
220
1
0
12 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
277
6
0
06 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
318
13
0
03 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
300
28
0
31 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
282
274
0
28 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
273
26
0
25 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
386
50
0
19 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
414
21
0
15 Dec 2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViT
VLM
191
46
0
14 Dec 2023
General Object Foundation Model for Images and Videos at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
Junfeng Wu
Yi Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
343
79
0
14 Dec 2023
Tokenize Anything via Prompting
European Conference on Computer Vision (ECCV), 2023
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
257
35
0
14 Dec 2023
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2023
Chaoya Jiang
Haiyang Xu
Mengfan Dong
Jiaxing Chen
Wei Ye
Mingshi Yan
Qinghao Ye
Ji Zhang
Fei Huang
Shikun Zhang
VLM
321
116
0
12 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
270
107
0
11 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
461
12
0
11 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM
LRM
MLLM
348
74
0
05 Dec 2023
UPOCR: Towards Unified Pixel-Level OCR Interface
International Conference on Machine Learning (ICML), 2023
Dezhi Peng
Zhenhua Yang
Jiaxin Zhang
Chongyu Liu
Yongxin Shi
Kai Ding
Fengjun Guo
Lianwen Jin
341
13
0
05 Dec 2023
Lenna: Language Enhanced Reasoning Detection Assistant
Fei Wei
Xinyu Zhang
Ailing Zhang
Bo Zhang
Xiangxiang Chu
MLLM
LRM
267
33
0
05 Dec 2023
GIVT: Generative Infinite-Vocabulary Transformers
European Conference on Computer Vision (ECCV), 2023
Michael Tschannen
Cian Eastwood
Fabian Mentzer
369
63
0
04 Dec 2023
PixelLM: Pixel Reasoning with Large Multimodal Model
Computer Vision and Pattern Recognition (CVPR), 2023
Zhongwei Ren
Zhicheng Huang
Yunchao Wei
Yao-Min Zhao
Dongmei Fu
Jiashi Feng
Xiaojie Jin
VLM
MLLM
LRM
377
189
0
04 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yizhou Wang
YiXuan Wu
Weizhen He
Xun Guo
...
Mengwei He
Rui Zhao
Jian Wu
Tong He
Bin Wang
VLM
714
21
0
04 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Computer Vision and Pattern Recognition (CVPR), 2023
Jialin Wu
Xia Hu
Yaqing Wang
Bo Pang
Radu Soricut
MoE
259
33
0
01 Dec 2023
Manipulating the Label Space for In-Context Classification
Haokun Chen
Xu Yang
Yuhang Huang
Zihan Wu
Jing Wang
Xin Geng
VLM
214
4
0
01 Dec 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation
Rongyao Fang
Shilin Yan
Zhaoyang Huang
Jingqiu Zhou
Hao Tian
Jifeng Dai
Jiaming Song
MLLM
213
16
0
30 Nov 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
276
69
0
30 Nov 2023
Do text-free diffusion models learn discriminative visual representations?
European Conference on Computer Vision (ECCV), 2023
Soumik Mukhopadhyay
M. Gwilliam
Yosuke Yamaguchi
Vatsal Agarwal
Namitha Padmanabhan
Archana Swaminathan
Wanrong Zhu
Abhinav Shrivastava
DiffM
411
26
1
29 Nov 2023
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
IEEE Transactions on Multimedia (IEEE TMM), 2023
Fukun Yin
Xin Chen
C. Zhang
Biao Jiang
Zibo Zhao
Jiayuan Fan
Gang Yu
Taihao Li
Tao Chen
457
40
0
29 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
392
1
0
28 Nov 2023