arXiv:2206.08916
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
International Conference on Learning Representations (ICLR), 2023
17 June 2022
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
Papers citing "Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks"
50 / 352 papers shown
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
Chengxu Zhuang
Evelina Fedorenko
Jacob Andreas
164
4
0
21 Mar 2024
What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models
Junho Kim
Yeonju Kim
Yong Man Ro
LRM
MLLM
195
9
0
20 Mar 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue
Jie Cheng
Longteng Guo
Xingyuan Dai
Zijia Zhao
Xingjian He
Gang Xiong
Yisheng Lv
Jing Liu
203
13
0
20 Mar 2024
A Versatile Framework for Multi-scene Person Re-identification
Wei-Shi Zheng
Junkai Yan
Yi-Xing Peng
VLM
316
14
0
17 Mar 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
International Conference on Machine Learning (ICML), 2024
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro
VGen
PINN
255
213
0
14 Mar 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
European Conference on Computer Vision (ECCV), 2024
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Jiaming Song
Bernt Schiele
Liwei Wang
VLM
262
22
0
14 Mar 2024
Explore In-Context Segmentation via Latent Diffusion Models
AAAI Conference on Artificial Intelligence (AAAI), 2025
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
355
12
0
14 Mar 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Computer Vision and Pattern Recognition (CVPR), 2024
Han Qiu
Jiaxing Huang
Shiyang Feng
Lewei Lu
Xiaoqin Zhang
Shijian Lu
187
5
0
12 Mar 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Neural Information Processing Systems (NeurIPS), 2024
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yueping Jiang
MLLM
265
25
0
12 Mar 2024
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Computer Vision and Pattern Recognition (CVPR), 2024
Jiawen Zhu
Guansong Pang
VLM
364
79
0
11 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
390
15
0
05 Mar 2024
NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Abdullah Nazhat Abdullah
Tarkan Aydin
408
0
0
04 Mar 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi
Qi Dong
Luis Goncalves
Zhuowen Tu
Stefano Soatto
VLM
319
4
0
04 Mar 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM
VLM
359
74
0
26 Feb 2024
Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions
Tzuf Paz-Argaman
Sayali Kulkarni
John Palowitch
Jason Baldridge
Reut Tsarfaty
150
4
0
26 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLM
MLLM
185
14
0
13 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
269
92
0
08 Feb 2024
Data-efficient Large Vision Models through Sequential Autoregression
Jianyuan Guo
Zhiwei Hao
Chengcheng Wang
Yehui Tang
Han Wu
Han Hu
Kai Han
Chang Xu
VLM
232
12
0
07 Feb 2024
Large Language Models for Time Series: A Survey
Xiyuan Zhang
Ranak Roy Chowdhury
Rajesh K. Gupta
Jingbo Shang
AI4TS
507
123
0
02 Feb 2024
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
169
12
0
31 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
411
9
0
19 Jan 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li
Haobo Yuan
Wei Li
Henghui Ding
Size Wu
Wenwei Zhang
Yining Li
Kai Chen
Chen Change Loy
VLM
MLLM
ViT
294
103
0
18 Jan 2024
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke
Bert De Brabandere
DiffM
328
15
0
18 Jan 2024
AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents
Yuanzhi Liang
Linchao Zhu
Yi Yang
LLMAG
196
0
0
12 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
224
6
0
06 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
286
12
0
03 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
271
28
0
31 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
275
267
0
28 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
269
26
0
25 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
374
50
0
19 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
368
21
0
15 Dec 2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViT
VLM
177
46
0
14 Dec 2023
General Object Foundation Model for Images and Videos at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Junfeng Wu
Yi Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
324
76
0
14 Dec 2023
Tokenize Anything via Prompting
European Conference on Computer Vision (ECCV), 2024
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
223
35
0
14 Dec 2023
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Computer Vision and Pattern Recognition (CVPR), 2024
Chaoya Jiang
Haiyang Xu
Mengfan Dong
Jiaxing Chen
Wei Ye
Mingshi Yan
Qinghao Ye
Ji Zhang
Fei Huang
Shikun Zhang
VLM
294
111
0
12 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
262
106
0
11 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
426
11
0
11 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM
LRM
MLLM
309
73
0
05 Dec 2023
UPOCR: Towards Unified Pixel-Level OCR Interface
International Conference on Machine Learning (ICML), 2024
Dezhi Peng
Zhenhua Yang
Jiaxin Zhang
Chongyu Liu
Yongxin Shi
Kai Ding
Fengjun Guo
Lianwen Jin
337
13
0
05 Dec 2023
Lenna: Language Enhanced Reasoning Detection Assistant
Fei Wei
Xinyu Zhang
Ailing Zhang
Bo Zhang
Xiangxiang Chu
MLLM
LRM
254
30
0
05 Dec 2023
GIVT: Generative Infinite-Vocabulary Transformers
European Conference on Computer Vision (ECCV), 2024
Michael Tschannen
Cian Eastwood
Fabian Mentzer
346
63
0
04 Dec 2023
PixelLM: Pixel Reasoning with Large Multimodal Model
Computer Vision and Pattern Recognition (CVPR), 2024
Zhongwei Ren
Zhicheng Huang
Yunchao Wei
Yao-Min Zhao
Dongmei Fu
Jiashi Feng
Xiaojie Jin
VLM
MLLM
LRM
369
186
0
04 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yizhou Wang
YiXuan Wu
Weizhen He
Xun Guo
...
Mengwei He
Rui Zhao
Jian Wu
Tong He
Bin Wang
VLM
685
20
0
04 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Computer Vision and Pattern Recognition (CVPR), 2024
Jialin Wu
Xia Hu
Yaqing Wang
Bo Pang
Radu Soricut
MoE
230
33
0
01 Dec 2023
Manipulating the Label Space for In-Context Classification
Haokun Chen
Xu Yang
Yuhang Huang
Zihan Wu
Jing Wang
Xin Geng
VLM
188
4
0
01 Dec 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation
Rongyao Fang
Shilin Yan
Zhaoyang Huang
Jingqiu Zhou
Hao Tian
Jifeng Dai
Jiaming Song
MLLM
196
16
0
30 Nov 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
260
69
0
30 Nov 2023
Do text-free diffusion models learn discriminative visual representations?
European Conference on Computer Vision (ECCV), 2024
Soumik Mukhopadhyay
M. Gwilliam
Yosuke Yamaguchi
Vatsal Agarwal
Namitha Padmanabhan
Archana Swaminathan
Wanrong Zhu
Abhinav Shrivastava
DiffM
398
24
1
29 Nov 2023
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
IEEE Transactions on Multimedia (IEEE TMM), 2023
Fukun Yin
Xin Chen
C. Zhang
Biao Jiang
Zibo Zhao
Jiayuan Fan
Gang Yu
Taihao Li
Tao Chen
413
40
0
29 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
380
1
0
28 Nov 2023