Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.04408
Cited By
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
8 December 2022
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
Yichang Zhang
Xiaohuan Zhou
Peng Wang
Sinan Tan
An Yang
Zeyu Cui
Yu Han
Shuai Bai
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
Re-assign community
ArXiv
PDF
HTML
Papers citing
"OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models"
21 / 21 papers shown
Title
Surprisingly Fragile: Assessing and Addressing Prompt Instability in Multimodal Foundation Models
Ian Stewart
Sameera Horawalavithana
Brendan Kennedy
Sai Munikoti
Karl Pazdernik
AAML
16
2
0
26 Aug 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
35
2
0
08 Jun 2024
A Versatile Framework for Multi-scene Person Re-identification
Wei-Shi Zheng
Junkai Yan
Yi-Xing Peng
VLM
27
5
0
17 Mar 2024
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
33
7
0
11 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
25
81
0
06 Dec 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
9
65
0
23 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
29
1,559
0
28 Sep 2023
TouchStone: Evaluating Vision-Language Models by Language Models
Shuai Bai
Shusheng Yang
Jinze Bai
Peng Wang
Xing Zhang
Junyang Lin
Xinggang Wang
Chang Zhou
Jingren Zhou
MLLM
26
44
0
31 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
29
780
0
24 Aug 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
13
113
0
18 May 2023
Leveraging sparse and shared feature activations for disentangled representation learning
Marco Fumero
F. Wenzel
L. Zancato
Alessandro Achille
Emanuele Rodolà
Stefano Soatto
Bernhard Schölkopf
Francesco Locatello
OOD
DRL
27
21
0
17 Apr 2023
A Survey on Table Question Answering: Recent Advances
Nengzheng Jin
Joanna Siebert
Dongfang Li
Qingcai Chen
LMTD
42
28
0
12 Jul 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
110
192
0
14 Oct 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
341
0
22 Sep 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li
Sha Yang
David A. Ross
Angjoo Kanazawa
ViT
198
467
0
21 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
The KIT Motion-Language Dataset
Matthias Plappert
Christian Mandery
Tamim Asfour
174
267
0
13 Jul 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
279
39,083
0
01 Sep 2014
1