Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.11175
Cited By
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
18 May 2023
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
Gang Zeng
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks"
30 / 80 papers shown
Title
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
K. Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
34
12
0
25 Apr 2024
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering
Avinash Anand
Janak Kapuriya
Chhavi Kirtani
Apoorv Singh
Jay Saraf
Naman Lal
Jatin Kumar
A. Shivam
Astha Verma
R. Shah
OffRL
40
9
0
19 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
58
32
0
29 Mar 2024
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li
Qiang Nie
Weifu Fu
Yuhuan Lin
Guangpin Tao
Yong-Jin Liu
Chengjie Wang
25
4
0
07 Mar 2024
Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation
Maksim Kuprashevich
Grigorii Alekseenko
Irina Tolstykh
ELM
48
4
0
04 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
42
7
0
29 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
126
106
0
08 Feb 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang
Jindong Han
Zhao Xu
Hang Ni
Hao Liu
Hui Xiong
Hui Xiong
AI4CE
77
15
0
30 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
28
2
0
09 Jan 2024
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
Senqiao Yang
Tianyuan Qu
Xin Lai
Zhuotao Tian
Bohao Peng
Shu-Lin Liu
Jiaya Jia
VLM
21
28
0
28 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
26
3
0
05 Dec 2023
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen
Mohamed Elhoseiny
VLM
90
10
0
04 Dec 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation
Rongyao Fang
Shilin Yan
Zhaoyang Huang
Jingqiu Zhou
Hao Tian
Jifeng Dai
Hongsheng Li
MLLM
25
8
0
30 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
24
12
0
09 Nov 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
39
46
0
01 Sep 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
117
0
25 Jul 2023
Divert More Attention to Vision-Language Object Tracking
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
22
3
0
19 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
83
223
0
07 Jul 2023
UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering
Mingjie Pan
Li Liu
Jiaming Liu
Peixiang Huang
Longlong Wang
Shanghang Zhang
Shaoqing Xu
Zhiyi Lai
Kuiyuan Yang
14
20
0
15 Jun 2023
ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
Chenxu Hu
Jie Fu
Chenzhuang Du
Simian Luo
J. Zhao
Hang Zhao
KELM
LLMAG
22
104
0
06 Jun 2023
Interactive Segment Anything NeRF with Feature Imitation
Xiaokang Chen
Jiaxiang Tang
Diwen Wan
Jingbo Wang
Gang Zeng
29
22
0
25 May 2023
Uncovering and Quantifying Social Biases in Code Generation
Y. Liu
Xiaokang Chen
Yan Gao
Zhe Su
Fengji Zhang
Daoguang Zan
Jian-Guang Lou
Pin-Yu Chen
Tsung-Yi Ho
30
19
0
24 May 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
242
1,070
0
05 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
194
220
0
24 Sep 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
344
0
22 Sep 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,978
0
09 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
525
0
04 Feb 2021
Previous
1
2