Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.18937
Cited By
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding
29 May 2024
Junjie Fei
Mahmoud Ahmed
Jian Ding
Eslam Mohamed Bakr
Mohamed Elhoseiny
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding"
8 / 8 papers shown
Title
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
71
35
0
05 Dec 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
160
440
0
14 Oct 2023
UniG3D: A Unified 3D Object Generation Dataset
Qinghong Sun
Yangguang Li
Zexia Liu
Xiaoshui Huang
Fenggang Liu
Xihui Liu
Wanli Ouyang
Jing Shao
22
6
0
19 Jun 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
259
4,223
0
30 Jan 2023
Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
Jonas Schult
Francis Engelmann
Alexander Hermans
Or Litany
Siyu Tang
Bastian Leibe
ISeg
50
164
0
06 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
344
0
22 Sep 2021
1