2403.09530
Cited By
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
14 March 2024
Chris Kelly
Luhui Hu
Jiayin Hu
Yu Tian
Deshun Yang
Bang Yang
Cindy Yang
Zihao Li
Zaoshan Huang
Yuexian Zou
Papers citing "VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding"
4 / 4 papers shown
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
Deshun Yang
Luhui Hu
Yu Tian
Zihao Li
Chris Kelly
Bang Yang
Cindy Yang
Yuexian Zou
VGen
30
12
0
10 Mar 2024
Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
Junshu Tang
Tengfei Wang
Bo Zhang
Ting Zhang
Ran Yi
Lizhuang Ma
Dong Chen
DiffM
187
307
0
24 Mar 2023
MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
Jiale Li
Hang Dai
Hao Han
Yong Ding
3DPC
35
68
0
15 Mar 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022