VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

14 March 2024
Chris Kelly
Luhui Hu
Jiayin Hu
Yu Tian
Deshun Yang
Bang Yang
Cindy Yang
Zihao Li
Zaoshan Huang
Yuexian Zou

Papers citing "VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding"

4 / 4 papers shown
Title
WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs
Deshun Yang
Luhui Hu
Yu Tian
Zihao Li
Chris Kelly
Bang Yang
Cindy Yang
Yuexian Zou
VGen
33
12
0
10 Mar 2024
Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
Junshu Tang
Tengfei Wang
Bo Zhang
Ting Zhang
Ran Yi
Lizhuang Ma
Dong Chen
DiffM
187
307
0
24 Mar 2023
MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
Jiale Li
Hang Dai
Hao Han
Yong Ding
3DPC
35
68
0
15 Mar 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,010
0
28 Jan 2022