ResearchTrend.AI
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment

8 August 2023
Ziyu Zhu
Xiaojian Ma
Yixin Chen
Zhidong Deng
Siyuan Huang
Qing Li
    LM&Ro

Papers citing "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"

16 / 16 papers shown
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
40
0
0
08 May 2025
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
Nader Zantout
Haochen Zhang
Pujith Kachana
J. Qiu
Ji Zhang
Wenshan Wang
LM&Ro
LRM
56
0
0
25 Apr 2025
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis
Yun Chang
Leonor Fermoselle
Duy Ta
Bernadette Bucher
Luca Carlone
Jiuguang Wang
30
0
0
09 Apr 2025
Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning
Yanjun Chen
Yirong Sun
Xinghao Chen
Jian Wang
Xiaoyu Shen
W. Li
Wei Zhang
3DV
LRM
59
1
0
08 Mar 2025
CrossOver: 3D Scene Cross-Modal Alignment
S. Sarkar
O. Mikšík
Marc Pollefeys
Daniel Barath
Iro Armeni
3DPC
71
0
0
20 Feb 2025
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
Venkata Naren Devarakonda
Raktim Gautam Goswami
Ali Umut Kaypak
Naman Patel
Rooholla Khorrambakht
P. Krishnamurthy
Farshad Khorrami
LM&Ro
30
3
0
08 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
93
29
0
26 Sep 2024
QueryCAD: Grounded Question Answering for CAD Models
Claudius Kienle
Benjamin Alt
Darko Katic
Rainer Jäkel
Jan Peters
16
1
0
13 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
70
15
0
05 Sep 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
65
11
0
07 Jun 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Hongyuan Zhu
Fukun Yin
Gang Yu
Tao Chen
17
23
0
17 Dec 2023
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
56
4
0
15 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
26
3
0
05 Dec 2023
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Hongyuan Zhu
Jiayuan Fan
Tao Chen
MLLM
24
76
0
30 Nov 2023
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramithu
Gökhan Tür
Dilek Z. Hakkani-Tür
LM&Ro
147
179
0
01 Oct 2021