SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors

18 March 2024
Chenyang Ma, Kai Lu, Ta-Ying Cheng, Niki Trigoni, Andrew Markham
LRM
ArXiv · PDF · HTML

Papers citing "SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors"

11 / 11 papers shown
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Haoxu Huang, Fanqi Lin, Yingdong Hu, Shengjie Wang, Yang Gao
29 · 49 · 0
13 Mar 2024

MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting
Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine
LM&Ro
40 · 23 · 0
05 Mar 2024

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
VLM
139 · 706 · 0
19 Jan 2024

Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell
62 · 53 · 0
27 Nov 2023

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li-ming Yuan
VLM, MLLM
194 · 587 · 0
16 Nov 2023

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner
VLM
31 · 74 · 0
19 Oct 2023

Gradient-less Federated Gradient Boosting Trees with Learnable Learning Rates
Chenyang Ma, Xinchi Qiu, Daniel J. Beutel, Nicholas D. Lane
FedML
16 · 12 · 0
15 Apr 2023

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
317 · 8,448 · 0
28 Jan 2022

Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey E. Hinton
MLLM, ViT, VLM
233 · 344 · 0
22 Sep 2021

BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
S. Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, ..., C. Karen Liu, Silvio Savarese, H. Gweon, Jiajun Wu, Li Fei-Fei
LM&Ro
138 · 155 · 0
06 Aug 2021

se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains
Bowen Wen, Chaitanya Mitash, Baozhang Ren, Kostas E. Bekris
77 · 127 · 0
27 Jul 2020