2403.11289
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
17 March 2024
Siyuan Huang
Iaroslav Ponomarenko
Zhengkai Jiang
Xiaoqi Li
Xiaobin Hu
Peng Gao
Hongsheng Li
Hao Dong
LM&Ro
Papers citing
"ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models"
11 / 11 papers shown
Title
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li
Lingyun Xu
M. Zhang
Jiaming Liu
Yan Shen
...
Jiahui Xu
Liang Heng
Siyuan Huang
S. Zhang
Hao Dong
LM&Ro
31
0
0
04 May 2025
3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment
X. Li
J. H. Liu
Nuowei Han
Liang Heng
Y. Guo
Hao Dong
Yang Liu
37
0
0
03 May 2025
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu
J. Zhang
Minghao Guo
Youpeng Wen
H. Yang
...
Liqiong Wang
Yuxuan Kuang
Meng Cao
Feng Zheng
Xiaodan Liang
37
1
0
17 Apr 2025
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu
Hao Chen
Pengju An
Zhuoyang Liu
Renrui Zhang
...
Chengkai Hou
Mengdi Zhao
KC alex Zhou
Pheng-Ann Heng
S. Zhang
58
5
0
13 Mar 2025
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Qiaojun Yu
Siyuan Huang
Xibin Yuan
Zhengkai Jiang
Ce Hao
...
Junbo Wang
Liu Liu
Hongsheng Li
Peng Gao
Cewu Lu
49
3
0
30 Sep 2024
A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping
Houjian Yu
Mingen Li
Alireza Rezazadeh
Yang Yang
Changhyun Choi
30
1
0
28 Sep 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
116
106
0
08 Feb 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
D. Fox
LM&Ro
141
449
0
12 Sep 2022
Where2Act: From Pixels to Actions for Articulated 3D Objects
Kaichun Mo
Leonidas J. Guibas
Mustafa Mukadam
Abhinav Gupta
Shubham Tulsiani
143
175
0
07 Jan 2021
SAPIEN: A SimulAted Part-based Interactive ENvironment
Fanbo Xiang
Yuzhe Qin
Kaichun Mo
Yikuan Xia
Hao Zhu
...
He-Nan Wang
Li Yi
Angel X. Chang
Leonidas J. Guibas
Hao Su
195
482
0
19 Mar 2020