2310.11441
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
17 October 2023
Jianwei Yang
Hao Zhang
Feng Li
Xueyan Zou
Chunyuan Li
Jianfeng Gao
MLLM
VLM
HuggingFace (28 upvotes)
Github (1387★)
Papers citing
"Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V"
16 / 166 papers shown
Title
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Computer Vision and Pattern Recognition (CVPR), 2024
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
562
28
0
07 Jun 2024
Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks
Simranjit Singh
Georgios Pavlakos
Dimitrios Stamoulis
228
10
0
29 May 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
LRM
MLLM
613
31
0
27 May 2024
Automating the Enterprise with Foundation Models
Proceedings of the VLDB Endowment (PVLDB), 2024
Michael Wornow
A. Narayan
Krista Opsahl-Ong
Quinn McIntyre
Nigam H. Shah
Christopher Ré
AI4CE
149
18
0
03 May 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan
Zhengyuan Yang
Junda Wu
Wanrong Zhu
Jianwei Yang
...
Kevin Qinghong Lin
Jianfeng Wang
Julian McAuley
Jianfeng Gao
Lijuan Wang
LRM
268
24
0
25 Apr 2024
Benchmarking Mobile Device Control Agents across Diverse Configurations
Juyong Lee
Taywon Min
Minyong An
Dongyoon Hahm
Changyeon Kim
Kimin Lee
292
29
0
25 Apr 2024
Large Language Models for Orchestrating Bimanual Robots
IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2024
Kun-Mo Chu
Xufeng Zhao
C. Weber
Mengdi Li
Wenhao Lu
Stefan Wermter
LM&Ro
LLMAG
247
12
0
02 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Shiyang Feng
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Jiaming Song
VLM
343
84
0
29 Mar 2024
Crafting Dynamic Virtual Activities with Advanced Multimodal Models
International Symposium on Mixed and Augmented Reality (ISMAR), 2024
Changyang Li
Qingan Yan
Minyoung Kim
Z. Li
Yi Tian Xu
Lap-Fai Yu
143
0
0
15 Mar 2024
BBSEA: An Exploration of Brain-Body Synchronization for Embodied Agents
Sizhe Yang
Qian Luo
Anumpam Pani
Yanchao Yang
183
4
0
13 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro
LRM
171
177
0
12 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Jiaming Song
Yu Qiao
Shiyang Feng
MLLM
449
135
0
08 Feb 2024
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Computer Vision and Pattern Recognition (CVPR), 2024
Mu Cai
Haotian Liu
Dennis Park
Siva Karthik Mustikovela
Gregory P. Meyer
Yuning Chai
Yong Jae Lee
VLM
LRM
MLLM
301
146
0
01 Dec 2023
Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction
IEEE Transactions on Intelligent Vehicles (TIV), 2023
Korawat Charoenpitaks
Van-Quang Nguyen
Masanori Suganuma
Masahiro Takahashi
Ryoma Niihara
Takayuki Okatani
269
5
0
07 Oct 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
International Conference on Machine Learning (ICML), 2024
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
474
1,008
0
04 Aug 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
394
17
0
03 Mar 2023