ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
arXiv:2310.11441 · v2 (latest) · 17 October 2023
Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chun-yue Li, Jianfeng Gao
Tags: MLLM, VLM
Links: arXiv (abs) · PDF · HTML · HuggingFace (28 upvotes) · GitHub (1,387★)

Papers citing "Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V" (16 of 166 papers shown)
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination (CVPR 2024)
Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David Fouhey, Joyce Chai
Tags: 3DV · 07 Jun 2024
Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks
Simranjit Singh, Georgios Pavlakos, Dimitrios Stamoulis
29 May 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Zhongyu Wei
Tags: LRM, MLLM · 27 May 2024
Automating the Enterprise with Foundation Models (PVLDB 2024)
Michael Wornow, A. Narayan, Krista Opsahl-Ong, Quinn McIntyre, Nigam H. Shah, Christopher Ré
Tags: AI4CE · 03 May 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, ..., Kevin Qinghong Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
Tags: LRM · 25 Apr 2024
Benchmarking Mobile Device Control Agents across Diverse Configurations
Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Kimin Lee, Changyeon Kim
25 Apr 2024
Large Language Models for Orchestrating Bimanual Robots (Humanoids 2024)
Kun-Mo Chu, Xufeng Zhao, C. Weber, Mengdi Li, Wenhao Lu, Stefan Wermter
Tags: LM&Ro, LLMAG · 02 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An, Shiyang Feng, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Jiaming Song
Tags: VLM · 29 Mar 2024
Crafting Dynamic Virtual Activities with Advanced Multimodal Models (ISMAR 2024)
Changyang Li, Qingan Yan, Minyoung Kim, Z. Li, Yi Tian Xu, Lap-Fai Yu
15 Mar 2024
BBSEA: An Exploration of Brain-Body Synchronization for Embodied Agents
Sizhe Yang, Qian Luo, Anumpam Pani, Yanchao Yang
13 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, ..., Karol Hausman, N. Heess, Chelsea Finn, Sergey Levine, Brian Ichter
Tags: LM&Ro, LRM · 12 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, ..., Hao Shao, Pan Lu, Jiaming Song, Yu Qiao, Shiyang Feng
Tags: MLLM · 08 Feb 2024
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts (CVPR 2023)
Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee
Tags: VLM, LRM, MLLM · 01 Dec 2023
Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction (IEEE TIV 2023)
Korawat Charoenpitaks, Van-Quang Nguyen, Masanori Suganuma, Masahiro Takahashi, Ryoma Niihara, Takayuki Okatani
07 Oct 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2023)
Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Qinghong Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang
Tags: MLLM · 04 Aug 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering (TPAMI 2023)
Zhou Yu, Xuecheng Ouyang, Zhenwei Shao, Mei Wang, Jun Yu
Tags: MLLM · 03 Mar 2023