ResearchTrend.AI

Point and Ask: Incorporating Pointing into Visual Question Answering
arXiv: 2011.13681
27 November 2020
Arjun Mani, Nobline Yoo, William Fu-Hinthorn, Olga Russakovsky
Tags: 3DPC

Papers citing "Point and Ask: Incorporating Pointing into Visual Question Answering"

18 papers
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos, Eda B. Özyiğit
Tags: ObjD
12 Sep 2025

Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
Honglu Zhou, Xiangyu Peng, Shrikant B. Kendre, Michael S Ryoo, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
03 Sep 2025

LOVA3: Learning to Visual Question Answering, Asking and Assessment
Neural Information Processing Systems (NeurIPS), 2024
Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Zechen Bai, Mike Zheng Shou
21 Feb 2025

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
International Conference on Learning Representations (ICLR), 2024
Hong Li, Nanxi Li, Yuanjie Chen, Jianbin Zhu, Qinlu Guo, Cewu Lu, Yong-Lu Li
Tags: MLLM
02 Oct 2024

SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
Junwei Luo, Zhen Pang, Yongjun Zhang, Tingzhu Wang, Linlin Wang, ..., Jiangwei Lao, Jian Wang, Jingdong Chen, Yihua Tan, Yansheng Li
14 Jun 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, ..., Kevin Qinghong Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
Tags: LRM
25 Apr 2024

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng
Tags: VLM
14 Apr 2024

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An, Shiyang Feng, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Jiaming Song
Tags: VLM
29 Mar 2024

Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang, Jingyi Zhang, Kai Jiang, Han Qiu, Shijian Lu
27 Dec 2023

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi
Tags: VLM, MLLM
19 Dec 2023

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
Tags: MLLM, SyDa
11 Dec 2023

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Computer Vision and Pattern Recognition (CVPR), 2023
Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee
Tags: VLM, LRM, MLLM
01 Dec 2023

NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang, Yuan Yao, Wei Ji, Zhiyuan Liu, Tat-Seng Chua
Tags: MLLM, VLM
08 Nov 2023

From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang, Yuchen Liu, Songlin Liu, Jiné Zhao, Hao Zhang, Zhen Gao, Xiaopeng Zhang, Jin Li, Hongkai Xiong
Tags: MLLM, VLM
13 Oct 2023

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Hao-Ran Wei, ..., Jian‐Yuan Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang
Tags: MLLM, LRM
18 Jul 2023

Localized Questions in Medical Visual Question Answering
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman
03 Jul 2023

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
Stan Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, L. Ran, Mike Zheng Shou
30 Nov 2021

A Review on Explainability in Multimodal Deep Neural Nets
IEEE Access, 2021
Gargi Joshi, Rahee Walambe, K. Kotecha
17 May 2021