GuessWhat?! Visual object discovery through multi-modal dialogue

23 November 2016
Harm de Vries
Florian Strub
A. Chandar
Olivier Pietquin
Hugo Larochelle
Aaron Courville
    VLM

Papers citing "GuessWhat?! Visual object discovery through multi-modal dialogue"

50 / 237 papers shown
The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities
Harsh Nishant Lalai
Raj Sanjay Shah
Jiaxin Pei
Sashank Varma
Yi-Chia Wang
Ali Emami
LRM
07 Aug 2025
OW-CLIP: Data-Efficient Visual Supervision for Open-World Object Detection via Human-AI Collaboration
Junwen Duan
Wei Xue
Ziyao Kang
Shixia Liu
Jiazhi Xia
VLM
26 Jul 2025
MDC-R: The Minecraft Dialogue Corpus with Reference
Chris Madge
Maris Camilleri
Paloma Carretero García
Vanja Karan
Juexi Shao
Prashant Jayannavar
Julian Hough
Benjamin Roth
Massimo Poesio
27 Jun 2025
You Prefer This One, I Prefer Yours: Using Reference Words is Harder Than Vocabulary Words for Humans and Multimodal Language Models
Dota Tianai Dong
Yifan Luo
Po-Ya Angela Wang
Asli Ozyurek
Paula Rubio-Fernandez
29 May 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
11 Mar 2025
Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bolei Ma
Yuting Li
Wei Zhou
Ziwei Gong
Wenshu Fan
Katja Jasinskaja
Annemarie Friedrich
Julia Hirschberg
Frauke Kreuter
Barbara Plank
ELMLRM
17 Feb 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
28 Dec 2024
Multi-Modal Dialogue State Tracking for Playing GuessWhich Game
CAAI International Conference on Artificial Intelligence (ICCAI), 2024
Wei Pang
Ruixue Duan
Jinfu Yang
Ning Li
15 Aug 2024
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations
Wei Pang
Ruixue Duan
Jinfu Yang
Ning Li
13 Aug 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
10 Jul 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
24 Jun 2024
ChatShop: Interactive Information Seeking with Language Agents
Sanxing Chen
Sam Wiseman
Bhuwan Dhingra
KELM
15 Apr 2024
How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang
Richard He Bai
Ruixiang Zhang
Jiatao Gu
Shuangfei Zhai
J. Susskind
Navdeep Jaitly
ReLMLRM
07 Mar 2024
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments
Savitha Sam Abraham
Marjan Alirezaie
Luc de Raedt
05 Mar 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLMVLM
26 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
19 Feb 2024
Improving Agent Interactions in Virtual Environments with Language Models
Jack Zhang
LLMAG
08 Feb 2024
Towards Unified Interactive Visual Grounding in The Wild
Jie Xu
Hanbo Zhang
Qingyi Si
Yifeng Li
Xuguang Lan
Tao Kong
LM&Ro
30 Jan 2024
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Chancharik Mitra
Abrar Anwar
Rodolfo Corona
Dan Klein
Trevor Darrell
Jesse Thomason
12 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Conference on Robot Learning (CoRL), 2023
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
09 Nov 2023
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network
Industrial Conference on Data Mining (IDM), 2023
Yiming Lin
Xiao-Bo Jin
Qiufeng Wang
Kaizhu Huang
25 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
18 Oct 2023
Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yizhe Zhang
Jiarui Lu
Navdeep Jaitly
LRMELM
02 Oct 2023
Resolving References in Visually-Grounded Dialogue via Text Generation
SIGDIAL Conferences (SIGDIAL), 2023
Bram Willemsen
Livia Qian
Gabriel Skantze
23 Sep 2023
Pointing out Human Answer Mistakes in a Goal-Oriented Visual Dialogue
Ryosuke Oshima
Seitaro Shinagawa
Hideki Tsunashima
Qi Feng
Shigeo Morishima
19 Sep 2023
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping
IEEE International Conference on Robotics and Automation (ICRA), 2023
Gi-Cheon Kang
Junghyun Kim
Suhyung Choi
Byoung-Tak Zhang
14 Sep 2023
Collecting Visually-Grounded Dialogue with A Game Of Sorts
International Conference on Language Resources and Evaluation (LREC), 2023
Bram Willemsen
Dmytro Kalpakchi
Gabriel Skantze
10 Sep 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
European Conference on Computer Vision (ECCV), 2023
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
30 Aug 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
28 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yuhao Lu
Yixuan Fan
Beixing Deng
Fan Liu
Yali Li
Shengjin Wang
01 Aug 2023
'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges
SIGDIAL Conferences (SIGDIAL), 2023
Javier Chiyah-Garcia
Alessandro Suglia
Arash Eshghi
Helen F. Hastie
28 Jul 2023
Learning to Generate Equitable Text in Dialogue from Biased Training Data
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Anthony Sicilia
Malihe Alikhani
10 Jul 2023
Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling
Weijie Zhang
21 Jun 2023
Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shih-Lun Wu
Yi-Hui Chou
Liang Li
16 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sandro Pezzelle
08 Jun 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuxuan Wang
Zilong Zheng
Xueliang Zhao
Jinpeng Li
Yueqian Wang
Dongyan Zhao
VGen
30 May 2023
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System
Mauajama Firdaus
Avinash Madasu
Asif Ekbal
27 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoqin Tu
Yitong Li
Fei Mi
Zhongliang Yang
23 May 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
European Conference on Computer Vision (ECCV), 2023
Zhe Lin
Xidong Peng
Peishan Cong
Ge Zheng
Yujin Sun
Yuenan Hou
Xinge Zhu
Sibei Yang
Yuexin Ma
VGen
12 Apr 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
23 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
12 Mar 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
ACM Multimedia (ACM MM), 2023
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
14 Jan 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yuxing Long
Binyuan Hui
Fulong Ye
Yanyang Li
Zhuoxin Han
Caixia Yuan
Yongbin Li
Xiaojie Wang
LLMAG
05 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
26 Dec 2022
A survey on knowledge-enhanced multimodal learning
Artificial Intelligence Review (Artif Intell Rev), 2022
Maria Lymperaiou
Giorgos Stamou
19 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Seungwhan Moon
Satwik Kottur
A. Geramifard
Babak Damavandi
15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
15 Nov 2022
Towards Unifying Reference Expression Generation and Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
24 Oct 2022
Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue?
International Conference on Natural Language Generation (INLG), 2022
Amit Kumar Chaudhary
Alex J. Lucassen
Ioanna Tsani
A. Testoni
24 Oct 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
23 Oct 2022