GuessWhat?! Visual object discovery through multi-modal dialogue

23 November 2016
Harm de Vries
Florian Strub
A. Chandar
Olivier Pietquin
Hugo Larochelle
Aaron Courville
    VLM

Papers citing "GuessWhat?! Visual object discovery through multi-modal dialogue"

50 / 232 papers shown
Title
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
148
0
0
11 Mar 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
3
0
31 Dec 2024
Multi-Modal Dialogue State Tracking for Playing GuessWhich Game
Wei Pang
Ruixue Duan
Jinfu Yang
Ning Li
33
0
0
15 Aug 2024
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations
Wei Pang
Ruixue Duan
Jinfu Yang
Ning Li
19
0
0
13 Aug 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
33
2
0
10 Jul 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
33
8
0
24 Jun 2024
ChatShop: Interactive Information Seeking with Language Agents
Sanxing Chen
Sam Wiseman
Bhuwan Dhingra
KELM
26
7
0
15 Apr 2024
How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang
Richard He Bai
Ruixiang Zhang
Jiatao Gu
Shuangfei Zhai
J. Susskind
Navdeep Jaitly
ReLM
LRM
44
13
0
07 Mar 2024
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments
Savitha Sam Abraham
Marjan Alirezaie
Luc de Raedt
22
1
0
05 Mar 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM
VLM
42
39
0
26 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
32
3
0
19 Feb 2024
Improving Agent Interactions in Virtual Environments with Language Models
Jack Zhang
LLMAG
24
0
0
08 Feb 2024
Towards Unified Interactive Visual Grounding in The Wild
Jie Xu
Hanbo Zhang
Qingyi Si
Yifeng Li
Xuguang Lan
Tao Kong
LM&Ro
30
5
0
30 Jan 2024
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
Chancharik Mitra
Abrar Anwar
Rodolfo Corona
Dan Klein
Trevor Darrell
Jesse Thomason
19
1
0
12 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Georgios Tziafas
Yucheng Xu
Arushi Goel
M. Kasaei
Zhibin Li
H. Kasaei
32
23
0
09 Nov 2023
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network
Yiming Lin
Xiao-Bo Jin
Qiufeng Wang
Kaizhu Huang
24
3
0
25 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
17
1
0
18 Oct 2023
Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
Yizhe Zhang
Jiarui Lu
Navdeep Jaitly
LRM
ELM
16
9
0
02 Oct 2023
Resolving References in Visually-Grounded Dialogue via Text Generation
Bram Willemsen
Livia Qian
Gabriel Skantze
17
3
0
23 Sep 2023
Pointing out Human Answer Mistakes in a Goal-Oriented Visual Dialogue
Ryosuke Oshima
Seitaro Shinagawa
Hideki Tsunashima
Qi Feng
Shigeo Morishima
27
3
0
19 Sep 2023
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping
Gi-Cheon Kang
Junghyun Kim
Jaein Kim
Byoung-Tak Zhang
19
4
0
14 Sep 2023
Collecting Visually-Grounded Dialogue with A Game Of Sorts
Bram Willemsen
Dmytro Kalpakchi
Gabriel Skantze
11
2
0
10 Sep 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
31
4
0
30 Aug 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
Yuhao Lu
Yixuan Fan
Beixing Deng
F. Liu
Yali Li
Shengjin Wang
33
28
0
01 Aug 2023
'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges
Javier Chiyah-Garcia
Alessandro Suglia
Arash Eshghi
Helen F. Hastie
24
6
0
28 Jul 2023
Learning to Generate Equitable Text in Dialogue from Biased Training Data
Anthony Sicilia
Malihe Alikhani
40
15
0
10 Jul 2023
Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling
Weijie Zhang
27
0
0
21 Jun 2023
Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain
Shih-Lun Wu
Yi-Hui Chou
Liang Li
13
0
0
16 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Sandro Pezzelle
14
9
0
08 Jun 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Yuxuan Wang
Zilong Zheng
Xueliang Zhao
Jinpeng Li
Yueqian Wang
Dongyan Zhao
VGen
24
9
0
30 May 2023
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System
Mauajama Firdaus
Avinash Madasu
Asif Ekbal
38
7
0
27 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue
Haoqin Tu
Yitong Li
Fei Mi
Zhongliang Yang
35
4
0
23 May 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhe-nan Lin
Xidong Peng
Peishan Cong
Ge Zheng
Yujin Sun
Yuenan Hou
Xinge Zhu
Sibei Yang
Yuexin Ma
VGen
82
4
0
12 Apr 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
46
6
0
23 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
27
96
0
12 Mar 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
27
9
0
14 Jan 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Yuxing Long
Binyuan Hui
Fulong Ye
Yanyang Li
Zhuoxin Han
Caixia Yuan
Yongbin Li
Xiaojie Wang
LLMAG
25
7
0
05 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
31
16
0
26 Dec 2022
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
35
13
0
19 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System
Seungwhan Moon
Satwik Kottur
A. Geramifard
Babak Damavandi
35
2
0
15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
19
6
0
15 Nov 2022
Towards Unifying Reference Expression Generation and Comprehension
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
27
6
0
24 Oct 2022
Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue?
Amit Kumar Chaudhary
Alex J. Lucassen
Ioanna Tsani
A. Testoni
6
1
0
24 Oct 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
66
106
0
23 Oct 2022
LEATHER: A Framework for Learning to Generate Human-like Text in Dialogue
Anthony Sicilia
Malihe Alikhani
39
4
0
14 Oct 2022
Understanding Embodied Reference with Touch-Line Transformer
Y. Li
Xiaoxue Chen
Hao Zhao
Jiangtao Gong
Guyue Zhou
Federico Rossano
Yixin Zhu
158
15
0
11 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
27
0
0
23 Aug 2022
Modeling Non-Cooperative Dialogue: Theoretical and Empirical Insights
Anthony Sicilia
Tristan D. Maidment
Pat Healy
Malihe Alikhani
12
3
0
15 Jul 2022