
GuessWhat?! Visual object discovery through multi-modal dialogue

23 November 2016

Olivier Pietquin

Aaron Courville

Papers citing "GuessWhat?! Visual object discovery through multi-modal dialogue"

50 / 237 papers shown

Title
The World According to LLMs: How Geographic Origin Influences LLMs' Entity Deduction Capabilities Harsh Nishant Lalai Raj Sanjay Shah Jiaxin Pei Sashank Varma Yi-Chia Wang Ali Emami LRM 97 0 0 07 Aug 2025
OW-CLIP: Data-Efficient Visual Supervision for Open-World Object Detection via Human-AI Collaboration Junwen Duan Wei Xue Ziyao Kang Shixia Liu Jiazhi Xia VLM 127 0 0 26 Jul 2025
MDC-R: The Minecraft Dialogue Corpus with Reference Chris Madge Maris Camilleri Paloma Carretero García Vanja Karan Juexi Shao Prashant Jayannavar Julian Hough Benjamin Roth Massimo Poesio 95 2 0 27 Jun 2025
You Prefer This One, I Prefer Yours: Using Reference Words is Harder Than Vocabulary Words for Humans and Multimodal Language Models Dota Tianai Dong Yifan Luo Po-Ya Angela Wang Asli Ozyurek Paula Rubio-Fernandez 154 0 0 29 May 2025
Referring to Any Person Qing Jiang Lin Wu Zhaoyang Zeng Tianhe Ren Yuda Xiong Yihao Chen Qin Liu Lei Zhang 855 12 0 11 Mar 2025
Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Bolei Ma Yuting Li Wei Zhou Ziwei Gong Wenshu Fan Katja Jasinskaja Annemarie Friedrich Julia Hirschberg Frauke Kreuter Barbara Plank ELM LRM 353 14 0 17 Feb 2025
Towards Visual Grounding: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024 Linhui Xiao Xiaoshan Yang X. Lan Yaowei Wang Changsheng Xu ObjD 863 27 0 28 Dec 2024
Multi-Modal Dialogue State Tracking for Playing GuessWhich Game. CAAI International Conference on Artificial Intelligence (ICCAI), 2024 Wei Pang Ruixue Duan Jinfu Yang Ning Li 131 0 0 15 Aug 2024
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations Wei Pang Ruixue Duan Jinfu Yang Ning Li 104 0 0 13 Aug 2024
ActionVOS: Actions as Prompts for Video Object Segmentation Liangyang Ouyang Ruicong Liu Yifei Huang Ryosuke Furuta Yoichi Sato VOS 171 7 0 10 Jul 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models Jierun Chen Fangyun Wei Jinjing Zhao Sizhe Song Bohuai Wu Zhuoxuan Peng S.-H. Gary Chan Hongyang R. Zhang 229 31 0 24 Jun 2024
ChatShop: Interactive Information Seeking with Language Agents Sanxing Chen Sam Wiseman Bhuwan Dhingra KELM 301 17 0 15 Apr 2024
How Far Are We from Intelligent Visual Deductive Reasoning? Yizhe Zhang Richard He Bai Ruixiang Zhang Jiatao Gu Shuangfei Zhai J. Susskind Navdeep Jaitly ReLM LRM 347 26 0 07 Mar 2024
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments Savitha Sam Abraham Marjan Alirezaie Luc de Raedt 232 1 0 05 Mar 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation Yichi Zhang Ziqiao Ma Xiaofeng Gao Suhaila Shakiah Qiaozi Gao Joyce Chai MLLM VLM 327 74 0 26 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction Jie Xu Hanbo Zhang Xinghang Li Huaping Liu Xuguang Lan Tao Kong LM&Ro 263 5 0 19 Feb 2024
Improving Agent Interactions in Virtual Environments with Language Models Jack Zhang LLMAG 160 0 0 08 Feb 2024
Towards Unified Interactive Visual Grounding in The Wild Jie Xu Hanbo Zhang Qingyi Si Yifeng Li Xuguang Lan Tao Kong LM&Ro 254 5 0 30 Jan 2024
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding. North American Chapter of the Association for Computational Linguistics (NAACL), 2023 Chancharik Mitra Abrar Anwar Rodolfo Corona Dan Klein Trevor Darrell Jesse Thomason 137 2 0 12 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter. Conference on Robot Learning (CoRL), 2023 Georgios Tziafas Yucheng Xu Arushi Goel Mohammadreza Kasaei Zhibin Li Hamidreza Kasaei 195 39 0 09 Nov 2023
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network. Industrial Conference on Data Mining (IDM), 2023 Yiming Lin Xiao-Bo Jin Qiufeng Wang Kaizhu Huang 137 5 0 25 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions Hanbo Zhang Jie Xu Yuchen Mo Tao Kong 161 1 0 18 Oct 2023
Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Yizhe Zhang Jiarui Lu Navdeep Jaitly LRM ELM 278 21 0 02 Oct 2023
Resolving References in Visually-Grounded Dialogue via Text Generation. SIGDIAL Conferences (SIGDIAL), 2023 Bram Willemsen Livia Qian Gabriel Skantze 144 5 0 23 Sep 2023
Pointing out Human Answer Mistakes in a Goal-Oriented Visual Dialogue Ryosuke Oshima Seitaro Shinagawa Hideki Tsunashima Qi Feng Shigeo Morishima 170 4 0 19 Sep 2023
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping. IEEE International Conference on Robotics and Automation (ICRA), 2023 Gi-Cheon Kang Junghyun Kim Suhyung Choi Byoung-Tak Zhang 386 7 0 14 Sep 2023
Collecting Visually-Grounded Dialogue with A Game Of Sorts. International Conference on Language Resources and Evaluation (LREC), 2023 Bram Willemsen Dmytro Kalpakchi Gabriel Skantze 81 2 0 10 Sep 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations. European Conference on Computer Vision (ECCV), 2023 Kilichbek Haydarov Xiaoqian Shen Avinash Madasu Mahmoud Salem Jia Li Gamaleldin F. Elsayed Mohamed Elhoseiny 219 7 0 30 Aug 2023
Reinforcement Learning for Generative AI: A Survey Yuanjiang Cao Quan Z. Sheng Julian McAuley Lina Yao SyDa 427 22 0 28 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023 Yuhao Lu Yixuan Fan Beixing Deng Fan Liu Yali Li Shengjin Wang 238 55 0 01 Aug 2023
'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges. SIGDIAL Conferences (SIGDIAL), 2023 Javier Chiyah-Garcia Alessandro Suglia Arash Eshghi Helen F. Hastie 155 6 0 28 Jul 2023
Learning to Generate Equitable Text in Dialogue from Biased Training Data. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Anthony Sicilia Malihe Alikhani 267 20 0 10 Jul 2023
Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling Weijie Zhang 153 0 0 21 Jun 2023
Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Shih-Lun Wu Yi-Hui Chou Liang Li 134 0 0 16 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Sandro Pezzelle 144 11 0 08 Jun 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Yuxuan Wang Zilong Zheng Xueliang Zhao Jinpeng Li Yueqian Wang Dongyan Zhao VGen 169 14 0 30 May 2023
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System Mauajama Firdaus Avinash Madasu Asif Ekbal 272 9 0 27 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Haoqin Tu Yitong Li Fei Mi Zhongliang Yang 145 5 0 23 May 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language. European Conference on Computer Vision (ECCV), 2023 Zhe Lin Xidong Peng Peishan Cong Ge Zheng Yujin Sun Yuenan Hou Xinge Zhu Sibei Yang Yuexin Ma VGen 238 12 0 12 Apr 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding. AAAI Conference on Artificial Intelligence (AAAI), 2023 Ziyang Lu Yunqiang Pei Guoqing Wang Yang Yang Zheng Wang Heng Tao Shen 155 12 0 23 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions Deyao Zhu Jun Chen Kilichbek Haydarov Xiaoqian Shen Wenxuan Zhang Mohamed Elhoseiny MLLM 208 122 0 12 Mar 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World. ACM Multimedia (ACM MM), 2023 Hongpeng Lin Ludan Ruan Wenke Xia Peiyu Liu Jing Wen ... Di Hu Ruihua Song Wayne Xin Zhao Qin Jin Zhiwu Lu VGen 165 13 0 14 Jan 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph. AAAI Conference on Artificial Intelligence (AAAI), 2023 Yuxing Long Binyuan Hui Fulong Ye Yanyang Li Zhuoxin Han Caixia Yuan Yongbin Li Xiaojie Wang LLMAG 175 8 0 05 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges R. Zakari Jim Wilson Owusu Hailin Wang Ke Qin Zaharaddeen Karami Lawal Yue-hong Dong LRM 157 18 0 26 Dec 2022
A survey on knowledge-enhanced multimodal learning. Artificial Intelligence Review (Artif Intell Rev), 2022 Maria Lymperaiou Giorgos Stamou 437 21 0 19 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Seungwhan Moon Satwik Kottur A. Geramifard Babak Damavandi 121 3 0 15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Daniel Fried Nicholas Tomlin Jennifer Hu Roma Patel Aida Nematzadeh 183 9 0 15 Nov 2022
Towards Unifying Reference Expression Generation and Comprehension. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Duo Zheng Tao Kong Ya Jing Jiaan Wang Xiaojie Wang ObjD 126 9 0 24 Oct 2022
Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue? International Conference on Natural Language Generation (INLG), 2022 Amit Kumar Chaudhary Alex J. Lucassen Ioanna Tsani A. Testoni 143 1 0 24 Oct 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data. IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022 Yangfan Zhan Zhitong Xiong Yuan Yuan 217 175 0 23 Oct 2022
