Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

1 November 2023

Keunwoo Peter Yu

Itamar Bar-Yossef

Alexander De La Iglesia

Papers citing "Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?"

Title | Authors | Tags | Counts | Date
"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration | D. Bohus, Sean Andrist, Yuwei Bao, Eric Horvitz, Ann Paradiso | | 27 0 0 | 30 Aug 2024
AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments | Tomislav Duricic, Peter Müllner, Nicole Weidinger, Neven Elsayed, Dominik Kowald, Eduardo E. Veas | | 25 1 0 | 12 Jul 2024
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models | Jianben He, Xingbo Wang, Shiyi Liu, Guande Wu, Claudio Silva, Huamin Qu | LRM | 29 1 0 | 06 Jun 2024
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research | D. Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad | | 38 9 0 | 16 May 2024
Vision-Language Models as Success Detectors | Yuqing Du, Ksenia Konyushkova, Misha Denil, A. Raju, Jessica Landon, Felix Hill, Nando de Freitas, Serkan Cabi | MLLM, LRM | 84 77 0 | 13 Mar 2023
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics | Ivan Kapelyukh, Vitalis Vosylius, Edward Johns | LM&Ro, DiffM | 105 144 0 | 05 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi | MLLM, BDL, VLM, CLIP | 388 4,110 0 | 28 Jan 2022
TEACh: Task-driven Embodied Agents that Chat | Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, P. Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gökhan Tür, Dilek Z. Hakkani-Tür | LM&Ro | 152 180 0 | 01 Oct 2021
MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks | Cristian-Paul Bara, Sky CH-Wang, J. Chai | | 65 61 0 | 13 Sep 2021