ResearchTrend.AI
Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

1 November 2023
Yuwei Bao
Keunwoo Peter Yu
Yichi Zhang
Shane Storks
Itamar Bar-Yossef
Alexander De La Iglesia
Megan Su
Xiao Lin Zheng
Joyce Chai

Papers citing "Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?"

9 / 9 papers shown
"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration
D. Bohus
Sean Andrist
Yuwei Bao
Eric Horvitz
Ann Paradiso
27
0
0
30 Aug 2024
AI-Powered Immersive Assistance for Interactive Task Execution in Industrial Environments
Tomislav Duricic
Peter Müllner
Nicole Weidinger
Neven Elsayed
Dominik Kowald
Eduardo E. Veas
25
1
0
12 Jul 2024
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models
Jianben He
Xingbo Wang
Shiyi Liu
Guande Wu
Claudio Silva
Huamin Qu
LRM
29
1
0
06 Jun 2024
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
D. Bohus
Sean Andrist
Nick Saw
Ann Paradiso
Ishani Chakraborty
Mahdi Rad
38
9
0
16 May 2024
Vision-Language Models as Success Detectors
Yuqing Du
Ksenia Konyushkova
Misha Denil
A. Raju
Jessica Landon
Felix Hill
Nando de Freitas
Serkan Cabi
MLLM
LRM
84
77
0
13 Mar 2023
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
Ivan Kapelyukh
Vitalis Vosylius
Edward Johns
LM&Ro
DiffM
105
144
0
05 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramithu
Gökhan Tür
Dilek Z. Hakkani-Tür
LM&Ro
152
180
0
01 Oct 2021
MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks
Cristian-Paul Bara
Sky CH-Wang
J. Chai
65
61
0
13 Sep 2021