ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.00706
17
21

A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos

2 May 2020
Frank F. Xu
Lei Ji
Botian Shi
Junyi Du
Graham Neubig
Yonatan Bisk
Nan Duan
ArXivPDFHTML
Abstract

Watching instructional videos are often used to learn about procedures. Video captioning is one way of automatically collecting such knowledge. However, it provides only an indirect, overall evaluation of multimodal models with no finer-grained quantitative measure of what they have learned. We propose instead, a benchmark of structured procedural knowledge extracted from cooking videos. This work is complementary to existing tasks, but requires models to produce interpretable structured knowledge in the form of verb-argument tuples. Our manually annotated open-vocabulary resource includes 356 instructional cooking videos and 15,523 video clip/sentence-level annotations. Our analysis shows that the proposed task is challenging and standard modeling approaches like unsupervised segmentation, semantic role labeling, and visual action detection perform poorly when forced to predict every action of a procedure in a structured form.

View on arXiv
Comments on this paper