arXiv:1503.01558
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision
5 March 2015
J. Malmaud
Jonathan Huang
V. Rathod
Nick Johnston
Andrew Rabinovich
Kevin Patrick Murphy
Papers citing "What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision"
34 / 34 papers shown
Every Mistake Counts in Assembly
Guodong Ding
Fadime Sener
Shugao Ma
Angela Yao
32
12
0
31 Jul 2023
STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos
Anshul B. Shah
Benjamin Lundell
H. Sawhney
Ramalingam Chellappa
SSL
18
8
0
02 Jan 2023
Rethinking Cooking State Recognition with Vision Transformers
A. Khan
Alif Ashrafee
Reeshoon Sayera
Shahriar Ivan
Sabbir Ahmed
ViT
27
7
0
16 Dec 2022
Temporal Action Segmentation: An Analysis of Modern Techniques
Guodong Ding
Fadime Sener
Angela Yao
47
75
0
19 Oct 2022
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
Hamza Khan
S. Haresh
Awais Ahmed
Shakeeb Siddiqui
Andrey Konin
Mohammad Zeeshan
Quoc-Huy Tran
27
22
0
30 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
72
528
0
13 Jun 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
33
205
0
28 Mar 2022
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
30
372
0
04 Jun 2021
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sateesh Kumar
S. Haresh
Awais Ahmed
Andrey Konin
M. Zia
Quoc-Huy Tran
SSL
27
47
0
27 May 2021
Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities
S. Swetha
Hilde Kuehne
Yogesh S Rawat
M. Shah
27
16
0
30 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
24
56
0
16 Mar 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
38
14
0
07 Jan 2021
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
Towards Robust Pattern Recognition: A Review
Xu-Yao Zhang
Cheng-Lin Liu
C. Suen
OOD
HAI
19
103
0
12 Jun 2020
A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela S. Lin
Sudha Rao
Asli Celikyilmaz
E. Nouri
Chris Brockett
Debadeepta Dey
Bill Dolan
26
24
0
19 May 2020
Learning to Segment Actions from Observation and Narration
Daniel Fried
Jean-Baptiste Alayrac
Phil Blunsom
Chris Dyer
S. Clark
Aida Nematzadeh
33
31
0
07 May 2020
Action Modifiers: Learning from Adverbs in Instructional Videos
Hazel Doughty
Ivan Laptev
W. Mayol-Cuevas
Dima Damen
27
30
0
13 Dec 2019
A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation
Hilde Kuehne
Alexander Richard
Juergen Gall
27
82
0
03 Jun 2019
Unsupervised learning of action classes with continuous temporal embedding
Anna Kukleva
Hilde Kuehne
Fadime Sener
Juergen Gall
27
107
0
08 Apr 2019
Cross-task weakly supervised learning from instructional videos
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
SSL
25
243
0
19 Mar 2019
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
C. Chang
De-An Huang
Yanan Sui
Li Fei-Fei
Juan Carlos Niebles
22
156
0
09 Jan 2019
Zero-Shot Anticipation for Instructional Activities
Fadime Sener
Angela Yao
LM&Ro
25
68
0
06 Dec 2018
A Perceptual Prediction Framework for Self Supervised Event Segmentation
Sathyanarayanan N. Aakur
Sudeep Sarkar
16
69
0
12 Nov 2018
Classifying cooking object's state using a tuned VGG convolutional neural network
Rahul Paul
27
13
0
23 May 2018
Unsupervised Learning and Segmentation of Complex Activities from Video
Fadime Sener
Angela Yao
19
112
0
26 Mar 2018
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
Pelin Dogan
Boyang Albert Li
Leonid Sigal
Markus Gross
AI4TS
30
19
0
19 Feb 2018
Food recognition and recipe analysis: integrating visual content, context and external knowledge
Luis Herranz
Weiqing Min
Shuqiang Jiang
20
29
0
22 Jan 2018
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
55
927
0
04 Aug 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,865
0
26 May 2017
Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling
Alexander Richard
Hilde Kuehne
Juergen Gall
34
195
0
23 Mar 2017
Joint Discovery of Object States and Manipulation Actions
Jean-Baptiste Alayrac
Josef Sivic
Ivan Laptev
Simon Lacoste-Julien
22
79
0
09 Feb 2017
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
De-An Huang
Li Fei-Fei
Juan Carlos Niebles
24
237
0
28 Jul 2016
Weakly-Supervised Alignment of Video With Text
Piotr Bojanowski
Rémi Lajugie
Edouard Grave
Francis R. Bach
Ivan Laptev
Jean Ponce
Cordelia Schmid
41
134
0
22 May 2015