ResearchTrend.AI
arXiv: 1903.08225
Cross-task weakly supervised learning from instructional videos

19 March 2019
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
    SSL

Papers citing "Cross-task weakly supervised learning from instructional videos"

50 / 57 papers shown
Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions
Chang Zong
Bin Li
Shoujun Zhou
Jian Wan
Lei Zhang
123
0
0
22 Apr 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
46
0
0
10 Apr 2025
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
72
0
0
25 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
112
0
0
12 Feb 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
41
1
0
13 Jan 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Luigi Seminara
G. Farinella
Antonino Furnari
61
7
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
Egocentric and Exocentric Methods: A Short Survey
Anirudh Thatipelli
Shao-Yuan Lo
Amit K. Roy-Chowdhury
EgoV
42
2
0
27 Oct 2024
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris M. Kitani
Kristen Grauman
VGen
44
2
0
01 Aug 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
29
2
0
26 Jun 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
39
1
0
28 Mar 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
29
44
0
20 Feb 2024
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
24
5
0
21 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
53
1
0
30 Nov 2023
BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation
Zijia Lu
Ehsan Elhamifar
40
2
0
28 Aug 2023
Every Mistake Counts in Assembly
Guodong Ding
Fadime Sener
Shugao Ma
Angela Yao
32
12
0
31 Jul 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
46
49
0
31 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Nicolas Padoy
24
20
0
27 Jul 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
31
30
0
08 Jun 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
Yu Zhou
Sha Li
Manling Li
Xudong Lin
Shih-Fu Chang
Mohit Bansal
Heng Ji
25
8
0
27 May 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
25
38
0
31 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
27
7
0
29 Mar 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Oğuz
Yashar Mehdad
Mohit Bansal
3DV
18
51
0
29 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
63
30
0
26 Mar 2023
LaMPP: Language Models as Probabilistic Priors for Perception and Action
Belinda Z. Li
William Chen
Pratyusha Sharma
Jacob Andreas
24
15
0
03 Feb 2023
Action Dynamics Task Graphs for Learning Plannable Representations of Procedural Tasks
Weichao Mao
Ruta Desai
Michael L. Iuzzolino
Nitin Kamra
24
5
0
11 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
20
51
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
18
4
0
05 Jan 2023
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
24
14
0
17 Nov 2022
Robust Action Segmentation from Timestamp Supervision
Yaser Souri
Yazan Abu Farha
Emad Bahrami
Gianpiero Francesca
Juergen Gall
19
6
0
12 Oct 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra
Shreyank N. Gowda
Frank Keller
Laura Sevilla-Lara
24
5
0
30 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Mohit Bansal
VLM
49
28
0
28 Sep 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
20
17
0
01 Aug 2022
Disentangled Action Recognition with Knowledge Bases
Zhekun Luo
Shalini Ghosh
Devin Guillory
Keizo Kato
Trevor Darrell
Huijuan Xu
21
7
0
04 Jul 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
34
131
0
18 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
26
45
0
04 May 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
19
204
0
28 Mar 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Fei Wu
Yi Yang
Yueting Zhuang
X. Wang
31
73
0
24 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
22
19
0
23 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
19
32
0
22 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
22
82
0
26 Jan 2022
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun
Minchul Shin
Gunsoo Han
Sangho Lee
S. Ha
Joonseok Lee
Eun-Sol Kim
SSL
44
20
0
14 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
18
108
0
13 Jan 2022
Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu
Yuntian Deng
Alexander M. Rush
BDL
27
13
0
08 Jan 2022
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
21
17
0
13 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David F. Harwath
James R. Glass
Hilde Kuehne
ViT
25
129
0
08 Dec 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
19
82
0
13 Oct 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
66
44
0
21 Sep 2021