ResearchTrend.AI
Unsupervised Learning from Narrated Instruction Videos

30 June 2015
Jean-Baptiste Alayrac
Piotr Bojanowski
Nishant Agrawal
Josef Sivic
Ivan Laptev
Simon Lacoste-Julien
    SSL

Papers citing "Unsupervised Learning from Narrated Instruction Videos"

50 / 52 papers shown
Prompt2LVideos: Exploring Prompts for Understanding Long-Form Multimodal Videos
Soumya Jahagirdar
Jayasree Saha
C. V. Jawahar
56
0
0
11 Mar 2025
Hierarchical Vector Quantization for Unsupervised Action Segmentation
Federico Spurio
Emad Bahrami
Gianpiero Francesca
Juergen Gall
39
0
0
23 Dec 2024
BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation
Zijia Lu
Ehsan Elhamifar
40
2
0
28 Aug 2023
Leveraging triplet loss for unsupervised action segmentation
Elena Belén Bueno-Benito
Biel Tura Vecino
Mariella Dimiccoli
22
7
0
13 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
25
38
0
31 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
67
30
0
26 Mar 2023
TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering
Wei Lin
Anna Kukleva
Horst Possegger
Hilde Kuehne
Horst Bischof
48
2
0
09 Mar 2023
STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos
Anshul B. Shah
Benjamin Lundell
H. Sawhney
Ramalingam Chellappa
SSL
16
8
0
02 Jan 2023
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
16
2
0
09 Dec 2022
Human in the loop approaches in multi-modal conversational task guidance system development
R. Manuvinakurike
Sovan Biswas
G. Raffa
R. Beckwith
A. Rhodes
Meng Shi
Gesem Gudino Mejia
Saurav Sahay
L. Nachman
29
2
0
03 Nov 2022
Unsupervised Audio-Visual Lecture Segmentation
Darshan Singh
Anchit Gupta
C. V. Jawahar
Makarand Tapaswi
VOS
16
4
0
29 Oct 2022
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
Hamza Khan
S. Haresh
Awais Ahmed
Shakeeb Siddiqui
Andrey Konin
Mohammad Zeeshan
Quoc-Huy Tran
19
22
0
30 Jun 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
26
45
0
04 May 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
19
204
0
28 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
21
32
0
22 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition
A. Glavan
Estefanía Talavera
15
10
0
23 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
21
17
0
13 Dec 2021
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani
Yingli Tian
40
60
0
30 Sep 2021
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sateesh Kumar
S. Haresh
Awais Ahmed
Andrey Konin
M. Zia
Quoc-Huy Tran
SSL
22
46
0
27 May 2021
Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities
S. Swetha
Hilde Kuehne
Y. S. Rawat
M. Shah
27
16
0
30 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
23
127
0
30 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
21
56
0
16 Mar 2021
Disambiguation of weak supervision with exponential convergence rates
Vivien A. Cabannes
Francis R. Bach
Alessandro Rudi
16
5
0
04 Feb 2021
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
34
417
0
14 Nov 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
29
48
0
29 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
33
100
0
08 May 2020
Learning Interactions and Relationships between Movie Characters
Anna Kukleva
Makarand Tapaswi
Ivan Laptev
36
51
0
29 Mar 2020
Discriminative Clustering with Representation Learning with any Ratio of Labeled to Unlabeled Data
Corinne Jones
Vincent Roulet
Zaïd Harchaoui
28
1
0
30 Dec 2019
Action Modifiers: Learning from Adverbs in Instructional Videos
Hazel Doughty
Ivan Laptev
W. Mayol-Cuevas
Dima Damen
12
30
0
13 Dec 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
31
9
0
31 Oct 2019
Learning Video Representations using Contrastive Bidirectional Transformer
Chen Sun
Fabien Baradel
Kevin Patrick Murphy
Cordelia Schmid
SSL
ViT
13
133
0
13 Jun 2019
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
16
302
0
07 Mar 2019
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Nayyer Aafaq
Naveed Akhtar
W. Liu
Syed Zulqarnain Gilani
Ajmal Saeed Mian
18
204
0
27 Feb 2019
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
C. Chang
De-An Huang
Yanan Sui
Li Fei-Fei
Juan Carlos Niebles
22
156
0
09 Jan 2019
The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos
Hazel Doughty
W. Mayol-Cuevas
Dima Damen
25
138
0
13 Dec 2018
Zero-Shot Anticipation for Instructional Activities
Fadime Sener
Angela Yao
LM&Ro
23
68
0
06 Dec 2018
Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise
Nikhil Krishnaswamy
Scott E. Friedman
James Pustejovsky
NAI
13
32
0
27 Nov 2018
VirtualHome: Simulating Household Activities via Programs
Xavier Puig
K. Ra
Marko Boben
Jiaman Li
Tingwu Wang
Sanja Fidler
Antonio Torralba
LM&Ro
17
475
0
19 Jun 2018
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
Li Ding
Chenliang Xu
22
180
0
28 Mar 2018
Unsupervised Learning and Segmentation of Complex Activities from Video
Fadime Sener
Angela Yao
16
112
0
26 Mar 2018
MovieGraphs: Towards Understanding Human-Centric Situations from Videos
Paul Vicol
Makarand Tapaswi
Lluis Castrejon
Sanja Fidler
25
136
0
19 Dec 2017
Learning from Video and Text via Large-Scale Discriminative Clustering
Antoine Miech
Jean-Baptiste Alayrac
Piotr Bojanowski
Ivan Laptev
Josef Sivic
21
44
0
27 Jul 2017
Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling
Alexander Richard
Hilde Kuehne
Juergen Gall
14
195
0
23 Mar 2017
Joint Discovery of Object States and Manipulation Actions
Jean-Baptiste Alayrac
Josef Sivic
Ivan Laptev
Simon Lacoste-Julien
20
79
0
09 Feb 2017
Robust Discriminative Clustering with Sparse Regularizers
Nicolas Flammarion
B. Palaniappan
Francis R. Bach
13
18
0
29 Aug 2016
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
De-An Huang
Li Fei-Fei
Juan Carlos Niebles
14
237
0
28 Jul 2016
SEMBED: Semantic Embedding of Egocentric Action Videos
Michael Wray
Davide Moltisanti
W. Mayol-Cuevas
Dima Damen
21
14
0
28 Jul 2016