arXiv:1903.02874
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
7 March 2019
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
Papers citing
"COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"
50 / 267 papers shown
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Ye Liu
Zongyang Ma
Chen Ma
Yang Wu
Ying Shan
Chang Wen Chen
273
52
0
26 Sep 2024
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
European Conference on Computer Vision (ECCV), 2024
Yuxiao Chen
Keqin Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
297
5
0
22 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
285
13
0
10 Sep 2024
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure
Jia-Fong Yeh
Min-Hung Chen
Hung-Ting Su
S. Lai
Winston H. Hsu
429
0
0
30 Aug 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Neural Information Processing Systems (NeurIPS), 2024
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
249
32
0
29 Aug 2024
Diffusion Model for Planning: A Systematic Literature Review
Toshihide Ubukata
Jialong Li
Kenji Tei
DiffM
MedIm
288
17
0
16 Aug 2024
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
European Conference on Computer Vision (ECCV), 2024
Shishira R. Maiya
Anubhav Gupta
M. Gwilliam
Max Ehrlich
Abhinav Shrivastava
200
7
1
05 Aug 2024
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
European Conference on Computer Vision (ECCV), 2024
Koki Maeda
Tosho Hirasawa
Atsushi Hashimoto
Jun Harashima
Leszek Rybicki
Yusuke Fukasawa
Yoshitaka Ushiku
279
3
0
05 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
457
11
0
01 Aug 2024
Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang
Frederic Z. Zhang
Cristian Rodriguez
Yizhak Ben-Shabat
A. Cherian
Stephen Gould
294
4
0
16 Jul 2024
Open-Event Procedure Planning in Instructional Videos
Yilu Wu
Hanlin Wang
Jing Wang
Limin Wang
275
1
0
06 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
306
115
0
30 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
184
3
0
26 Jun 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Yuxuan Wang
Yueqian Wang
Dongyan Zhao
Cihang Xie
Zilong Zheng
MLLM
VLM
268
53
0
24 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
472
17
0
20 Jun 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
314
109
0
17 Jun 2024
A Survey of Video Datasets for Grounded Event Understanding
Kate Sanders
Benjamin Van Durme
247
6
0
14 Jun 2024
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
Yuan-Ming Li
Wei-Jin Huang
An-Lan Wang
Ling-an Zeng
Jing-Ke Meng
Wei-Shi Zheng
331
22
0
13 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
596
28
1
09 Jun 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
432
10
0
24 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
368
76
0
24 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
362
184
0
08 Apr 2024
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
Shiyi Zhang
Wen-Dao Dai
Sujia Wang
Xiangwei Shen
Jiwen Lu
Jie Zhou
Yansong Tang
226
52
0
07 Apr 2024
PREGO: online mistake detection in PRocedural EGOcentric videos
Computer Vision and Pattern Recognition (CVPR), 2024
Alessandro Flaborea
Guido Maria D'Amely di Melendugno
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Antonino Furnari
G. Farinella
Yuta Kyuragi
EgoV
297
30
0
02 Apr 2024
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
VLM
241
104
0
27 Mar 2024
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
Ali Zare
Yulei Niu
Hammad A. Ayyubi
Shih-Fu Chang
222
4
0
27 Mar 2024
ActionDiffusion: An Action-aware Diffusion Model for Procedure Planning in Instructional Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Lei Shi
Paul-Christian Bürkner
Andreas Bulling
DiffM
VGen
238
6
0
13 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
European Conference on Computer Vision (ECCV), 2024
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
286
398
0
11 Mar 2024
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Yasas Nagasinghe
Honglu Zhou
Malitha Gunawardhana
Martin Renqiang Min
Daniel Harari
Muhammad Haris Khan
256
12
0
05 Mar 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Yulei Niu
Wenliang Guo
Long Chen
Xudong Lin
Shih-Fu Chang
285
21
0
03 Mar 2024
CI w/o TN: Context Injection without Task Name for Procedure Planning
Xinjie Li
209
0
0
23 Feb 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
670
85
0
20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
391
68
0
20 Feb 2024
FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation
Takuma Yagi
Misaki Ohashi
Yifei Huang
Ryosuke Furuta
Shungo Adachi
Toutai Mitsuyama
Yoichi Sato
235
11
0
01 Feb 2024
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin
Jie Zhang
Zhenyu Huang
Jia-Wei Liu
Zujie Wen
Xi Peng
355
36
0
30 Jan 2024
Zero Shot Open-ended Video Inference
Ee Yeo Keat
Zhang Hao
Alexander Matyasko
Basura Fernando
VLM
146
0
0
23 Jan 2024
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Jiaming Zhou
Junwei Liang
Kun-Yu Lin
Jinrui Yang
Wei-Shi Zheng
VLM
305
13
0
22 Jan 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
424
2
0
19 Jan 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
491
7
0
03 Jan 2024
CaptainCook4D: A dataset for understanding errors in procedural activities
Rohith Peddi
Shivvrat Arya
B. Challa
Likhitha Pallapothula
Akshay Vyas
...
Vasundhara Komaragiri
Eric D. Ragan
Nicholas Ruozzi
Yu Xiang
Vibhav Gogate
269
29
0
22 Dec 2023
Implicit Affordance Acquisition via Causal Action-Effect Modeling in the Video Domain
Hsiu-yu Yang
Carina Silberer
162
1
0
18 Dec 2023
Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis
Tianyao He
Huabin Liu
Yuxi Li
Xiao Ma
Cheng Zhong
Yang Zhang
Weiyao Lin
313
7
0
18 Dec 2023
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Tomáš Souček
Dima Damen
Michael Wray
Ivan Laptev
Josef Sivic
VGen
257
41
0
12 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG
ELM
LRM
371
33
0
11 Dec 2023
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
286
7
0
07 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
372
356
0
04 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
507
2
0
30 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
331
2
0
28 Nov 2023
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Takehiko Ohkawa
Takuma Yagi
Taichi Nishimura
Ryosuke Furuta
Atsushi Hashimoto
Yoshitaka Ushiku
Yoichi Sato
EgoV
299
11
0
28 Nov 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
397
0
0
27 Nov 2023