COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

7 March 2019
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
arXiv: 1903.02874 (abs · PDF · HTML)

Papers citing "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"

50 / 266 papers shown
Towards Object-centric Understanding for Instructional Videos
Wenliang Guo
Yu Kong
92
0
0
03 Dec 2025
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Chenting Wang
Yuhan Zhu
Yicheng Xu
Jiange Yang
Ziang Yan
Yali Wang
Yi Wang
Limin Wang
VGen
121
0
0
01 Dec 2025
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Le Thien Phuc Nguyen
Zhuoran Yu
Samuel Low Yu Hang
Subin An
J. Lee
...
SeungEun Chung
Thanh-Huy Nguyen
JuWan Maeng
Soochahn Lee
Yong Jae Lee
AuLLM, VLM
173
0
0
01 Dec 2025
Learning to Refuse: Refusal-Aware Reinforcement Fine-Tuning for Hard-Irrelevant Queries in Video Temporal Grounding
Jin-Seop Lee
SungJoon Lee
SeongJun Jung
Boyang Li
Jee-Hyong Lee
OOD
136
0
0
28 Nov 2025
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
Apratim Bhattacharyya
Bicheng Xu
Sanjay Haresh
Reza Pourreza
Litian Liu
Sunny Panchal
Pulkit Madan
Leonid Sigal
Roland Memisevic
96
0
0
27 Nov 2025
A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking
Chengan Che
Chao Wang
Xinyue Chen
Sophia Tsoka
Luis C. Garcia-Peraza-Herrera
AI4TS
178
0
0
21 Nov 2025
SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios
Jieru Lin
Zhiwei Yu
Börje F. Karlsson
76
0
0
20 Nov 2025
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
Boshen Xu
Zihan Xiao
Jiaze Li
Jianzhong Ju
Zhenbo Luo
Jian Luan
Qin Jin
Mamba
495
0
0
20 Nov 2025
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
Junhao Cheng
Liang Hou
Xin Tao
Jing Liao
VGen
321
0
0
20 Nov 2025
Unsupervised Discovery of Long-Term Spatiotemporal Periodic Workflows in Human Activities
Fan Yang
Quanting Xie
Atsunori Moteki
S. Masui
Shan Jiang
Kanji Uchino
Yonatan Bisk
Graham Neubig
AI4TS
173
0
0
18 Nov 2025
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh
Kristen Grauman
167
0
0
17 Nov 2025
Building Egocentric Procedural AI Assistant: Methods, Benchmarks, and Challenges
Junlong Li
Huaiyuan Xu
Sijie Cheng
Kejun Wu
Kim-Hui Yap
Lap-Pui Chau
Yi Wang
EgoV
221
0
0
17 Nov 2025
Web-Scale Collection of Video Data for 4D Animal Reconstruction
Brian Nlong Zhao
Jiajun Wu
Shangzhe Wu
112
1
0
03 Nov 2025
STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models
Mahiro Ukai
Shuhei Kurita
Nakamasa Inoue
CoGe
213
0
0
26 Oct 2025
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Jiahao Meng
X. Li
Haochen Wang
Yue Tan
Tao Zhang
...
Yunhai Tong
Anran Wang
Zhiyang Teng
Y. Wang
Z. Wang
VGen, LRM
308
6
0
23 Oct 2025
Vision-Based Mistake Analysis in Procedural Activities: A Review of Advances and Challenges
Konstantinos Bacharidis
Antonis A. Argyros
132
0
0
22 Oct 2025
When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
Zhuo Cao
Heming Du
Bingqing Zhang
Xin Yu
Xue Li
Sen Wang
116
0
0
20 Oct 2025
EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction
Qile Su
Shoutai Zhu
Shuai Zhang
Baoyu Liang
Chao Tong
AI4TS
88
1
0
19 Oct 2025
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick
E. Mavroudi
Yale Song
Rama Chellappa
Lorenzo Torresani
Triantafyllos Afouras
164
0
0
19 Oct 2025
Training-free Online Video Step Grounding
Luca Zanella
Massimiliano Mancini
Yiming Wang
Alessio Tonioni
Elisa Ricci
104
0
0
19 Oct 2025
StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales
Nyle Siddiqui
Rohit Gupta
S. Swetha
Mubarak Shah
140
0
0
17 Oct 2025
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video
Yulin Zhang
Cheng Shi
Yang Wang
Sibei Yang
EgoV
196
0
0
16 Oct 2025
Class Prototypes based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Rohit Gupta
Anirban Roy
Claire Christensen
Sujeong Kim
Sarah Gerard
Madeline Cincebeaux
Ajay Divakaran
Todd Grindal
M. Shah
132
21
0
13 Oct 2025
Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers
Xianhang Li
Chen Huang
Chun-Liang Li
Eran Malach
J. Susskind
Vimal Thilak
Etai Littwin
134
1
0
29 Sep 2025
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
Xiangyu Zeng
Kefan Qiu
Qingyu Zhang
Xinhao Li
Jing Wang
...
Kun Tian
Meng Tian
Xinhai Zhao
Yi Wang
Limin Wang
183
2
0
29 Sep 2025
Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
Mohamad Ballout
Okajevo Wilfred
Seyedalireza Yaghoubi
Nohayr Muhammad Abdelmoneim
Julius Mayer
Elia Bruni
90
0
0
29 Sep 2025
AHA - Predicting What Matters Next: Online Highlight Detection Without Looking Ahead
Aiden Chang
Celso De Melo
Stephanie M. Lukin
133
0
0
19 Sep 2025
ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly
Kimihiro Hasegawa
Wiradee Imrattanatrai
Masaki Asada
Susan Holm
Yuran Wang
Vincent Zhou
Ken Fukuda
Teruko Mitamura
136
2
0
03 Sep 2025
Planning with Reasoning using Vision Language World Model
Delong Chen
Theo Moutakanni
Willy Chung
Yejin Bang
Ziwei Ji
Allen Bolourchi
Pascale Fung
VGen, VLM
225
9
0
02 Sep 2025
Why Relational Graphs Will Save the Next Generation of Vision Foundation Models?
Social Science Research Network (SSRN), 2025
Fatemeh Ziaeetabar
84
0
0
25 Aug 2025
Language-Guided Temporal Token Pruning for Efficient VideoLLM Processing
Yogesh Kumar
VLM
74
0
0
25 Aug 2025
Generating Dialogues from Egocentric Instructional Videos for Task Assistance: Dataset, Method and Benchmark
Lavisha Aggarwal
Vikas Bahirwani
Lin Li
Andrea Colaco
VGen
85
0
0
15 Aug 2025
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang
Yingchen Yu
Yunqing Zhao
Shijian Lu
Song Bai
114
2
0
03 Aug 2025
ReasonAct: Progressive Training for Fine-Grained Video Reasoning in Small Models
Jiaxin Liu
Zhaolu Kang
OffRL, LRM
269
1
0
03 Aug 2025
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction
Ce Zhang
Yale Song
Ruta Desai
Michael L. Iuzzolino
Joseph Tighe
Gedas Bertasius
Satwik Kottur
159
1
0
20 Jul 2025
Cross-Modal Dual-Causal Learning for Long-Term Action Recognition
Xu Shaowu
Jia Xibin
Gao Junyu
Sun Qianmei
Chang Jing
Fan Chao
181
0
0
09 Jul 2025
LEGO Co-builder: Exploring Fine-Grained Vision-Language Modeling for Multimodal LEGO Assembly Assistants
Haochen Huang
Jiahuan Pei
Mohammad Aliannejadi
Xin Sun
Moonisa Ahsan
Chuang Yu
Zhaochun Ren
Pablo César
Junxiao Wang
VLM
196
1
0
07 Jul 2025
InstructionBench: An Instructional Video Understanding Benchmark
Haiwan Wei
Yitian Yuan
Xiaohan Lan
Wei Ke
Lin Ma
ELM
287
3
0
01 Jul 2025
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li
Shangzhe Di
Zhonghua Zhai
Weilin Huang
Yanfeng Wang
Weidi Xie
VLM
138
6
0
23 Jun 2025
Anomaly Detection and Generation with Diffusion Models: A Survey
Zehua Wang
Jing Liu
Chengfang Li
Rui Xi
W. Li
Liang Cao
Jin Wang
L. Yang
Junsong Yuan
Wei Zhou
DiffM, MedIm
222
5
0
11 Jun 2025
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Benno Krojer
Mojtaba Komeili
Candace Ross
Q. Garrido
Koustuv Sinha
Nicolas Ballas
Mahmoud Assran
273
4
0
11 Jun 2025
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Mido Assran
Adrien Bardes
David Fan
Q. Garrido
Russell Howes
...
Sarath Chandar
Franziska Meier
Yann LeCun
Michael G. Rabbat
Nicolas Ballas
261
125
0
11 Jun 2025
PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments
Minghao Zou
Qingtian Zeng
Yongping Miao
Shangkun Liu
Zilong Wang
Hantao Liu
Wei Zhou
193
1
0
07 Jun 2025
WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning
Delong Chen
Willy Chung
Yejin Bang
Ziwei Ji
Pascale Fung
VGen, LM&Ro
232
6
0
04 Jun 2025
From Motion to Behavior: Hierarchical Modeling of Humanoid Generative Behavior Control
Jusheng Zhang
Jinzhou Tang
Sidi Liu
Mingyan Li
Sheng Zhang
Jian Wang
Keze Wang
200
0
0
28 May 2025
Predicting Implicit Arguments in Procedural Video Instructions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Anil Batra
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
170
0
0
27 May 2025
$I^2G$: Generating Instructional Illustrations via Text-Conditioned Diffusion
Jing Bi
Pinxin Liu
Ali Vosoughi
Jiarui Wu
Jinxi He
Chenliang Xu
DiffM
222
0
0
22 May 2025
Leveraging Foundation Models for Multimodal Graph-Based Action Recognition
Fatemeh Ziaeetabar
Florentin Wörgötter
338
3
0
21 May 2025
Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jiafeng Liang
Shixin Jiang
Xuan Dong
Ning Wang
Zheng Chu
Hui Su
Jinlan Fu
Ming-Yuan Liu
See-Kiong Ng
Bing Qin
227
1
0
20 May 2025
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
Thong Nguyen
Zhiyuan Hu
Xu Lin
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
348
1
0
19 May 2025