ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1903.02874
  4. Cited By
COIN: A Large-scale Dataset for Comprehensive Instructional Video
  Analysis

COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

7 March 2019
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
ArXiv (abs)PDFHTML

Papers citing "COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis"

50 / 267 papers shown
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Scalable Video-to-Dataset Generation for Cross-Platform Mobile AgentsComputer Vision and Pattern Recognition (CVPR), 2025
Yunseok Jang
Yeda Song
Sungryull Sohn
Lajanugen Logeswaran
Tiange Luo
Dong-Ki Kim
Kyunghoon Bae
Honglak Lee
VGen
241
4
0
19 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping Huang
OffRL
612
5
0
08 May 2025
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao
You Li
Y. X. Wei
Lei Li
Shuhuai Ren
...
Sida Li
Dianbo Sui
Qi Liu
Yanzhe Zhang
Xu Sun
272
14
0
24 Apr 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
254
3
0
10 Apr 2025
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Yukun Qi
Yiming Zhao
Y. Zeng
Xikun Bao
Wenjie Huang
Lin Yen-Chen
Zehui Chen
Jie Zhao
Zhongang Qi
Feng Zhao
LRM
307
18
0
10 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
288
2
0
03 Apr 2025
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video ContextsComputer Vision and Pattern Recognition (CVPR), 2025
Yanjie Wang
Longji Xu
Bo Chen
Tong Wu
Dongyan Zhao
Zilong Zheng
VLMMLLM
331
8
0
29 Mar 2025
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
676
2
0
27 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Yi Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLMVLMLRM
435
12
0
25 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
395
3
0
19 Mar 2025
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
282
1
0
18 Mar 2025
Quantum EigenGame for excited state calculation
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
280
5
0
17 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
348
2
0
16 Mar 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
VLog: Video-Language Models by Generative Retrieval of Narration VocabularyComputer Vision and Pattern Recognition (CVPR), 2025
Kevin Qinghong Lin
Mike Zheng Shou
VGen
1.0K
3
0
12 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video UnderstandingComputer Vision and Pattern Recognition (CVPR), 2025
Shehreen Azad
Vibhav Vineet
Yogesh S Rawat
VLM
1.1K
11
0
11 Mar 2025
CLAD: Constrained Latent Action Diffusion for Vision-Language Procedure Planning
Lei Shi
Andreas Bulling
DiffM
290
3
0
09 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Yue Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
926
9
0
08 Mar 2025
Data Augmentation for Instruction Following Policies via Trajectory SegmentationAAAI Conference on Artificial Intelligence (AAAI), 2025
Niklas Höpner
Ilaria Tiddi
H. V. Hoof
229
0
0
25 Feb 2025
Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training
Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training
Karan Samel
Nitish Sontakke
Irfan Essa
243
1
0
24 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented GenerationAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
695
29
0
12 Feb 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Multimodal Fusion and Coherence Modeling for Video Topic SegmentationAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
430
0
0
31 Dec 2024
SUTrack: Towards Simple and Unified Single Object Tracking
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen
Ben Kang
Wanting Geng
Jiawen Zhu
Zichen Liu
Dong Wang
Huchuan Lu
VOTViT
282
4
0
26 Dec 2024
Do Language Models Understand Time?
Do Language Models Understand Time?The Web Conference (WWW), 2024
Xi Ding
Lei Wang
912
10
0
18 Dec 2024
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video
  Prompting
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video PromptingAAAI Conference on Artificial Intelligence (AAAI), 2024
Muhammet Furkan Ilaslan
Ali Koksal
Kevin Qinghong Lin
Burak Satar
Mike Zheng Shou
Qianli Xu
LM&Ro
280
2
0
16 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable
  Multi-grained Video-language Learning
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language LearningIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yanjie Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
287
1
0
10 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
651
6
0
04 Dec 2024
Progress-Aware Video Frame Captioning
Progress-Aware Video Frame CaptioningComputer Vision and Pattern Recognition (CVPR), 2024
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
608
6
0
03 Dec 2024
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction FormatConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yueqian Wang
Xiaojun Meng
Yijiao Wang
Jianxin Liang
Jiansheng Wei
Huishuai Zhang
Dongyan Zhao
VGen
268
19
0
27 Nov 2024
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal GroundingComputer Vision and Pattern Recognition (CVPR), 2024
Andong Deng
Zhongpai Gao
Anwesa Choudhuri
Benjamin Planche
Meng Zheng
Sijin Yu
Terrence Chen
Chong Chen
Ziyan Wu
AI4TS
344
6
0
25 Nov 2024
ACE: Action Concept Enhancement of Video-Language Models in Procedural
  Videos
ACE: Action Concept Enhancement of Video-Language Models in Procedural VideosIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Reza Ghoddoosian
Nakul Agarwal
Isht Dwivedi
Behzad Darisuh
284
0
0
23 Nov 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet VideosNeural Information Processing Systems (NeurIPS), 2024
Yunong Liu
Cristobal Eyzaguirre
Pengfei Yu
Shubh Khanna
Juan Carlos Niebles
Vineeth Ravi
Saumitra Mishra
Weiyu Liu
Jiajun Wu
270
6
0
18 Nov 2024
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional VideosComputer Vision and Pattern Recognition (CVPR), 2024
Sagnik Majumder
Tushar Nagarajan
Ziad Al-Halah
Reina Pradhan
Kristen Grauman
417
0
0
13 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Don't Look Twice: Faster Video Transformers with Run-Length TokenizationNeural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
245
27
0
07 Nov 2024
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Guido Maria DÁmely di Melendugno
Alessandro Flaborea
Andrea Sanchietti
G. Farinella
Fabio Galasso
Antonino Furnari
LRMEgoV
366
7
0
04 Nov 2024
Video Token Merging for Long-form Video Understanding
Video Token Merging for Long-form Video Understanding
Seon-Ho Lee
Jue Wang
Zhikang Zhang
D. Fan
Xinyu Li
290
15
0
31 Oct 2024
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
ProMQA: Question Answering Dataset for Multimodal Procedural Activity UnderstandingNorth American Chapter of the Association for Computational Linguistics (NAACL), 2024
Kimihiro Hasegawa
Wiradee Imrattanatrai
Zhi-Qi Cheng
Masaki Asada
Susan Holm
Yuran Wang
Ken Fukuda
Teruko Mitamura
249
7
0
29 Oct 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded TuningInternational Conference on Learning Representations (ICLR), 2024
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLMVLMAI4TS
286
59
0
25 Oct 2024
Human Action Anticipation: A Survey
Human Action Anticipation: A Survey
Bolin Lai
Sam Toyer
Tushar Nagarajan
Rohit Girdhar
S. Zha
James M. Rehg
Kris Kitani
Kristen Grauman
Ruta Desai
Miao Liu
AI4TS
295
7
0
17 Oct 2024
OMCAT: Omni Context Aware Transformer
OMCAT: Omni Context Aware Transformer
Arushi Goel
Karan Sapra
Matthieu Le
Rafael Valle
Andrew Tao
Bryan Catanzaro
MLLMVLM
228
2
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
When Does Perceptual Alignment Benefit Vision Representations?Neural Information Processing Systems (NeurIPS), 2024
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
275
41
1
14 Oct 2024
Human Stone Toolmaking Action Grammar (HSTAG): A Challenging Benchmark
  for Fine-grained Motor Behavior Recognition
Human Stone Toolmaking Action Grammar (HSTAG): A Challenging Benchmark for Fine-grained Motor Behavior RecognitionInternational Conference on Data Science and Advanced Analytics (DSAA), 2024
Cheng Liu
Xuyang Yan
Zekun Zhang
Cheng Ding
Tianhao Zhao
Shaya Jannati
Cynthia Martinez
Dietrich Stout
130
1
0
10 Oct 2024
Exploring Efficient Foundational Multi-modal Models for Video
  Summarization
Exploring Efficient Foundational Multi-modal Models for Video Summarization
Karan Samel
Apoorva Beedu
Nitish Sontakke
Irfan Essa
132
2
0
09 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
595
1
0
09 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
TRACE: Temporal Grounding Video LLM via Causal Event ModelingInternational Conference on Learning Representations (ICLR), 2024
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
281
46
0
08 Oct 2024
EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos referring to Procedural Texts
EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos referring to Procedural Texts
Yuto Haneji
Taichi Nishimura
Hirotaka Kameko
Keisuke Shirai
Tomoya Yoshida
Keiya Kajimura
Koki Yamamoto
Taiyu Cui
Tomohiro Nishimoto
Shinsuke Mori
EgoV
273
0
0
07 Oct 2024
Optimising for the Unknown: Domain Alignment for Cephalometric Landmark
  Detection
Optimising for the Unknown: Domain Alignment for Cephalometric Landmark Detection
Julian Wyatt
Irina Voiculescu
189
10
0
06 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video
  Representation Learning
VEDIT: Latent Prediction Architecture For Procedural Video Representation LearningInternational Conference on Learning Representations (ICLR), 2024
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Joey Tianyi Zhou
Koustuv Sinha
AI4TS
297
7
0
04 Oct 2024
AirLetters: An Open Video Dataset of Characters Drawn in the Air
AirLetters: An Open Video Dataset of Characters Drawn in the Air
Rishit Dagli
Guillaume Berger
Joanna Materzynska
Ingo Bax
Roland Memisevic
VGen
188
1
0
03 Oct 2024
Benchmarking Large Language Models for Conversational Question Answering
  in Multi-instructional Documents
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents
Shiwei Wu
Chen Zhang
Yan Gao
Qimeng Wang
Tong Xu
Yao Hu
Enhong Chen
ELM
244
2
0
01 Oct 2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in
  Instructional Videos
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional VideosEuropean Conference on Computer Vision (ECCV), 2024
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Fu-Jen Chu
Kris Kitani
Gedas Bertasius
Xitong Yang
235
10
0
30 Sep 2024
Previous
123456
Next