Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.07925
Cited By
ActionFormer: Localizing Moments of Actions with Transformers
16 February 2022
Chen-Da Liu-Zhang
Jianxin Wu
Yin Li
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ActionFormer: Localizing Moments of Actions with Transformers"
47 / 47 papers shown
Title
SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation
Edoardo Bianchi
Antonio Liotta
14
0
0
13 May 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
36
0
0
07 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
62
0
0
06 May 2025
Empowering Agentic Video Analytics Systems with Video Language Models
Yuxuan Yan
Shiqi Jiang
Ting Cao
Y. Yang
Qianqian Yang
Yuanchao Shu
Y. Yang
Lili Qiu
VLM
67
0
0
01 May 2025
Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
Rezowan Shuvo
M S Mekala
Eyad Elyan
MedIm
54
0
0
26 Apr 2025
Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization
Hongwei Ji
Wulian Yun
Mengshi Qi
Huadong Ma
LRM
72
0
0
18 Apr 2025
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
31
0
0
17 Apr 2025
F
3
^3
3
Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhé Hóu
Yun Lin
J. Dong
35
0
0
11 Apr 2025
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
Wei-Jin Huang
Yuan-Ming Li
Zhi-Wei Xia
Yu-Ming Tang
Kun-Yu Lin
Jian-Fang Hu
Wei-Shi Zheng
41
0
0
28 Mar 2025
End-to-End Action Segmentation Transformer
Tieqiao Wang
Sinisa Todorovic
ViT
37
0
0
08 Mar 2025
SurgPLAN++: Universal Surgical Phase Localization Network for Online and Offline Inference
Zhen Chen
Xingjian Luo
Jinlin Wu
Long Bai
Zhen Lei
Hongliang Ren
Sebastien Ourselin
Hongbin Liu
56
0
0
17 Feb 2025
Do Language Models Understand Time?
Xi Ding
Lei Wang
167
0
0
18 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall
109
1
0
12 Dec 2024
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
58
2
0
15 Nov 2024
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
Josiah Aklilu
Xiaohan Wang
Serena Yeung-Levy
57
1
0
18 Oct 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
45
0
0
12 Sep 2024
Introducing Gating and Context into Temporal Action Detection
Aglind Reka
Diana Laura Borza
Dominick Reilly
Michal Balazia
Francois Bremond
15
0
0
06 Sep 2024
Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment
Keyne Oei
Amr Gomaa
Anna Maria Feit
João Belo
26
0
0
06 Sep 2024
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
35
7
0
07 Jul 2024
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta
Aditya Arora
Sanath Narayan
Salman Khan
F. Khan
Graham W. Taylor
29
3
0
21 Jun 2024
MALT: Multi-scale Action Learning Transformer for Online Action Detection
Zhipeng Yang
Ruoyu Wang
Yang Tan
Liping Xie
OffRL
38
0
0
31 May 2024
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng
Xiaoyong Chen
Jiaming Liang
Huisi Wu
Guangzhong Cao
Yong Guo
AAML
32
3
0
29 Mar 2024
Low-power, Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann
Suman Ghosh
Ignacio Juarez Martinez
Tom Hart
Alex Kacelnik
Guillermo Gallego
17
7
0
06 Dec 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Bernard Ghanem
24
25
0
28 Nov 2023
Boundary Discretization and Reliable Classification Network for Temporal Action Detection
Zhenying Fang
Jun Yu
Richang Hong
11
0
0
10 Oct 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
19
53
0
21 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
22
29
0
21 Aug 2023
NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023
Lin Sui
Fangzhou Mu
Yin Li
20
2
0
05 Jul 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
51
4
0
25 May 2023
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
Pilhyeon Lee
Taeoh Kim
Minho Shim
Dongyoon Wee
H. Byun
16
11
0
30 Mar 2023
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
Sauradip Nag
Xiatian Zhu
Jiankang Deng
Yi-Zhe Song
Tao Xiang
DiffM
VGen
25
21
0
27 Mar 2023
Multi-modal Prompting for Low-Shot Temporal Action Localization
Chen Ju
Zeqian Li
Peisen Zhao
Ya-Qin Zhang
Xiaopeng Zhang
Qi Tian
Yanfeng Wang
Weidi Xie
22
18
0
21 Mar 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
8
7
0
16 Feb 2023
Skeletal Video Anomaly Detection using Deep Learning: Survey, Challenges and Future Directions
Pratik K. Mishra
Alex Mihailidis
Shehroz S. Khan
23
17
0
31 Dec 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Bernard Ghanem
16
17
0
25 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
18
106
0
17 Nov 2022
Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge
Fangzhou Mu
Sicheng Mo
Gillian Wang
Yin Li
10
3
0
16 Nov 2022
Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022
Yin-Dong Zheng
Guo Chen
Jiahao Wang
Tong Lu
Liming Wang
26
0
0
16 Nov 2022
Prior-enhanced Temporal Action Localization using Subject-aware Spatial Attention
Yifan Liu
Youbao Tang
Ning Zhang
Ruei-Sung Lin
Haoqian Wang
23
0
0
10 Nov 2022
One-stage Action Detection Transformer
Lijun Li
Lian Zhuo
Bangyin Zhang
ViT
8
0
0
21 Jun 2022
ETAD: Training Action Detection End to End on a Laptop
Shuming Liu
Mengmeng Xu
Chen Zhao
Xu Zhao
Bernard Ghanem
44
6
0
14 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
27
45
0
05 May 2022
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,604
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan
Jiaqi Tang
Limin Wang
Gangshan Wu
ViT
73
178
0
03 Feb 2021
Gaussian Temporal Awareness Networks for Action Localization
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
125
319
0
09 Sep 2019
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Tianwei Lin
Xu Zhao
Haisheng Su
Chongjing Wang
Ming Yang
135
691
0
08 Jun 2018
1