1801.03150
Moments in Time Dataset: one million videos for event understanding
9 January 2018
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
Tom Yan
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
Papers citing "Moments in Time Dataset: one million videos for event understanding"
50 / 268 papers shown
Title
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Hongsheng Li
Xiaogang Wang
Jifeng Dai
MoMe
MoE
19
66
0
09 Jun 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Haodong Duan
Nanxuan Zhao
Kai-xiang Chen
Dahua Lin
ViT
AI4TS
31
19
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
59
1,255
0
04 May 2022
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
Zhongang Cai
Daxuan Ren
Ailing Zeng
Zhengyu Lin
Tao Yu
...
Fangzhou Hong
Mingyuan Zhang
Chen Change Loy
Lei Yang
Ziwei Liu
3DH
28
100
0
28 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
22
0
0
25 Apr 2022
Performance Evaluation of Action Recognition Models on Low Quality Videos
Aoi Otani
Ryota Hashiguchi
Kazuki Omi
Norishige Fukushima
Toru Tamaki
11
6
0
19 Apr 2022
Video Action Detection: Analysing Limitations and Challenges
Rajat Modi
A. J. Rana
Akash Kumar
Praveen Tirupattur
Shruti Vyas
Y. S. Rawat
M. Shah
9
12
0
17 Apr 2022
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Jinglin Xu
Yongming Rao
Xumin Yu
Guangyi Chen
Jie Zhou
Jiwen Lu
25
88
0
07 Apr 2022
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning
Minghao Chen
Fangyun Wei
Chong Li
Deng Cai
AI4TS
26
33
0
28 Mar 2022
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos
Vikram Gupta
Trisha Mittal
Puneet Mathur
Vaibhav Mishra
Mayank Maheshwari
Aniket Bera
Debdoot Mukherjee
Dinesh Manocha
VGen
15
11
0
28 Mar 2022
FAR: Fourier Aerial Video Recognition
D. Kothandaraman
Tianrui Guan
Xijun Wang
Sean Hu
Ming-Shun Lin
Dinesh Manocha
21
13
0
21 Mar 2022
Gate-Shift-Fuse for Video Action Recognition
Swathikiran Sudhakaran
Sergio Escalera
O. Lanz
20
22
0
16 Mar 2022
Source-free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition
Yuecong Xu
Jianfei Yang
Haozhi Cao
Keyu Wu
Min-man Wu
Zhenghua Chen
TTA
21
31
0
09 Mar 2022
Universal Prototype Transport for Zero-Shot Action Recognition and Localization
Pascal Mettes
14
5
0
08 Mar 2022
Self-supervised Social Relation Representation for Human Group Detection
Jiacheng Li
Ruize Han
Haomin Yan
Zekun Qian
Wei Feng
Song Wang
6
5
0
08 Mar 2022
Didn't see that coming: a survey on non-verbal social human behavior forecasting
Germán Barquero
Johnny Núnez
Sergio Escalera
Zhen Xu
Wei-Wei Tu
Isabelle M Guyon
Cristina Palmero
AI4TS
26
21
0
04 Mar 2022
Continuous Human Action Recognition for Human-Machine Interaction: A Review
Harshala Gammulle
David Ahmedt-Aristizabal
Simon Denman
Lachlan Tychsen-Smith
L. Petersson
Clinton Fookes
35
25
0
26 Feb 2022
Discovering Multiple and Diverse Directions for Cognitive Image Properties
Umut Kocasari
Alperen Bag
Oğuz Kaan Yüksel
Pinar Yanardag
DiffM
16
0
0
23 Feb 2022
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
Yuecong Xu
Jianfei Yang
Haozhi Cao
Jianxiong Yin
Zhenghua Chen
Xiaoli Li
Zhengguo Li
Qiaoqiao Xu
35
2
0
19 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
Multi-level Second-order Few-shot Learning
Hongguang Zhang
Hongdong Li
Piotr Koniusz
11
33
0
15 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
24
211
0
12 Jan 2022
Boosting Video Representation Learning with Multi-Faceted Integration
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Xiaoping Zhang
Dong Wu
Tao Mei
28
8
0
11 Jan 2022
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
20
54
0
14 Dec 2021
Hallucinating Pose-Compatible Scenes
Tim Brooks
Alexei A. Efros
3DH
47
14
0
13 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
21
17
0
13 Dec 2021
MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification
Rex Liu
Huan Zhang
Hamed Pirsiavash
Xin Liu
ViT
11
11
0
08 Dec 2021
Temporal-Spatial Causal Interpretations for Vision-Based Reinforcement Learning
Wenjie Shi
Gao Huang
Shiji Song
Cheng Wu
29
9
0
06 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
45
129
0
02 Dec 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
16
292
0
24 Nov 2021
Unsupervised Action Localization Crop in Video Retargeting for 3D ConvNets
Prithwish Jana
Swarnabja Bhaumik
Partha Pratim Mohanta
20
3
0
14 Nov 2021
Revisiting spatio-temporal layouts for compositional action recognition
Gorjan Radevski
Marie-Francine Moens
Tinne Tuytelaars
30
26
0
02 Nov 2021
Zero-Shot Action Recognition from Diverse Object-Scene Compositions
Carlo Bretti
Pascal Mettes
OCL
11
9
0
26 Oct 2021
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Ian Palmer
Andrew Rouditchenko
Andrei Barbu
Boris Katz
James R. Glass
11
4
0
14 Oct 2021
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels
Mohit Sharma
Rajkumar Patra
Harshali Desai
Shruti Vyas
Y. S. Rawat
R. Shah
VGen
NoLa
16
3
0
13 Oct 2021
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
Shuang Li
Yilun Du
Antonio Torralba
Josef Sivic
Bryan C. Russell
51
15
0
07 Oct 2021
Multi-Source Video Domain Adaptation with Temporal Attentive Moment Alignment
Yuecong Xu
Jianfei Yang
Haozhi Cao
Keyu Wu
Min-man Wu
Rui Zhao
Zhenghua Chen
TTA
22
22
0
21 Sep 2021
Impact of GPU uncertainty on the training of predictive deep neural networks
Maciej Pietrowski
A. Gajda
Takuto Yamamoto
Taisuke Kobayashi
Lana Sinapayen
Eiji Watanabe
BDL
19
0
0
03 Sep 2021
Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition
James Hong
Matthew Fisher
Michael Gharbi
Kayvon Fatahalian
3DH
25
37
0
03 Sep 2021
Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Xuefan Zha
Wentao Zhu
Tingxun Lv
Sen Yang
Ji Liu
AI4TS
ViT
33
27
0
26 Aug 2021
Dynamic Network Quantization for Efficient Video Inference
Ximeng Sun
Rameswar Panda
Chun-Fu Chen
A. Oliva
Rogerio Feris
Kate Saenko
29
45
0
23 Aug 2021
TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos
Praveen Tirupattur
A. J. Rana
Tushar Sangam
Shruti Vyas
Y. S. Rawat
M. Shah
6
6
0
24 Jul 2021
iMiGUE: An Identity-free Video Dataset for Micro-Gesture Understanding and Emotion Analysis
Xin Liu
Henglin Shi
Haoyu Chen
Zitong Yu
Xiaobai Li
Guoying Zhao
19
80
0
01 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
25
541
0
30 Jun 2021
Can An Image Classifier Suffice For Action Recognition?
Quanfu Fan
Chun-Fu Chen
Rameswar Panda
ViT
29
33
0
26 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
21
127
0
21 Jun 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
19
45
0
07 Jun 2021
APES: Audiovisual Person Search in Untrimmed Video
Juan Carlos León Alcázar
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbeláez
Bernard Ghanem
Fabian Caba Heilbron
23
6
0
03 Jun 2021
Patch Tracking-based Streaming Tensor Ring Completion for Visual Data Recovery
Yicong He
George K. Atia
19
9
0
30 May 2021
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization
Yi Liu
Limin Wang
Yali Wang
Xiao Ma
Yu Qiao
17
56
0
24 May 2021