arXiv:1801.03150
Moments in Time Dataset: one million videos for event understanding
9 January 2018
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
Tom Yan
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
Papers citing "Moments in Time Dataset: one million videos for event understanding"
50 / 268 papers shown
Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating
Shengyuan Liu
Yuanyuan Ding
Guihong Lao
Sihan Zhang
Ning Zhou
Wen-Yue Chen
Hao Liu
19
2
0
06 Jul 2023
Look, Remember and Reason: Grounded reasoning in videos with language models
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
33
7
0
30 Jun 2023
FedMultimodal: A Benchmark For Multimodal Federated Learning
Tiantian Feng
Digbalay Bose
Tuo Zhang
Rajat Hebbar
Anil Ramakrishna
Rahul Gupta
Mi Zhang
Salman Avestimehr
Shrikanth Narayanan
32
48
0
15 Jun 2023
What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations
Chiara Plizzari
Toby Perrett
Barbara Caputo
Dima Damen
EgoV
13
16
0
14 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition
Shreyank N. Gowda
Anurag Arnab
Jonathan Huang
ViT
18
4
0
07 Jun 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
51
4
0
25 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Mohit Bansal
Heng Ji
KELM
VGen
17
26
0
18 May 2023
Is end-to-end learning enough for fitness activity recognition?
Antoine Mercier
Guillaume Berger
Sunny Panchal
Florian Letsch
Cornelius Boehm
Nahua Kang
Ingo Bax
Roland Memisevic
21
2
0
14 May 2023
Video-Specific Query-Key Attention Modeling for Weakly-Supervised Temporal Action Localization
Xijun Wang
Aggelos K. Katsaggelos
34
0
0
07 May 2023
Learning Human-Human Interactions in Images from Weak Textual Supervision
Morris Alper
Hadar Averbuch-Elor
VLM
37
2
0
27 Apr 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Erik Cambria
Fatih Porikli
3DV
27
20
0
22 Apr 2023
PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition
Ruiqi Xian
Xijun Wang
D. Kothandaraman
Dinesh Manocha
21
7
0
14 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
30
70
0
13 Apr 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
41
154
0
28 Mar 2023
Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation
Xiyu Wang
Yuecong Xu
Jianfei Yang
Xiaoli Li
Zhenghua Chen
TTA
24
0
0
18 Mar 2023
Augmenting and Aligning Snippets for Few-Shot Video Domain Adaptation
Yuecong Xu
Jianfei Yang
Yunjiao Zhou
Zhenghua Chen
Min-man Wu
Xiaoli Li
27
5
0
18 Mar 2023
Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models
D. Kothandaraman
Tianyi Zhou
Ming Lin
Dinesh Manocha
24
5
0
15 Mar 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziński
Rameswar Panda
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
100
38
0
15 Mar 2023
MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition
Ruiqi Xian
Xijun Wang
Dinesh Manocha
19
10
0
05 Mar 2023
Evidence-empowered Transfer Learning for Alzheimer's Disease
Kai Tzu-iunn Ong
Hana Kim
Minjin Kim
Jinseong Jang
B. Sohn
Y. Choi
D. Hwang
Seong Jae Hwang
Jinyoung Yeo
MedIm
17
5
0
02 Mar 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
61
569
0
10 Feb 2023
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms
Pierre-Etienne Martin
14
1
0
06 Feb 2023
A deep local attention network for pre-operative lymph node metastasis prediction in pancreatic cancer via multiphase CT imaging
Zhilin Zheng
Xu Fang
Jiawen Yao
Mengmeng Zhu
Le Lu
...
Hong Lu
Jian-Ping Lu
Ling Zhang
C. Shao
Yun Bian
MedIm
11
1
0
04 Jan 2023
Source-Free Unsupervised Domain Adaptation: A Survey
Yuqi Fang
P. Yap
W. Lin
Hongtu Zhu
Mingxia Liu
130
89
0
31 Dec 2022
Self-supervised and Weakly Supervised Contrastive Learning for Frame-wise Action Representations
Minghao Chen
Renbo Tu
Chenxi Huang
Yuqi Lin
Boxi Wu
Deng Cai
SSL
AI4TS
24
1
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
37
0
0
03 Dec 2022
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Xiao Wang
Zong-Yao Wu
Bowei Jiang
Zhimin Bao
Lin Zhu
Guoqiu Li
Yaowei Wang
Yonghong Tian
18
36
0
17 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
25
106
0
17 Nov 2022
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
24
14
0
17 Nov 2022
Temporal Action Segmentation: An Analysis of Modern Techniques
Guodong Ding
Fadime Sener
Angela Yao
35
74
0
19 Oct 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Mantas Mazeika
Eric Tang
Andy Zou
Steven Basart
Jun Shern Chan
Dawn Song
David A. Forsyth
Jacob Steinhardt
Dan Hendrycks
26
8
0
18 Oct 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
22
38
0
12 Oct 2022
Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised Adaptation
Wenjing Wang
Zheng Xu
Haofeng Huang
Jiaying Liu
24
17
0
07 Oct 2022
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM
VGen
43
371
0
05 Oct 2022
Multi-dataset Training of Transformers for Robust Action Recognition
Junwei Liang
Enwei Zhang
Jun Zhang
Chunhua Shen
ViT
37
11
0
26 Sep 2022
FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification
P. Jin
Lichao Mou
Yuansheng Hua
Gui-Song Xia
Xiao Xiang Zhu
AI4TS
10
8
0
22 Sep 2022
Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition
D. Kothandaraman
Ming-Shun Lin
Dinesh Manocha
25
6
0
15 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Saeed Mian
ViT
19
44
0
13 Sep 2022
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Mingyuan Zhang
Zhongang Cai
Liang Pan
Fangzhou Hong
Xinying Guo
Lei Yang
Ziwei Liu
DiffM
VGen
24
538
0
31 Aug 2022
UAV-CROWD: Violent and non-violent crowd activity simulator from the perspective of UAV
Mahieyin Rahmun
Tonmoay Deb
Shahriar Ali Bijoy
M. Raha
11
0
0
13 Aug 2022
Seeing your sleep stage: cross-modal distillation from EEG to infrared video
Jianan Han
Shenmin Zhang
Aidong Men
Yang Liu
Z. Yao
Yan-Tao Yan
Qingchao Chen
17
4
0
11 Aug 2022
Leveraging Endo- and Exo-Temporal Regularization for Black-box Video Domain Adaptation
Yuecong Xu
Jianfei Yang
Haozhi Cao
Min-man Wu
Xiaoli Li
Lihua Xie
Zhenghua Chen
36
4
0
10 Aug 2022
Inflating 2D Convolution Weights for Efficient Generation of 3D Medical Images
Yanbin Liu
Girish Dwivedi
F. Boussaïd
Frank M. Sanfilippo
Makoto Yamada
Bennamoun
DiffM
MedIm
11
9
0
08 Aug 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge J. Belongie
19
10
0
21 Jul 2022
Fine-grained Activities of People Worldwide
J. Byrne
Greg Castañón
Zhongheng Li
G. Ettinger
11
3
0
11 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation
Anurag Arnab
Xuehan Xiong
A. Gritsenko
Rob Romijnders
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
25
8
0
08 Jul 2022
Large-scale Robustness Analysis of Video Action Recognition Models
Madeline Chantry Schiappa
Naman Biyani
Prudvi Kamtam
Shruti Vyas
Hamid Palangi
Vibhav Vineet
Y. S. Rawat
AAML
24
24
0
04 Jul 2022
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
28
6
0
21 Jun 2022
M&M Mix: A Multimodal Multiview Transformer Ensemble
Xuehan Xiong
Anurag Arnab
Arsha Nagrani
Cordelia Schmid
ViT
11
19
0
20 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
26
131
0
18 Jun 2022