ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1705.08421
  4. Cited By
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual
  Actions

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

23 May 2017
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
Yeqing Li
Sudheendra Vijayanarasimhan
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
    VGen
ArXivPDFHTML

Papers citing "AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions"

50 / 190 papers shown
Title
Sport Task: Fine Grained Action Detection and Classification of Table
  Tennis Strokes from Videos for MediaEval 2022
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022
Pierre-Etienne Martin
J. Calandre
Boris Mansencal
J. Benois-Pineau
Renaud Péteri
L. Mascarilla
J. Morlier
24
4
0
31 Jan 2023
Similarity Contrastive Estimation for Image and Video Soft Contrastive
  Self-Supervised Learning
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
14
6
0
21 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene
  Segmentation
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
21
2
0
09 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers
  using Synthetic Scene Data
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
36
16
0
08 Dec 2022
InternVideo: General Video Foundation Models via Generative and
  Discriminative Learning
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
46
309
0
06 Dec 2022
DiffusionDet: Diffusion Model for Object Detection
DiffusionDet: Diffusion Model for Object Detection
Shoufa Chen
Pei Sun
Yibing Song
Ping Luo
54
442
0
17 Nov 2022
Token Turing Machines
Token Turing Machines
Michael S. Ryoo
K. Gopalakrishnan
Kumara Kahatapitiya
Ted Xiao
Kanishka Rao
Austin Stone
Yao Lu
Julian Ibarz
Anurag Arnab
27
21
0
16 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object
  Interactions
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions
Yong-Lu Li
Hongwei Fan
Zuoyu Qiu
Yiming Dou
Liang Xu
...
Peiyang Guo
Haisheng Su
Dongliang Wang
Wei Yu Wu
Cewu Lu
35
7
0
14 Nov 2022
Video Event Extraction via Tracking Visual States of Arguments
Video Event Extraction via Tracking Visual States of Arguments
Guang Yang
Manling Li
Jiajie Zhang
Xudong Lin
Shih-Fu Chang
Heng Ji
30
9
0
03 Nov 2022
Holistic Interaction Transformer Network for Action Detection
Holistic Interaction Transformer Network for Action Detection
Gueter Josmy Faure
Min-Hung Chen
S. Lai
33
37
0
23 Oct 2022
MovieCLIP: Visual Scene Recognition in Movies
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Haoyang Zhang
Yin Cui
K. Cole-McLaughlin
H. Wang
Shrikanth Narayanan
CLIP
12
20
0
20 Oct 2022
Neighbourhood Representative Sampling for Efficient End-to-end Video
  Quality Assessment
Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment
Haoning Wu
Chaofeng Chen
Liang Liao
Jingwen Hou
Wenxiu Sun
Qiong Yan
Jinwei Gu
Weisi Lin
51
44
0
11 Oct 2022
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
Ahmad Darkhalil
Dandan Shan
Bin Zhu
Jian Ma
Amlan Kar
Richard E. L. Higgins
Sanja Fidler
David Fouhey
Dima Damen
VOS
44
98
0
26 Sep 2022
Towards Parameter-Efficient Integration of Pre-Trained Language Models
  In Temporal Video Grounding
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding
Erica K. Shimomoto
Edison Marrese-Taylor
Hiroya Takamura
Ichiro Kobayashi
Hideki Nakayama
Yusuke Miyao
27
7
0
26 Sep 2022
BURST: A Benchmark for Unifying Object Recognition, Segmentation and
  Tracking in Video
BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video
A. Athar
Jonathon Luiten
P. Voigtlaender
Tarasha Khurana
Achal Dave
Bastian Leibe
Deva Ramanan
VOS
VLM
18
57
0
25 Sep 2022
Vision Transformers for Action Recognition: A Survey
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Saeed Mian
ViT
19
44
0
13 Sep 2022
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Mingyuan Zhang
Zhongang Cai
Liang Pan
Fangzhou Hong
Xinying Guo
Lei Yang
Ziwei Liu
DiffM
VGen
34
541
0
31 Aug 2022
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition
  Analysis for Adversarial Multi-task Video Understanding
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding
Stephen Su
Sam Kwong
Qingyu Zhao
De-An Huang
Juan Carlos Niebles
Ehsan Adeli
27
0
0
22 Aug 2022
Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge
  for Human Motion Prediction
Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction
Xiaoning Sun
Qiongjie Cui
Huaijiang Sun
Bin Li
Weiqing Li
Jianfeng Lu
24
7
0
02 Aug 2022
BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis
BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis
Davide Moltisanti
Jinyi Wu
Bo Dai
Chen Change Loy
DiffM
19
4
0
20 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation
Beyond Transfer Learning: Co-finetuning for Action Localisation
Anurag Arnab
Xuehan Xiong
A. Gritsenko
Rob Romijnders
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
30
8
0
08 Jul 2022
Self-Supervised Learning for Videos: A Survey
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
34
131
0
18 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and
  Applications
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
24
52
0
02 Jun 2022
RADNet: A Deep Neural Network Model for Robust Perception in Moving
  Autonomous Systems
RADNet: A Deep Neural Network Model for Robust Perception in Moving Autonomous Systems
B. Mudassar
Sho Ko
Maojingjing Li
Priyabrata Saha
Saibal Mukhopadhyay
16
2
0
30 Apr 2022
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
Zhongang Cai
Daxuan Ren
Ailing Zeng
Zhengyu Lin
Tao Yu
...
Fangzhou Hong
Mingyuan Zhang
Chen Change Loy
Lei Yang
Ziwei Liu
3DH
33
100
0
28 Apr 2022
Interactiveness Field in Human-Object Interactions
Interactiveness Field in Human-Object Interactions
Xinpeng Liu
Yong-Lu Li
Xiaoqian Wu
Yu-Wing Tai
Cewu Lu
Chi-Keung Tang
35
46
0
16 Apr 2022
3D Convolutional Networks for Action Recognition: Application to Sport
  Gesture Recognition
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
Pierre-Etienne Martin
J. Benois-Pineau
Renaud Péteri
A. Zemmari
J. Morlier
19
5
0
13 Apr 2022
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality
  Assessment
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Jinglin Xu
Yongming Rao
Xumin Yu
Guangyi Chen
Jie Zhou
Jiwen Lu
25
88
0
07 Apr 2022
Hierarchical Self-supervised Representation Learning for Movie
  Understanding
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
39
24
0
06 Apr 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding
  Procedural Activities
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Fadime Sener
Dibyadip Chatterjee
Daniel Shelepov
Kun He
Dipika Singhania
Robert Y. Wang
Angela Yao
VGen
19
204
0
28 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for
  Self-Supervised Video Pre-Training
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
137
1,124
0
23 Mar 2022
Point3D: tracking actions as moving points with 3D CNNs
Point3D: tracking actions as moving points with 3D CNNs
Shentong Mo
Jingfei Xia
Xiaoqing Ellen Tan
Bhiksha Raj
3DPC
20
5
0
20 Mar 2022
Multi-view and Multi-modal Event Detection Utilizing Transformer-based
  Multi-sensor fusion
Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion
Masahiro Yasuda
Yasunori Ohishi
Shoichiro Saito
N. Harada
38
13
0
18 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated
  Actions in Vlogs
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
18
3
0
16 Feb 2022
HAKE: A Knowledge Engine Foundation for Human Activity Understanding
HAKE: A Knowledge Engine Foundation for Human Activity Understanding
Yong-Lu Li
Xinpeng Liu
Xiaoqian Wu
Yizhuo Li
Zuoyu Qiu
Liang Xu
Yue Xu
Haoshu Fang
Cewu Lu
32
38
0
14 Feb 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
83
655
0
16 Dec 2021
Temporal Shuffling for Defending Deep Action Recognition Models against
  Adversarial Attacks
Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks
Jaehui Hwang
Huan Zhang
Jun-Ho Choi
Cho-Jui Hsieh
Jong-Seok Lee
AAML
17
5
0
15 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
23
17
0
13 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question
  Answering
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
30
111
0
12 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
48
677
0
02 Dec 2021
Vision Pair Learning: An Efficient Training Framework for Image
  Classification
Vision Pair Learning: An Efficient Training Framework for Image Classification
Bei Tong
Xiaoyuan Yu
ViT
17
0
0
02 Dec 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
229
1,019
0
13 Oct 2021
Object-Region Video Transformers
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
21
82
0
13 Oct 2021
Rethinking Supervised Pre-training for Better Downstream Transferring
Rethinking Supervised Pre-training for Better Downstream Transferring
Yutong Feng
Jianwen Jiang
Mingqian Tang
R. L. Jin
Yue Gao
SSL
46
39
0
12 Oct 2021
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani
Yingli Tian
46
60
0
30 Sep 2021
How much human-like visual experience do current self-supervised
  learning algorithms need in order to achieve human-level object recognition?
How much human-like visual experience do current self-supervised learning algorithms need in order to achieve human-level object recognition?
Emin Orhan
OOD
35
4
0
23 Sep 2021
Survey: Transformer based Video-Language Pre-training
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
72
44
0
21 Sep 2021
Negative Sample Matters: A Renaissance of Metric Learning for Temporal
  Grounding
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
Zhenzhi Wang
Limin Wang
Tao Wu
Tianhao Li
Gangshan Wu
AI4TS
28
116
0
10 Sep 2021
Enhancing Self-supervised Video Representation Learning via Multi-level
  Feature Optimization
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Rui Qian
Yuxi Li
Huabin Liu
John See
Shuangrui Ding
Xian Liu
Dian Li
Weiyao Lin
30
42
0
04 Aug 2021
Spatio-Temporal Context for Action Detection
Spatio-Temporal Context for Action Detection
Manuel Sarmiento Calderó
David Varas
Elisenda Bou
21
2
0
29 Jun 2021
Previous
1234
Next