The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017

13 June 2017

Raghav Goyal

Samira Ebrahimi Kahou

Moritz Mueller-Freitag

Papers citing "The 'something something' video database for learning and evaluating visual common sense"

50 / 1,014 papers shown

MOFO: MOtion FOcused Self-Supervision for Video Understanding

Mona Ahmadian

Frank Guerin

Andrew Gilbert

307

23 Aug 2023

Opening the Vocabulary of Egocentric Actions

Neural Information Processing Systems (NeurIPS), 2023

Angela Yao

310

22 Aug 2023

Are current long-term video understanding datasets long-term?

Ombretta Strafforello

Klamer Schutte

Jan van Gemert

207

22 Aug 2023

MGMAE: Motion Guided Masking for Video Masked Autoencoding

IEEE International Conference on Computer Vision (ICCV), 2023

Yu Qiao

155

21 Aug 2023

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

IEEE International Conference on Computer Vision (ICCV), 2023

Mengmeng Wang

Jingdong Wang

208

18 Aug 2023

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Neural Information Processing Systems (NeurIPS), 2023

K. Mangalam

Raiymbek Akshulakov

Jitendra Malik

409

498

17 Aug 2023

SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations

Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023

Zhiming Wang

Lin Gu

Feng Lu

239

17 Aug 2023

On the Importance of Spatial Relations for Few-shot Action Recognition

ACM Multimedia (ACM MM), 2023

Zuxuan Wu

255

14 Aug 2023

Temporally-Adaptive Models for Efficient Video Understanding

Ziwei Liu

205

10 Aug 2023

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

IEEE International Conference on Computer Vision (ICCV), 2023

207

08 Aug 2023

M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition

ACM Multimedia (ACM MM), 2023

Hao Tang

Jun Liu

Shuanglin Yan

Rui Yan

Zechao Li

Jinhui Tang

281

06 Aug 2023

A Survey on Deep Learning-based Spatio-temporal Action Detection

Peng Wang

Fanwei Zeng

Yu Qian

224

03 Aug 2023

Multimodal Adaptation of CLIP for Few-Shot Action Recognition

Pattern Recognition (Pattern Recogn.), 2023

Mengmeng Wang

Jingdong Wang

181

03 Aug 2023

SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension

Ying Shan

480

789

30 Jul 2023

Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Conference on Robot Learning (CoRL), 2023

Huy Ha

Peter R. Florence

Shuran Song

LM&Ro

274

210

26 Jul 2023

Group Activity Recognition in Computer Vision: A Comprehensive Review, Challenges, and Future Perspectives

C. Wang

A. Mohamed

258

25 Jul 2023

What Can Simple Arithmetic Operations Do for Temporal Modeling?

IEEE International Conference on Computer Vision (ICCV), 2023

Jingdong Wang

Wanli Ouyang

212

18 Jul 2023

Multimodal Distillation for Egocentric Action Recognition

IEEE International Conference on Computer Vision (ICCV), 2023

Gorjan Radevski

Dusan Grujicic

Marie-Francine Moens

Matthew Blaschko

Tinne Tuytelaars

EgoV

335

14 Jul 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

IEEE International Conference on Computer Vision (ICCV), 2023

Syed Talal Wasim

Muhammad Uzair Khattak

Salman Khan

257

13 Jul 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

International Conference on Learning Representations (ICLR), 2023

Yi Wang

...

Ping Luo

Ziwei Liu

Yali Wang

Limin Wang

Yu Qiao

VLM VGen

367

407

13 Jul 2023

Free-Form Composition Networks for Egocentric Action Recognition

Yibing Zhan

Liang Ding

321

13 Jul 2023

Reading Between the Lanes: Text VideoQA on the Road

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

277

08 Jul 2023

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023

398

07 Jul 2023

VideoGLUE: Video General Understanding Evaluation of Foundation Models

...

273

06 Jul 2023

Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating

194

06 Jul 2023

What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

321

05 Jul 2023

Make A Long Image Short: Adaptive Token Length for Vision Transformers

Yuqin Zhu

Yichen Zhu

ViT

233

05 Jul 2023

Task-Specific Alignment and Multiple Level Transformer for Few-Shot Action Recognition

Neurocomputing (Neurocomputing), 2023

235

05 Jul 2023

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

Conference on Robot Learning (CoRL), 2023

Philippe Hansen-Estruch

419

30 Jun 2023

Look, Remember and Reason: Grounded reasoning in videos with language models

International Conference on Learning Representations (ICLR), 2023

Apratim Bhattacharyya

470

30 Jun 2023

How can objects help action recognition?

Computer Vision and Pattern Recognition (CVPR), 2023

233

20 Jun 2023

Dynamic Perceiver for Efficient Visual Recognition

IEEE International Conference on Computer Vision (ICCV), 2023

Yulin Wang

Gao Huang

296

20 Jun 2023

VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

281

19 Jun 2023

Robot Learning with Sensorimotor Pre-training

Conference on Robot Learning (CoRL), 2023

Letian Fu

273

16 Jun 2023

Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers

258

15 Jun 2023

A Large-Scale Analysis on Self-Supervised Video Representation Learning

316

09 Jun 2023

Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

Neural Information Processing Systems (NeurIPS), 2023

Zihui Xue

Kristen Grauman

EgoV

285

08 Jun 2023

Optimizing ViViT Training: Time and Memory Reduction for Action Recognition

182

07 Jun 2023

M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

Lei Li

Yuwei Yin

Shicheng Li

Liang Chen

Peiyi Wang

...

Yazheng Yang

Jingjing Xu

Xu Sun

Lingpeng Kong

Qi Liu

MLLM VLM

382

135

07 Jun 2023

Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification

Hao Chen

194

04 Jun 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

International Conference on Machine Learning (ICML), 2023

...

Christoph Feichtenhofer

3DH

305

304

01 Jun 2023

LIV: Language-Image Representations and Rewards for Robotic Control

International Conference on Machine Learning (ICML), 2023

Vikash Kumar

242

182

01 Jun 2023

Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning

288

01 Jun 2023

Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2023

297

29 May 2023

Visual Affordance Prediction for Guiding Robot Exploration

IEEE International Conference on Robotics and Automation (ICRA), 2023

Homanga Bharadhwaj

Abhi Gupta

Shubham Tulsiani

254

28 May 2023

Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective

Neurocomputing (Neurocomputing), 2023

Thanh-Dat Truong

Khoa Luu

EgoV

389

25 May 2023

Deep Neural Networks in Video Human Action Recognition: A Review

255

25 May 2023

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

Ying Shan

282

23 May 2023

Paxion: Patching Action Knowledge in Video-Language Foundation Models

Neural Information Processing Systems (NeurIPS), 2023

Heng Ji

253

18 May 2023

Motion-Scenario Decoupling for Rat-Aware Video Position Prediction: Strategy and Benchmark

International Conference on Image and Graphics (ICIG), 2023

208

17 May 2023