
The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017

13 June 2017

Raghav Goyal

Samira Ebrahimi Kahou

Moritz Mueller-Freitag

Papers citing "The 'something something' video database for learning and evaluating visual common sense"

50 / 1,013 papers shown

Natural Language Can Help Bridge the Sim2Real Gap

Albert Yu

Adeline Foote

Raymond J. Mooney

Roberto Martín-Martín

LM&Ro

396

16 May 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Computer Vision and Pattern Recognition (CVPR), 2024

...

Miao Liu

Pengchuan Zhang

Ruohan Zhang

Fei-Fei Li

Jiajun Wu

VGen

182

15 May 2024

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

223

14 May 2024

Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

Neural Information Processing Systems (NeurIPS), 2024

276

09 May 2024

A Survey on Backbones for Deep Video Action Recognition

172

09 May 2024

Sora and V-JEPA Have Not Learned The Complete Real World Model -- A Philosophical Analysis of Video AIs Through the Theory of Productive Imagination

Jianqiu Zhang

VGen

100

06 May 2024

How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs

Muhammad Uzair Khattak

Muhammad Ferjad Naeem

Jameel Hassan

Muzammal Naseer

Federico Tombari

Fahad Shahbaz Khan

Salman Khan

LRM ELM

281

06 May 2024

MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

IEEE Transactions on Multimedia (IEEE TMM), 2024

Rui Yan

447

03 May 2024

Track2Act: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation

European Conference on Computer Vision (ECCV), 2024

Homanga Bharadhwaj

229

02 May 2024

Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy

Hoang-Quan Nguyen

Thanh-Dat Truong

Khoa Luu

287

02 May 2024

WorldGPT: Empowering LLM as Multimodal World Model

314

28 Apr 2024

VIEW: Visual Imitation Learning with Waypoints

550

27 Apr 2024

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

See Kiong Ng

272

276

25 Apr 2024

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Badri N. Patro

Vijay Srinivas Agneeswaran

Mamba

359

24 Apr 2024

Rank2Reward: Learning Shaped Reward Functions from Passive Video

Dima Damen

Abhishek Gupta

229

23 Apr 2024

1st Place Solution to the 1st SkatingVerse Challenge

22 Apr 2024

On the Content Bias in Fréchet Video Distance

255

18 Apr 2024

Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition

433

18 Apr 2024

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Jathushan Rajasegaran

267

15 Apr 2024

Leveraging Temporal Contextualization for Video Action Recognition

638

15 Apr 2024

T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos

258

08 Apr 2024

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Tao Wu

Runyu He

Gangshan Wu

Limin Wang

3DH

303

06 Apr 2024

Visual Knowledge in the Big Model Era: Retrospect and Prospect

313

05 Apr 2024

Learning Correlation Structures for Vision Transformers

297

05 Apr 2024

ASTRA: An Action Spotting TRAnsformer for Soccer Videos

347

02 Apr 2024

SUGAR: Pre-training 3D Visual Representations for Robotics

Computer Vision and Pattern Recognition (CVPR), 2024

258

01 Apr 2024

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

...

Alexander G. Hauptmann

Yonatan Bisk

Yiming Yang

MLLM

377

119

01 Apr 2024

ST-LLM: Large Language Models Are Effective Temporal Learners

Ying Shan

193

123

30 Mar 2024

OmniVid: A Generative Framework for Universal Video Understanding

Lu Yuan

Zuxuan Wu

Yu-Gang Jiang

VLM VGen

285

26 Mar 2024

Enhancing Video Transformers for Action Understanding with VLM-aided Training

217

24 Mar 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

European Conference on Computer Vision (ECCV), 2024

...

Yifei Huang

Yu Qiao

Yali Wang

Limin Wang

260

104

22 Mar 2024

VidLA: Video-Language Alignment at Scale

Computer Vision and Pattern Recognition (CVPR), 2024

Mamshad Nayeem Rizve

Fan Fei

Jayakrishnan Unnikrishnan

Mubarak Shah

224

21 Mar 2024

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

793

693

21 Mar 2024

vid-TLDR: Training Free Token merging for Light-weight Video Transformer

286

20 Mar 2024

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

155

19 Mar 2024

VideoBadminton: A Video Dataset for Badminton Action Recognition

181

19 Mar 2024

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

Neural Information Processing Systems (NeurIPS), 2024

Gao Huang

Yang You

320

18 Mar 2024

Don't Judge by the Look: Towards Motion Coherent Video Representation

International Conference on Learning Representations (ICLR), 2024

Huan Wang

258

14 Mar 2024

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

Ruohan Zhang

...

Silvio Savarese

Jiajun Wu

197

14 Mar 2024

FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders

Computer Vision and Pattern Recognition (CVPR), 2024

265

13 Mar 2024

Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling

W. G. C. Bandara

Vishal M. Patel

VPVLM VLM

251

11 Mar 2024

VideoMamba: State Space Model for Efficient Video Understanding

European Conference on Computer Vision (ECCV), 2024

Yu Qiao

277

385

11 Mar 2024

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

ACM Multimedia (ACM MM), 2023

Boshen Xu

Sipeng Zheng

Qin Jin

189

09 Mar 2024

Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation

Joseph Cho

Fachrina Dewi Puspitasari

Lik-Hang Lee

274

08 Mar 2024

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

Yu Qiao

234

29 Feb 2024

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

...

279

28 Feb 2024

Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning

445

24 Feb 2024

Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition

297

20 Feb 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

...

386

20 Feb 2024

Revisiting Feature Prediction for Learning Visual Representations from Video

345

173

15 Feb 2024