The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017

13 June 2017

Raghav Goyal

Samira Ebrahimi Kahou

Moritz Mueller-Freitag

Papers citing "The 'something something' video database for learning and evaluating visual common sense"

50 / 1,013 papers shown

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Conference on Robot Learning (CoRL), 2024

272

29 Jul 2024

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Neural Information Processing Systems (NeurIPS), 2024

259

29 Jul 2024

Trajectory-aligned Space-time Tokens for Few-shot Action Recognition

241

25 Jul 2024

MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos

Zsófia Katona

Seyed Sahand Mohamadi Ziabari

Fatemeh Karimi Nejadasl

277

25 Jul 2024

HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data

Emin Orhan

VLM SyDa

194

25 Jul 2024

SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition

ACM Multimedia (MM), 2024

269

23 Jul 2024

SIGMA: Sinkhorn-Guided Masked Video Modeling

249

22 Jul 2024

MIBench: Evaluating Multimodal Large Language Models over Multiple Images

...

Ji Zhang

Fei Huang

Chunfen Yuan

Bing Li

Weiming Hu

VLM

151

21 Jul 2024

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

216

20 Jul 2024

A Comprehensive Review of Few-shot Action Recognition

534

20 Jul 2024

Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition

395

19 Jul 2024

Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective

276

19 Jul 2024

VideoMamba: Spatio-Temporal Selective State Space Model

Hee-Seon Kim

Changick Kim

289

11 Jul 2024

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Li Zhao

331

10 Jul 2024

Rethinking Image-to-Video Adaptation: An Object-centric Perspective

Rui Qian

Shuangrui Ding

Dahua Lin

OCL

229

09 Jul 2024

C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

403

08 Jul 2024

DMSD-CDFSAR: Distillation from Mixed-Source Domain for Cross-Domain Few-shot Action Recognition

370

08 Jul 2024

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

346

03 Jul 2024

PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition

242

03 Jul 2024

Tarsier: Recipes for Training and Evaluating Large Video Description Models

Jiawei Wang

Liping Yuan

Yuchen Zhang

303

112

30 Jun 2024

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

Hao Fei

Meishan Zhang

267

27 Jun 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

...

Yali Wang

Tong Lu

Limin Wang

Yu Qiao

EgoV

485

26 Jun 2024

Towards Event-oriented Long Video Understanding

Kun Zhou

Wayne Xin Zhao

Bingning Wang

Weipeng Chen

Ji-Rong Wen

VLM

201

20 Jun 2024

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

426

20 Jun 2024

Exploring the Impact of Hand Pose and Shadow on Hand-washing Action Recognition

Shengtai Ju

A. Reibman

CVBM

128

19 Jun 2024

Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space

Yuan Wang

Zhao Wang

Junhao Gong

Di Huang

Tong He

...

Xuetao Feng

204

17 Jun 2024

HumanPlus: Humanoid Shadowing and Imitation from Humans

Conference on Robot Learning (CoRL), 2024

337

205

15 Jun 2024

Long Story Short: Story-level Video Understanding from 20K Short Films

Xi Wang

189

14 Jun 2024

A Survey of Video Datasets for Grounded Event Understanding

Kate Sanders

Benjamin Van Durme

224

14 Jun 2024

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Salman Khan

259

102

13 Jun 2024

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

Junke Wang

Yu-Gang Jiang

307

13 Jun 2024

Comparison Visual Instruction Tuning

Wei Lin

270

13 Jun 2024

Cognitively Inspired Energy-Based World Models

216

13 Jun 2024

Pandora: Towards General World Model with Natural Language Actions and Video States

Guangyi Liu

...

Zhengzhong Liu

Eric P. Xing

Zhiting Hu

VGen

299

12 Jun 2024

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Yu-Gang Jiang

288

10 Jun 2024

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

570

09 Jun 2024

FILS: Self-Supervised Video Feature Prediction In Semantic Language Space

Mona Ahmadian

Frank Guerin

Andrew Gilbert

329

05 Jun 2024

Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models

Georgia Markham

M. Balamurali

Andrew J. Hill

385

03 Jun 2024

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

331

02 Jun 2024

Learning Manipulation by Predicting Interaction

Li Chen

...

Heming Cui

Bin Zhao

Xuelong Li

Yu Qiao

Hongyang Li

374

01 Jun 2024

Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

327

30 May 2024

MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning

267

29 May 2024

A Survey of Multimodal Large Language Model from A Data-centric Perspective

...

Conghui He

363

26 May 2024

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Dong Li

283

24 May 2024

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

Cihang Xie

203

24 May 2024

Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference

Ting Liu

Xuyang Liu

Liangtao Shi

Zunnan Xu

Yue Hu

Yi Xin

Quanjun Yin

Bineng Zhong

Donglin Wang

247

23 May 2024

A Survey on Vision-Language-Action Models for Embodied AI

880

164

23 May 2024

BIMM: Brain Inspired Masked Modeling for Video Representation Learning

237

21 May 2024

Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding

426

21 May 2024

From Sora What We Can See: A Survey of Text-to-Video Generation

267

17 May 2024