The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017

13 June 2017

Raghav Goyal

Samira Ebrahimi Kahou

Moritz Mueller-Freitag

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,013 papers shown

VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining

348

16 Mar 2025

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding

510

12 Mar 2025

COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

444

10 Mar 2025

VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation

296

09 Mar 2025

Object-Centric World Model for Language-Guided Manipulation

829

08 Mar 2025

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization

Computer Vision and Pattern Recognition (CVPR), 2025

418

03 Mar 2025

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

International Conference on Learning Representations (ICLR), 2025

209

01 Mar 2025

Learning to Animate Images from A Few Videos to Portray Delicate Human Actions

1.1K

01 Mar 2025

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

662

28 Feb 2025

Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios

409

27 Feb 2025

Learning to Generalize without Bias for Open-Vocabulary Action Recognition

314

27 Feb 2025

Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition

International Conference on Learning Representations (ICLR), 2025

295

19 Feb 2025

Magma: A Foundation Model for Multimodal AI Agents

Computer Vision and Pattern Recognition (CVPR), 2025

...

347

18 Feb 2025

Pre-training Auto-regressive Robotic Models with 4D Representations

413

18 Feb 2025

TextOCVP: Object-Centric Video Prediction with Language Guidance

Angel Villar-Corrales

Gjergj Plepi

Sven Behnke

VGen OCL DiffM

524

17 Feb 2025

NeuroStrata: Harnessing Neurosymbolic Paradigms for Improved Design, Testability, and Verifiability of Autonomous CPS

156

17 Feb 2025

Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos

394

14 Feb 2025

Learning Human Skill Generators at Key-Step Levels

390

12 Feb 2025

Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis

Amir Hosein Fadaei

M. Dehaqani

325

11 Feb 2025

A Survey on Mamba Architecture for Vision Applications

431

11 Feb 2025

Can masking background and object reduce static bias for zero-shot action recognition?

Conference on Multimedia Modeling (MMM), 2025

449

22 Jan 2025

When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis

Accident Analysis and Prevention (Accid Anal Prev), 2025

318

17 Jan 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics

AAAI Conference on Artificial Intelligence (AAAI), 2025

452

13 Jan 2025

Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning

IEEE International Conference on Robotics and Automation (ICRA), 2025

306

13 Jan 2025

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

279

06 Jan 2025

GFG -- Gender-Fair Generation: A CALAMITA Challenge

312

31 Dec 2024

Interacted Object Grounding in Spatio-Temporal Human-Object Interactions

AAAI Conference on Artificial Intelligence (AAAI), 2024

444

27 Dec 2024

Sensitive Image Classification by Vision Transformers

IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024

319

21 Dec 2024

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

International Conference on Learning Representations (ICLR), 2024

365

19 Dec 2024

Scaling 4D Representations

...

431

19 Dec 2024

Do Language Models Understand Time?

The Web Conference (WWW), 2024

Xi Ding

Lei Wang

912

18 Dec 2024

RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

...

553

18 Dec 2024

HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction

287

17 Dec 2024

InterDyn: Controllable Interactive Dynamics with Video Diffusion Models

Computer Vision and Pattern Recognition (CVPR), 2024

Victoria Fernandez-Abrevaya

VGen AI4CE

630

16 Dec 2024

Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

283

15 Dec 2024

Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

AAAI Conference on Artificial Intelligence (AAAI), 2024

525

10 Dec 2024

SEAL: Semantic Attention Learning for Long Video Representation

Computer Vision and Pattern Recognition (CVPR), 2024

622

02 Dec 2024

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Computer Vision and Pattern Recognition (CVPR), 2024

912

01 Dec 2024

TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2024

506

28 Nov 2024

Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models

350

25 Nov 2024

Extending Video Masked Autoencoders to 128 frames

Neural Information Processing Systems (NeurIPS), 2024

...

314

20 Nov 2024

Principles of Visual Tokens for Efficient Video Understanding

480

20 Nov 2024

Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition

414

18 Nov 2024

Efficient Transfer Learning for Video-language Foundation Models

Computer Vision and Pattern Recognition (CVPR), 2024

388

18 Nov 2024

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

Computer Vision and Pattern Recognition (CVPR), 2024

391

15 Nov 2024

ClevrSkills: Compositional Language and Visual Reasoning in Robotics

Neural Information Processing Systems (NeurIPS), 2024

Sanjay Haresh

Daniel Dijkman

Apratim Bhattacharyya

Roland Memisevic

CoGe LRM

237

13 Nov 2024

Balancing Multimodal Training Through Game-Theoretic Regularization

Konstantinos Kontras

Thomas Strypsteen

Christos Chatzichristos

Paul P. Liang

Matthew Blaschko

M. D. Vos

396

11 Nov 2024

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization

Neural Information Processing Systems (NeurIPS), 2024

245

07 Nov 2024

HourVideo: 1-Hour Video-Language Understanding

Neural Information Processing Systems (NeurIPS), 2024

Keshigeyan Chandrasegaran

285

07 Nov 2024

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

234

04 Nov 2024