The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017

13 June 2017

Raghav Goyal

Samira Ebrahimi Kahou

Moritz Mueller-Freitag

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,013 papers shown

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

375

29 May 2025

PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion

199

28 May 2025

Dynamic-Aware Video Distillation: Optimizing Temporal Resolution Based on Video Semantics

175

28 May 2025

Advancing Video Self-Supervised Learning via Image Foundation Models

Pattern Recognition Letters (Pattern Recogn. Lett.), 2025

Jingwei Wu

Zhewei Huang

Chang Liu

200

25 May 2025

Inference Compute-Optimal Video Vision Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

275

24 May 2025

Multi-task Learning For Joint Action and Gesture Recognition

Konstantinos Spathis

N. Kardaris

Petros Maragos

158

23 May 2025

Leveraging Foundation Models for Multimodal Graph-Based Action Recognition

Fatemeh Ziaeetabar

Florentin Wörgötter

362

21 May 2025

UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

454

13 May 2025

Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection

346

13 May 2025

Video Dataset Condensation with Diffusion Models

Franciskus Xaverius Erick

Bernhard Kainz

DD VGen

501

10 May 2025

Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition

435

09 May 2025

Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments

1.1K

08 May 2025

ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow

USENIX Security Symposium (USENIX Security), 2023

428

02 May 2025

MINERVA: Evaluating Complex Video Reasoning

...

333

01 May 2025

Direct Motion Models for Assessing Generated Videos

...

Sjoerd van Steenkiste

EGVM DiffM VGen

487

30 Apr 2025

A Survey of Interactive Generative Video

427

30 Apr 2025

Learning Streaming Video Representation via Multitask Training

496

28 Apr 2025

Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models

IEEE International Conference on Robotics and Automation (ICRA), 2025

255

17 Apr 2025

Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization

Pritam Sarkar

Ali Etemad

322

16 Apr 2025

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

...

379

14 Apr 2025

Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

262

14 Apr 2025

Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities

Maria Santos-Villafranca

348

11 Apr 2025

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

193

10 Apr 2025

SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning

339

08 Apr 2025

A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning

921

08 Apr 2025

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

501

03 Apr 2025

Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation

665

03 Apr 2025

Is Temporal Prompting All We Need For Limited Labeled Action Recognition?

351

02 Apr 2025

Learning from Streaming Video with Orthogonal Gradients

Computer Vision and Pattern Recognition (CVPR), 2025

277

02 Apr 2025

Scaling Language-Free Visual Representation Learning

...

435

01 Apr 2025

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning

Computer Vision and Pattern Recognition (CVPR), 2025

338

01 Apr 2025

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Computer Vision and Pattern Recognition (CVPR), 2025

...

388

31 Mar 2025

ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos

IEEE International Conference on Robotics and Automation (ICRA), 2025

283

31 Mar 2025

R900: Understanding the Cost-Effectiveness of Random Exploration from 900 Hours of Robotic Data Collection

233

30 Mar 2025

CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition

524

30 Mar 2025

Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Antonia Karamolegkou

Malvina Nikandrou

Georgios Pantazopoulos

Danae Sanchez Villegas

229

28 Mar 2025

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Computer Vision and Pattern Recognition (CVPR), 2025

...

335

193

27 Mar 2025

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Abdelrahman M. Shaker

916

27 Mar 2025

Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks

Computer Vision and Pattern Recognition (CVPR), 2025

262

24 Mar 2025

VTD-CLIP: Video-to-Text Discretization via Prompting CLIP

362

24 Mar 2025

ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset

237

24 Mar 2025

AdaWorld: Learning Adaptable World Models with Latent Actions

549

24 Mar 2025

STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

425

20 Mar 2025

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

Computer Vision and Pattern Recognition (CVPR), 2025

294

20 Mar 2025

Structured-Noise Masked Modeling for Video, Audio and Beyond

310

20 Mar 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

...

545

381

18 Mar 2025

DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers

Computer Vision and Pattern Recognition (CVPR), 2025

Mert Bulent Sariyildiz

336

18 Mar 2025

Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition

326

17 Mar 2025

Efficient Motion-Aware Video MLLM

Computer Vision and Pattern Recognition (CVPR), 2025

245

17 Mar 2025

Quantum EigenGame for excited state calculation

David Quiroga

Jason Han

Anastasios Kyrillidis

280

17 Mar 2025