The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017

13 June 2017

Raghav Goyal

Samira Ebrahimi Kahou

Moritz Mueller-Freitag

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,012 papers shown

Parse-Augment-Distill: Learning Generalizable Bimanual Visuomotor Policies from Single Human Video

Georgios Tziafas

Jiayun Zhang

Hamidreza Kasaei

148

24 Sep 2025

A$^2$M$^2$-Net: Adaptively Aligned Multi-Scale Moment for Few-Shot Action Recognition

International Journal of Computer Vision (IJCV), 2025

136

22 Sep 2025

Latent Action Pretraining Through World Modeling

207

22 Sep 2025

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

...

18 Sep 2025

LayerLock: Non-collapsing Representation Learning with Progressive Freezing

140

12 Sep 2025

Exploring Pre-training Across Domains for Few-Shot Surgical Skill Assessment

...

108

11 Sep 2025

Video Understanding by Design: How Datasets Shape Architectures and Insights

237

11 Sep 2025

Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening

Piyush Bagad

Andrew Zisserman

AI4TS

228

10 Sep 2025

LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations

204

10 Sep 2025

Video-based Generalized Category Discovery via Memory-Guided Consistency-Aware Contrastive Learning

128

08 Sep 2025

Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors

139

31 Aug 2025

Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering

Nattapong Kurpukdee

Adrian G. Bors

148

29 Aug 2025

Why Relational Graphs Will Save the Next Generation of Vision Foundation Models?

Social Science Research Network (SSRN), 2025

Fatemeh Ziaeetabar

108

25 Aug 2025

Attention Mechanism in Randomized Time Warping

22 Aug 2025

Survey of Vision-Language-Action Models for Embodied Manipulation

466

21 Aug 2025

Reasoning in Computer Vision: Taxonomy, Models, Tasks, and Methodologies

Ayushman Sarkar

Mohd Yamani Idna Idris

Zhenyu Yu

LRM

160

14 Aug 2025

ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning

179

14 Aug 2025

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices

188

10 Aug 2025

Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition

175

05 Aug 2025

Zero-shot Compositional Action Recognition with Neural Logic Constraints

182

04 Aug 2025

iSafetyBench: A video-language benchmark for safety in industrial environment

260

01 Aug 2025

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi

Rabiul Awal

Ankur Sikarwar

Amirhossein Kazemnejad

...

260

01 Aug 2025

villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

...

353

31 Jul 2025

Back to the Features: DINO as a Foundation for Video World Models

195

25 Jul 2025

Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition

238

22 Jul 2025

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

283

22 Jul 2025

Discovering and using Spelke segments

...

157

21 Jul 2025

GR-3 Technical Report

...

316

21 Jul 2025

Simplifying Traffic Anomaly Detection with Video Foundation Models

120

12 Jul 2025

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation

239

09 Jul 2025

TriVLA: A Triple-System-Based Unified Vision-Language-Action Model with Episodic World Modeling for General Robot Control

276

02 Jul 2025

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark

...

588

01 Jul 2025

D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

480

01 Jul 2025

Dual Perspectives on Non-Contrastive Self-Supervised Learning

155

18 Jun 2025

Active Multimodal Distillation for Few-shot Action Recognition

International Joint Conference on Artificial Intelligence (IJCAI), 2025

122

16 Jun 2025

DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification

Computer Vision and Pattern Recognition (CVPR), 2025

Darryl Ho

Samuel Madden

AI4TS

194

14 Jun 2025

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

...

277

134

11 Jun 2025

Synthetic Human Action Video Data Generation with Pose Transfer

Vaclav Knapp

Matyas Bohacek

250

11 Jun 2025

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

293

11 Jun 2025

An Effective End-to-End Solution for Multimodal Action Recognition

International Conference on Pattern Recognition (ICPR), 2025

231

11 Jun 2025

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought

334

10 Jun 2025

ExAct: A Video-Language Benchmark for Expert Action Analysis

Oluwatumininu Oguntola

Gedas Bertasius

200

06 Jun 2025

Proactive Assistant Dialogue Generation from Streaming Egocentric Videos

318

06 Jun 2025

Video, How Do Your Tokens Merge?

Sam Pollard

Michael Wray

ViT MoMe

265

04 Jun 2025

Large-scale Self-supervised Video Foundation Model for Intelligent Surgery

...

243

03 Jun 2025

VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos

AAAI Conference on Artificial Intelligence (AAAI), 2025

235

03 Jun 2025

Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review

...

244

02 Jun 2025

Improving Keystep Recognition in Ego-Video via Dexterous Focus

Zachary Chavis

Stephen J. Guy

Hyun Soo Park

260

01 Jun 2025

Temporal In-Context Fine-Tuning with Temporal Reasoning for Versatile Control of Video Diffusion Models

370

01 Jun 2025

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

371

29 May 2025