The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017

13 June 2017

Raghav Goyal

Samira Ebrahimi Kahou

Moritz Mueller-Freitag

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,013 papers shown

Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

303

27 Nov 2023

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

Wanli Ouyang

Jingdong Wang

VLM

359

27 Nov 2023

Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Computer Vision and Pattern Recognition (CVPR), 2023

287

27 Nov 2023

Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding

267

25 Nov 2023

AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering

European Conference on Computer Vision (ECCV), 2023

307

25 Nov 2023

Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks

Amrit Nagarajan

Anand Raghunathan

VLM ViT

22 Nov 2023

GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration

IEEE Robotics and Automation Letters (RA-L), 2023

329

100

20 Nov 2023

VideoCon: Robust Video-Language Alignment via Contrast Captions

Computer Vision and Pattern Recognition (CVPR), 2023

137

15 Nov 2023

ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

International Conference on Learning Representations (ICLR), 2023

...

276

13 Nov 2023

Learning Human Action Recognition Representations Without Real Humans

Neural Information Processing Systems (NeurIPS), 2023

276

10 Nov 2023

Semantic-aware Video Representation for Few-shot Action Recognition

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Yutao Tang

Benjamin Bejar

René Vidal

293

10 Nov 2023

Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

124

10 Nov 2023

OmniVec: Learning robust representations with cross modal sharing

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Siddharth Srivastava

Gaurav Sharma

SSL

288

07 Nov 2023

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Computer Vision and Pattern Recognition (CVPR), 2023

Zhiyu Zhao

Bingkun Huang

Sen Xing

Gangshan Wu

Yu Qiao

Limin Wang

203

06 Nov 2023

What Makes Pre-Trained Visual Representations Successful for Robust Manipulation?

Conference on Robot Learning (CoRL), 2023

374

03 Nov 2023

On Hand-Held Grippers and the Morphological Gap in Human Manipulation Demonstration

Kiran Doshi

Yijiang Huang

Stelian Coros

156

03 Nov 2023

ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

Neural Information Processing Systems (NeurIPS), 2023

Baoxiong Jia

212

01 Nov 2023

MM-VID: Advancing Video Understanding with GPT-4V(ision)

...

Zicheng Liu

232

30 Oct 2023

Videoprompter: an ensemble of foundational models for zero-shot video understanding

Adeel Yousaf

Muzammal Naseer

Salman Khan

Fahad Shahbaz Khan

Mubarak Shah

VLM

206

23 Oct 2023

S3Aug: Segmentation, Sampling, and Shift for Action Recognition

Taiki Sugiura

Toru Tamaki

AI4TS

215

23 Oct 2023

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

430

19 Oct 2023

A Survey on Video Diffusion Models

ACM Computing Surveys (ACM Comput. Surv.), 2023

Zuxuan Wu

439

219

16 Oct 2023

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

International Conference on Learning Representations (ICLR), 2023

388

235

16 Oct 2023

Few-shot Action Recognition with Captioning Foundation Models

334

16 Oct 2023

Watt For What: Rethinking Deep Learning's Energy-Performance Relationship

Shashank Narayana Gowda

HAI

183

10 Oct 2023

Learning Interactive Real-World Simulators

International Conference on Learning Representations (ICLR), 2023

Pieter Abbeel

345

330

09 Oct 2023

DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

International Conference on Learning Representations (ICLR), 2023

Maximilian Seitzer

Sjoerd van Steenkiste

347

09 Oct 2023

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

...

Ming-Hsuan Yang

435

517

09 Oct 2023

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Zuxuan Wu

243

08 Oct 2023

Human-oriented Representation Learning for Robotic Manipulation

Mingyu Ding

Masayoshi Tomizuka

Wei Zhan

SSL

267

04 Oct 2023

Multiple Physics Pretraining for Physical Surrogate Models

Michael McCabe

Bruno Régaldo-Saint Blancard

...

293

04 Oct 2023

A Grammatical Compositional Model for Video Action Detection

Ying Wu

249

04 Oct 2023

How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

412

03 Oct 2023

Beyond the Benchmark: Detecting Diverse Anomalies in Videos

Yoav Arad

Michael Werman

174

03 Oct 2023

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

European Conference on Computer Vision (ECCV), 2023

Xinhao Li

Yuhan Zhu

Limin Wang

VLM

324

02 Oct 2023

A Hierarchical Graph-based Approach for Recognition and Description Generation of Bimanual Actions in Videos

260

01 Oct 2023

ConSOR: A Context-Aware Semantic Object Rearrangement Framework for Partially Arranged Scenes

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

Kartik Ramachandruni

Max Zuo

Sonia Chernova

260

30 Sep 2023

Egocentric RGB+Depth Action Recognition in Industry-Like Settings

Mubarak Shah

266

25 Sep 2023

SkeleTR: Towards Skeleton-based Action Recognition in the Wild

245

20 Sep 2023

Unsupervised Open-Vocabulary Object Localization in Videos

IEEE International Conference on Computer Vision (ICCV), 2023

Tianjun Xiao

...

Bernt Schiele

Thomas Brox

Zheng Zhang

Yanwei Fu

Tong He

285

18 Sep 2023

Selective Volume Mixup for Video Action Recognition

Tao Mei

212

18 Sep 2023

FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector

Qiqian Fu

Guanhong Wang

Gaoang Wang

16 Sep 2023

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

IEEE International Conference on Computer Vision (ICCV), 2023

216

14 Sep 2023

STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning

Palaash Agrawal

Haidi Azaman

Cheston Tan

506

13 Sep 2023

CDFSL-V: Cross-Domain Few-Shot Learning for Videos

IEEE International Conference on Computer Vision (ICCV), 2023

310

07 Sep 2023

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

IEEE International Conference on Computer Vision (ICCV), 2023

175

05 Sep 2023

Hierarchical Masked 3D Diffusion Model for Video Outpainting

ACM Multimedia (ACM MM), 2023

256

05 Sep 2023

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

European Conference on Computer Vision (ECCV), 2023

Gamaleldin F. Elsayed

Mohamed Elhoseiny

263

30 Aug 2023

Motion-Guided Masking for Spatiotemporal Representation Learning

IEEE International Conference on Computer Vision (ICCV), 2023

209

24 Aug 2023

MOFO: MOtion FOcused Self-Supervision for Video Understanding

Mona Ahmadian

Frank Guerin

Andrew Gilbert

307

23 Aug 2023