1706.04261
The "something something" video database for learning and evaluating visual common sense
IEEE International Conference on Computer Vision (ICCV), 2017
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
Papers citing
"The "something something" video database for learning and evaluating visual common sense"
50 / 1,013 papers shown
The Visual Experience Dataset: Over 200 Recorded Hours of Integrated Eye Movement, Odometry, and Egocentric Video
Michelle R. Greene
Benjamin Balas
M. Lescroart
Paul MacNeilage
Jennifer A. Hart
...
Matthew W. Shinkle
Wentao Si
Brian Szekely
Joaquin M. Torres
Eliana Weissmann
MDE
143
7
0
15 Feb 2024
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
Guanxiong Sun
Yang Hua
Guosheng Hu
N. Robertson
ViT
171
1
0
14 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
309
5
0
14 Feb 2024
Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation
IEEE Access (IEEE Access), 2024
Chrisantus Eze
Christopher Crick
SSL
466
16
0
11 Feb 2024
VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jialu Li
Aishwarya Padmakumar
Gaurav Sukhatme
Mohit Bansal
323
10
0
05 Feb 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
International Conference on Learning Representations (ICLR), 2024
Xiaohui Huang
Hao Zhou
Kun Yao
Kai Han
VLM
250
48
0
05 Feb 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Haoyi Zhu
Yating Wang
Di Huang
Weicai Ye
Wanli Ouyang
Tong He
SSL
3DPC
348
46
0
04 Feb 2024
Self-supervised learning of video representations from a child's perspective
A. Orhan
Wentao Wang
Alex N. Wang
Mengye Ren
Brenden M. Lake
106
5
0
01 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
401
15
0
29 Jan 2024
MV2MAE: Multi-View Video Masked Autoencoders
Ketul Shah
Robert Crandall
Jie Xu
Peng Zhou
Marian George
Mayank Bansal
Rama Chellappa
247
6
0
29 Jan 2024
Multi-model learning by sequential reading of untrimmed videos for action recognition
Kodai Kamiya
Toru Tamaki
261
0
0
26 Jan 2024
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2024
Mengmeng Wang
Jiazheng Xing
Boyuan Jiang
Jun Chen
Jianbiao Mei
Xingxing Zuo
Guang Dai
Jingdong Wang
Yong-Jin Liu
VLM
205
8
0
22 Jan 2024
Pixel-Wise Recognition for Holistic Surgical Scene Understanding
Nicolás Ayobi
Santiago Rodríguez
Alejandra Pérez
Isabela Hernández
Nicolás Aparicio
...
Sebastián Pena
J. Santander
J. Caicedo
Nicolás Fernández
Pablo Arbelaez
ViT
MedIm
217
34
0
20 Jan 2024
Understanding Video Transformers via Universal Concept Discovery
M. Kowal
Achal Dave
Rares Andrei Ambrus
Adrien Gaidon
Konstantinos G. Derpanis
P. Tokmakov
ViT
419
17
0
19 Jan 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
417
2
0
19 Jan 2024
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Yun-Hai Liu
Haolin Yang
Xu Si
Ling Liu
Zipeng Li
Yuxiang Zhang
Yebin Liu
Li Yi
367
50
0
16 Jan 2024
Multi-view Distillation based on Multi-modal Fusion for Few-shot Action Recognition (CLIP-$\mathrm{M^2}$DF)
Fei-Yu Guo
YiKang Wang
Han Qi
WenPing Jin
Li Zhu
203
3
0
16 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
199
6
0
15 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
376
2
0
15 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
273
0
0
10 Jan 2024
Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Computer Vision and Pattern Recognition (CVPR), 2024
Chen Zhao
Shuming Liu
K. Mangalam
Guocheng Qian
Fatimah Zohra
Abdulmohsen Alghannam
Jitendra Malik
Guohao Li
229
8
0
08 Jan 2024
Commonsense for Zero-Shot Natural Language Video Localization
AAAI Conference on Artificial Intelligence (AAAI), 2023
Meghana Holla
Ismini Lourentzou
340
4
0
29 Dec 2023
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
714
167
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
282
271
0
28 Dec 2023
Open-Vocabulary Video Relation Extraction
Wentao Tian
Zheng Wang
Yu Fu
Yue Yu
Lechao Cheng
180
2
0
25 Dec 2023
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
532
400
0
21 Dec 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tiancai Wang
Xiangyu Zhang
Zhaoxiang Zhang
227
6
0
21 Dec 2023
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
I. Dave
Simon Jenni
Mubarak Shah
178
12
0
20 Dec 2023
EZ-CLIP: Efficient Zeroshot Video Action Recognition
Shahzad Ahmad
S. Chanda
Yogesh S Rawat
VLM
273
11
0
13 Dec 2023
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Computer Vision and Pattern Recognition (CVPR), 2023
Yash Jain
Anshul Nasery
Vibhav Vineet
Harkirat Singh Behl
VGen
276
60
0
12 Dec 2023
Early Action Recognition with Action Prototypes
G. Camporese
Alessandro Bergamo
Xunyu Lin
Joseph Tighe
Davide Modolo
EgoV
130
0
0
11 Dec 2023
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di
Weidi Xie
501
46
0
11 Dec 2023
Counterfactual World Modeling for Physical Dynamics Understanding
Rahul Venkatesh
Honglin Chen
Kevin T. Feigelis
Daniel M. Bear
Khaled Jedoui
...
Wanhee Lee
Sherry Liu
Kevin A. Smith
Judith E. Fan
Daniel L. K. Yamins
VGen
310
7
0
11 Dec 2023
Dexterous Functional Grasping
Conference on Robot Learning (CoRL), 2023
Ananye Agarwal
Shagun Uppal
Kenneth Shaw
Deepak Pathak
346
49
0
05 Dec 2023
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
ACM Multimedia (ACM MM), 2023
Chengyou Jia
Minnan Luo
Xiaojun Chang
Zhuohang Dang
Mingfei Han
Mengmeng Wang
Guangwen Dai
Sizhe Dang
Jingdong Wang
VLM
195
14
0
04 Dec 2023
Consistency Prototype Module and Motion Compensation for Few-Shot Action Recognition (CLIP-CP$\mathbf{M^2}$C)
Fei-Yu Guo
Li Zhu
YiKang Wang
Han Qi
274
8
0
02 Dec 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Computer Vision and Pattern Recognition (CVPR), 2023
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
MLLM
VLM
348
226
0
01 Dec 2023
Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans
IEEE International Conference on Robotics and Automation (ICRA), 2023
Homanga Bharadhwaj
Abhi Gupta
Vikash Kumar
Shubham Tulsiani
LM&Ro
316
58
0
01 Dec 2023
Just Add $\pi$! Pose Induced Video Transformers for Understanding Activities of Daily Living
Computer Vision and Pattern Recognition (CVPR), 2023
Dominick Reilly
Srijan Das
ViT
296
27
0
30 Nov 2023
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
M. Gwilliam
Michael Cogswell
Meng Ye
Karan Sikka
Abhinav Shrivastava
Ajay Divakaran
3DV
288
1
1
30 Nov 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Neural Information Processing Systems (NeurIPS), 2023
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
341
30
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
European Conference on Computer Vision (ECCV), 2023
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
327
0
0
30 Nov 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
383
13
0
30 Nov 2023
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
European Conference on Computer Vision (ECCV), 2023
Shicheng Li
Lei Li
Shuhuai Ren
Yuanxin Liu
Yi Liu
Rundong Gao
Xu Sun
Lu Hou
227
49
0
29 Nov 2023
F4D: Factorized 4D Convolutional Neural Network for Efficient Video-level Representation Learning
International Conference on Agents and Artificial Intelligence (ICAART), 2023
Mohammad Al-Saad
Lakshmish Ramaswamy
S. Bhandarkar
AI4TS
150
3
0
28 Nov 2023
Panoptic Video Scene Graph Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Jingkang Yang
Wen-Hsiao Peng
Xiangtai Li
Zujin Guo
Liangyu Chen
...
Zheng Ma
Kaiyang Zhou
Wayne Zhang
Chen Change Loy
Ziwei Liu
VOS
317
53
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Computer Vision and Pattern Recognition (CVPR), 2023
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
664
857
0
28 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
322
2
0
28 Nov 2023
SEED-Bench-2: Benchmarking Multimodal Large Language Models
Bohao Li
Yuying Ge
Yixiao Ge
Guangzhi Wang
Rui Wang
Ruimao Zhang
Ying Shan
MLLM
VLM
184
84
0
28 Nov 2023
Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Siyuan Huang
Yifan Zhou
Ram Prabhakar Kathirvel
Xijun Liu
Yuxiang Guo
Hongrui Yi
Cheng-Fang Peng
Rama Chellappa
Chun Pong Lau
132
0
0
27 Nov 2023