arXiv:1706.04261
The "something something" video database for learning and evaluating visual common sense
IEEE International Conference on Computer Vision (ICCV), 2017
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
Papers citing "The 'something something' video database for learning and evaluating visual common sense"
50 / 1,013 papers shown
Natural Language Can Help Bridge the Sim2Real Gap
Albert Yu
Adeline Foote
Raymond J. Mooney
Roberto Martín-Martín
LM&Ro
396
21
0
16 May 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Computer Vision and Pattern Recognition (CVPR), 2024
Yunhao Ge
Yihe Tang
Lyne Tchapmi
Cem Gokmen
Chengshu Li
...
Miao Liu
Pengchuan Zhang
Ruohan Zhang
Fei-Fei Li
Jiajun Wu
VGen
182
13
0
15 May 2024
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
Yingjie Zhai
Wenshuo Li
Yehui Tang
Xinghao Chen
Yunhe Wang
ViT
223
2
0
14 May 2024
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
Neural Information Processing Systems (NeurIPS), 2024
Gunshi Gupta
Karmesh Yadav
Y. Gal
Dhruv Batra
Z. Kira
Cong Lu
Tim G. J. Rudner
276
12
0
09 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
172
3
0
09 May 2024
Sora and V-JEPA Have Not Learned The Complete Real World Model -- A Philosophical Analysis of Video AIs Through the Theory of Productive Imagination
Jianqiu Zhang
VGen
100
0
0
06 May 2024
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Jameel Hassan
Muzammal Naseer
Federico Tombari
Fahad Shahbaz Khan
Salman Khan
LRM
ELM
281
26
0
06 May 2024
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
IEEE Transactions on Multimedia (IEEE TMM), 2024
Hongyu Qu
Rui Yan
Xiangbo Shu
Haoliang Gao
Peng Huang
Guo-Sen Xie
447
16
0
03 May 2024
Track2Act: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation
European Conference on Computer Vision (ECCV), 2024
Homanga Bharadhwaj
Roozbeh Mottaghi
Abhinav Gupta
Shubham Tulsiani
3DPC
229
3
0
02 May 2024
Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy
Hoang-Quan Nguyen
Thanh-Dat Truong
Khoa Luu
287
1
0
02 May 2024
WorldGPT: Empowering LLM as Multimodal World Model
Zhiqi Ge
Hongzhe Huang
Mingze Zhou
Juncheng Li
Guoming Wang
Siliang Tang
Yueting Zhuang
314
58
0
28 Apr 2024
VIEW: Visual Imitation Learning with Waypoints
Ananth Jonnavittula
Sagar Parekh
Dylan P. Losey
SSL
550
18
0
27 Apr 2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
MLLM
VLM
272
276
0
25 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
359
75
0
24 Apr 2024
Rank2Reward: Learning Shaped Reward Functions from Passive Video
Daniel Yang
Davin Tjia
Jacob Berg
Dima Damen
Pulkit Agrawal
Abhishek Gupta
OffRL
229
15
0
23 Apr 2024
1st Place Solution to the 1st SkatingVerse Challenge
Tao Sun
Yuanzi Fu
Kaicheng Yang
Jian Wu
Ziyong Feng
VGen
98
0
0
22 Apr 2024
On the Content Bias in Fréchet Video Distance
Jason S. Hoffman
Aniruddha Mahapatra
Gaurav Parmar
Jun-Yan Zhu
Jia-Bin Huang
EGVM
255
32
0
18 Apr 2024
Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
Xunsong Li
Pengzhan Sun
Yangcen Liu
Lixin Duan
Wen Li
433
6
0
18 Apr 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar
Arya Bakhtiar
Danny Tran
Antonio Loquercio
Jathushan Rajasegaran
Yann LeCun
Amir Globerson
Trevor Darrell
EgoV
267
8
0
15 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
638
10
0
15 Apr 2024
T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos
Artur Xarles
Sergio Escalera
T. Moeslund
Albert Clapés
258
24
0
08 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
303
14
0
06 Apr 2024
Visual Knowledge in the Big Model Era: Retrospect and Prospect
Wenguan Wang
Yi Yang
Yunhe Pan
VLM
313
29
0
05 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
297
25
0
05 Apr 2024
ASTRA: An Action Spotting TRAnsformer for Soccer Videos
Artur Xarles
Sergio Escalera
T. Moeslund
Albert Clapés
347
15
0
02 Apr 2024
SUGAR: Pre-training 3D Visual Representations for Robotics
Computer Vision and Pattern Recognition (CVPR), 2024
Shizhe Chen
Ricardo Garcia Pinel
Ivan Laptev
Cordelia Schmid
258
33
0
01 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
377
119
0
01 Apr 2024
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu
Chen Li
Haoran Tang
Yixiao Ge
Ying Shan
Ge Li
193
123
0
30 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
285
29
0
26 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
217
5
0
24 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
European Conference on Computer Vision (ECCV), 2024
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
260
104
0
22 Mar 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
224
8
0
21 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
793
693
0
21 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
286
40
0
20 Mar 2024
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
Zhipeng Huang
Zhizheng Zhang
Zheng-Jun Zha
Yan Lu
Baining Guo
VLM
155
6
0
19 Mar 2024
VideoBadminton: A Video Dataset for Badminton Action Recognition
Qi Li
Tzu-Chen Chiu
Hsiang-Wei Huang
Minmin Sun
Wei-Shinn Ku
181
1
0
19 Mar 2024
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
Neural Information Processing Systems (NeurIPS), 2024
Wangbo Zhao
Jiasheng Tang
Yizeng Han
Yibing Song
Kai Wang
Gao Huang
F. Wang
Yang You
320
23
0
18 Mar 2024
Don't Judge by the Look: Towards Motion Coherent Video Representation
International Conference on Learning Representations (ICLR), 2024
Yitian Zhang
Yue Bai
Huan Wang
Yizhou Wang
Yun Fu
258
3
0
14 Mar 2024
BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation
Chengshu Li
Ruohan Zhang
J. Wong
Cem Gokmen
S. Srivastava
...
Silvio Savarese
H. Gweon
Chenxi Liu
Jiajun Wu
Fei-Fei Li
VGen
LM&Ro
VLM
197
86
0
14 Mar 2024
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Computer Vision and Pattern Recognition (CVPR), 2024
Soumen Basu
Mayuna Gupta
Chetan Madan
Pankaj Gupta
Chetan Arora
265
12
0
13 Mar 2024
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
W. G. C. Bandara
Vishal M. Patel
VPVLM
VLM
251
3
0
11 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
European Conference on Computer Vision (ECCV), 2024
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
277
385
0
11 Mar 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
ACM Multimedia (ACM MM), 2023
Boshen Xu
Sipeng Zheng
Qin Jin
189
14
0
09 Mar 2024
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho
Fachrina Dewi Puspitasari
Sheng Zheng
Jingyao Zheng
Lik-Hang Lee
Tae-Ho Kim
Choong Seon Hong
Chaoning Zhang
EGVM
VGen
274
66
0
08 Mar 2024
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
234
9
0
29 Feb 2024
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Jianxiong Li
Jinliang Zheng
Yinan Zheng
Liyuan Mao
Xiaoming Hu
...
Jihao Liu
Yu Liu
Jingjing Liu
Ya Zhang
Xianyuan Zhan
LM&Ro
OffRL
279
14
0
28 Feb 2024
Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning
Wuyang Chen
Jialin Song
Pu Ren
Shashank Subramanian
Dmitriy Morozov
Michael W. Mahoney
AI4CE
445
20
0
24 Feb 2024
Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition
Yuke Li
Guangyi Chen
Ben Abramowitz
Stefano Anzellotti
Donglai Wei
TTA
297
3
0
20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
386
67
0
20 Feb 2024
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Q. Garrido
Jean Ponce
Xinlei Chen
Michael G. Rabbat
Yann LeCun
Mahmoud Assran
Nicolas Ballas
MDE
VLM
345
173
0
15 Feb 2024