1706.04261
The "something something" video database for learning and evaluating visual common sense
IEEE International Conference on Computer Vision (ICCV), 2017
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
Papers citing "The 'something something' video database for learning and evaluating visual common sense"
50 / 1,013 papers shown
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
348
2
0
16 Mar 2025
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
Haoyu Zhang
Qiaohui Chu
Meng Liu
Yunxiao Wang
Bin Wen
Fan Yang
EgoV
510
12
0
12 Mar 2025
COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition
Baiyu Chen
Wilson Wongso
Zechen Li
Yonchanok Khaokaew
Hao Xue
Flora D. Salim
444
4
0
10 Mar 2025
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
Hritik Bansal
Clark Peng
Yonatan Bitton
Roman Goldenberg
Aditya Grover
Kai-Wei Chang
EGVM
VGen
296
39
0
09 Mar 2025
Object-Centric World Model for Language-Guided Manipulation
Youngjoon Jeong
Junha Chun
S. Cha
Taesup Kim
OCL
VGen
829
8
0
08 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Computer Vision and Pattern Recognition (CVPR), 2025
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
418
7
0
03 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
International Conference on Learning Representations (ICLR), 2025
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
209
35
0
01 Mar 2025
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions
Haoxin Li
Yingchen Yu
Qilong Wu
Hanwang Zhang
Boyang Li
Song Bai
3DH
VGen
1.1K
1
0
01 Mar 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiao Wang
Jingyun Hua
Weihong Lin
Yujiao Shi
Fuzheng Zhang
Yue Yu
Di Zhang
Liqiang Nie
VLM
662
1
0
28 Feb 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM
VLM
LRM
409
2
0
27 Feb 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
314
4
0
27 Feb 2025
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
International Conference on Learning Representations (ICLR), 2025
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Mengqi He
Jing Zhang
VLM
295
5
0
19 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Computer Vision and Pattern Recognition (CVPR), 2025
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
347
91
0
18 Feb 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig
413
19
0
18 Feb 2025
TextOCVP: Object-Centric Video Prediction with Language Guidance
Angel Villar-Corrales
Gjergj Plepi
Sven Behnke
VGen
OCL
DiffM
524
1
0
17 Feb 2025
NeuroStrata: Harnessing Neurosymbolic Paradigms for Improved Design, Testability, and Verifiability of Autonomous CPS
Xi Zheng
Ziyang Li
Ivan Ruchkin
R. Piskac
Miroslav Pajic
156
2
0
17 Feb 2025
Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos
Weirui Ye
Fangchen Liu
Z. Ding
Yang Gao
Oleh Rybkin
Pieter Abbeel
VGen
OffRL
394
14
0
14 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
390
1
0
12 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
325
0
0
11 Feb 2025
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
431
9
0
11 Feb 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Conference on Multimedia Modeling (MMM), 2025
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
449
4
0
22 Jan 2025
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis
Accident Analysis and Prevention (Accid Anal Prev), 2025
Ruixuan Zhang
Beichen Wang
Juexiao Zhang
Zilin Bian
Chen Feng
K. Ozbay
318
18
0
17 Jan 2025
Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics
AAAI Conference on Artificial Intelligence (AAAI), 2025
Tze Ho Elden Tse
Runyang Feng
Linfang Zheng
Jiho Park
Yixing Gao
Jihie Kim
A. Leonardis
H. Chang
452
2
0
13 Jan 2025
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
IEEE International Conference on Robotics and Automation (ICRA), 2025
Juntao Ren
Priya Sundaresan
Dorsa Sadigh
Sanjiban Choudhury
Jeannette Bohg
306
45
0
13 Jan 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Wenyi Hong
Yean Cheng
Zhiyong Yang
Weihan Wang
Lefan Wang
Xiaohan Zhang
Xiaotao Gu
Yuxiao Dong
J. Tang
CoGe
VLM
279
24
0
06 Jan 2025
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
312
11
0
31 Dec 2024
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xiaoyang Liu
Boran Wen
Xinpeng Liu
Zizheng Zhou
Hongwei Fan
Cewu Lu
Lizhuang Ma
Yulong Chen
Yongqian Li
444
3
0
27 Dec 2024
Sensitive Image Classification by Vision Transformers
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024
Hanxian He
Campbell Wilson
Thanh Thi Nguyen
Janis Dalins
ViT
319
1
0
21 Dec 2024
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
International Conference on Learning Representations (ICLR), 2024
Yang Tian
Sizhe Yang
Jia Zeng
P. Wang
Dahua Lin
Hao Dong
Jiangmiao Pang
365
79
0
19 Dec 2024
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
431
19
0
19 Dec 2024
Do Language Models Understand Time?
The Web Conference (WWW), 2024
Xi Ding
Lei Wang
912
10
0
18 Dec 2024
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
...
Zhenyu Wang
Pengju An
Siyuan Qian
Shanghang Zhang
Jian Tang
LM&Ro
553
87
0
18 Dec 2024
HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction
Chen Bao
Jiarui Xu
Xiaolong Wang
Abhinav Gupta
Homanga Bharadhwaj
287
10
0
17 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen
AI4CE
630
5
0
16 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
283
12
0
15 Dec 2024
Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence
AAAI Conference on Artificial Intelligence (AAAI), 2024
Wenbo Huang
Jinghui Zhang
Ge Li
Lei Zhang
Shuoyuan Wang
Fang Dong
Jiahui Jin
Takahiro Ogawa
Miki Haseyama
Mamba
525
4
0
10 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
Computer Vision and Pattern Recognition (CVPR), 2024
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Boddeti
Du Tran
VLM
622
7
0
02 Dec 2024
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Weiming Ren
Huan Yang
Jie Min
Cong Wei
Lei Ma
912
9
0
01 Dec 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2024
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
506
3
0
28 Nov 2024
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi
Qingyang Li
Yihan Hu
Fuzheng Zhang
Di Zhang
Yong Liu
VGen
350
0
0
25 Nov 2024
Extending Video Masked Autoencoders to 128 frames
Neural Information Processing Systems (NeurIPS), 2024
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming-Hsuan Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
314
3
0
20 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
480
0
0
20 Nov 2024
Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition
Hanyu Guo
Wanchuan Yu
Suzhou Que
Kaiwen Du
Yan Yan
Hanzi Wang
414
2
0
18 Nov 2024
Efficient Transfer Learning for Video-language Foundation Models
Computer Vision and Pattern Recognition (CVPR), 2024
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
388
1
0
18 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Lin Wang
Joey Tianyi Zhou
Chen Chen
LRM
391
9
0
15 Nov 2024
ClevrSkills: Compositional Language and Visual Reasoning in Robotics
Neural Information Processing Systems (NeurIPS), 2024
Sanjay Haresh
Daniel Dijkman
Apratim Bhattacharyya
Roland Memisevic
CoGe
LRM
237
7
0
13 Nov 2024
Balancing Multimodal Training Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
396
3
0
11 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Neural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
245
27
0
07 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Pengfei Yu
Jiajun Wu
L. Fei-Fei
VLM
285
83
0
07 Nov 2024
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Ruyang Liu
Haoran Tang
Haibo Liu
Yixiao Ge
Mingyu Ding
Chen Li
Jiankun Yang
VLM
234
17
0
04 Nov 2024