The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
    VLM

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,012 papers shown
Parse-Augment-Distill: Learning Generalizable Bimanual Visuomotor Policies from Single Human Video
Georgios Tziafas
Jiayun Zhang
Hamidreza Kasaei
148
0
0
24 Sep 2025
A$^2$M$^2$-Net: Adaptively Aligned Multi-Scale Moment for Few-Shot Action Recognition
International Journal of Computer Vision (IJCV), 2025
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
136
0
0
22 Sep 2025
Latent Action Pretraining Through World Modeling
Bahey Tharwat
Yara Nasser
Ali Abouzeid
Ian Reid
LM&Ro, SSL, VLM
207
1
0
22 Sep 2025
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Yuming Jiang
Siteng Huang
Shengke Xue
Yaxi Zhao
Jun Cen
...
Kexiang Wang
Mingxiu Chen
F. Wang
Deli Zhao
Xin Li
VGen, LM&Ro
87
8
0
18 Sep 2025
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan
Nikhil Parthasarathy
Catalin Ionescu
Drew A. Hudson
Alexander Lerchner
Andrew Zisserman
Mehdi S. M. Sajjadi
João Carreira
140
0
0
12 Sep 2025
Exploring Pre-training Across Domains for Few-Shot Surgical Skill Assessment
Dimitrios Anastasiou
Razvan Caramalau
Nazir Sirajudeen
M. Boal
P. J. Eddie Edwards
...
F. Mumtaz
N. Pavithran
Nader K Francis
Danail Stoyanov
E. Mazomenos
108
1
0
11 Sep 2025
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV, VGen, AI4TS
237
0
0
11 Sep 2025
Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening
Piyush Bagad
Andrew Zisserman
AI4TS
228
2
0
10 Sep 2025
LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations
Payal Varshney
Adriano Lucieri
Christoph Balada
Sheraz Ahmed
Andreas Dengel
VGen
204
0
0
10 Sep 2025
Video-based Generalized Category Discovery via Memory-Guided Consistency-Aware Contrastive Learning
Zhang Jing
Pu Nan
Xie Yu Xiang
Guo Yanming
Lu Qianqi
Zou Shiwei
Yan Jie
Chen Yan
CLL
128
1
0
08 Sep 2025
Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors
Xiangchen Wang
Jinrui Zhang
Teng Wang
Haigang Zhang
Feng Zheng
139
0
0
31 Aug 2025
Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering
Nattapong Kurpukdee
Adrian G. Bors
148
0
0
29 Aug 2025
Why Relational Graphs Will Save the Next Generation of Vision Foundation Models?
Social Science Research Network (SSRN), 2025
Fatemeh Ziaeetabar
108
0
0
25 Aug 2025
Attention Mechanism in Randomized Time Warping
Yutaro Hiraoka
Kazuya Okamura
Kota Suto
Kazuhiro Fukui
64
0
0
22 Aug 2025
Survey of Vision-Language-Action Models for Embodied Manipulation
Haoran Li
Yuhui Chen
Wenbo Cui
Weiheng Liu
Kai Liu
Mingcai Zhou
Zhengtao Zhang
Dongbin Zhao
LM&Ro
466
4
0
21 Aug 2025
Reasoning in Computer Vision: Taxonomy, Models, Tasks, and Methodologies
Ayushman Sarkar
Mohd Yamani Idna Idris
Zhenyu Yu
LRM
160
10
0
14 Aug 2025
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee
Kyungho Bae
Kyle Min
Gyeong-Moon Park
J. Choi
CLL, VLM
179
0
0
14 Aug 2025
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang
Zihan Jia
Zhilin Dai
Sheng Guo
Limin Wang
CLIP, VLM
188
0
0
10 Aug 2025
Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition
Pulkit Kumar
Shuaiyi Huang
Matthew Walmer
Sai Saketh Rambhatla
Abhinav Shrivastava
ViT
175
2
0
05 Aug 2025
Zero-shot Compositional Action Recognition with Neural Logic Constraints
Gefan Ye
Lin Li
Kexin Li
Jun Xiao
Long Chen
182
3
0
04 Aug 2025
iSafetyBench: A video-language benchmark for safety in industrial environment
Raiyaan Abdullah
Yogesh S Rawat
Shruti Vyas
VLM
260
1
0
01 Aug 2025
The Promise of RL for Autoregressive Image Editing
Saba Ahmadi
Rabiul Awal
Ankur Sikarwar
Amirhossein Kazemnejad
Ge Ya Luo
...
Sai Rajeswar
Siva Reddy
C. Pal
Benno Krojer
Aishwarya Agrawal
OffRL, KELM
260
2
0
01 Aug 2025
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Xiaoyu Chen
Hangxing Wei
Pushi Zhang
Chuheng Zhang
Kaixin Wang
...
Yucen Wang
Xinquan Xiao
Li Zhao
Jianyu Chen
Jiang Bian
LM&Ro
353
13
0
31 Jul 2025
Back to the Features: DINO as a Foundation for Video World Models
Federico Baldassarre
Marc Szafraniec
Basile Terver
Vasil Khalidov
Francisco Massa
Yann LeCun
Patrick Labatut
Maximilian Seitzer
Piotr Bojanowski
VGen
195
25
0
25 Jul 2025
Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian
Xincheng Yao
Yifei Huang
Chongyang Zhang
Jiangyong Ying
Hong Sun
238
1
0
22 Jul 2025
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
Chi-Pin Huang
Yueh-Hua Wu
Min-Hung Chen
Yu-Chun Wang
Fu-En Yang
LM&Ro, LRM
283
43
0
22 Jul 2025
Discovering and using Spelke segments
R. Venkatesh
Klemen Kotar
Lilian Naing Chen
Seungwoo Kim
Luca Thomas Wheeler
...
Wanhee Lee
Honglin Chen
Daniel M. Bear
Stefan Stojanov
Daniel L. K. Yamins
157
0
0
21 Jul 2025
GR-3 Technical Report
Chilam Cheang
S. Chen
Zhongren Cui
Yingdong Hu
Liqun Huang
...
Hongtao Wu
Xin Xiao
Yuyang Xiao
Jiafeng Xu
Yichu Yang
316
45
0
21 Jul 2025
Simplifying Traffic Anomaly Detection with Video Foundation Models
Svetlana Orlova
Tommie Kerssies
B. B. Englert
Gijs Dubbelman
ViT
120
1
0
12 Jul 2025
ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
Sherry X. Chen
Yi Wei
Luowei Zhou
Suren Kumar
239
3
0
09 Jul 2025
TriVLA: A Triple-System-Based Unified Vision-Language-Action Model with Episodic World Modeling for General Robot Control
Zhenyang Liu
Yongchong Gu
Sixiao Zheng
Yanwei Fu
Xiangyang Xue
Yu-Gang Jiang
276
3
0
02 Jul 2025
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Qi Qin
...
Bin Fu
Xiaokang Yang
Guangtao Zhai
Ming-Hsuan Yang
Xiaohong Liu
VLM
588
86
0
01 Jul 2025
D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
Wenjie Pei
Qizhong Tan
Guangming Lu
Jiandong Tian
Jun Yu
480
3
0
01 Jul 2025
Dual Perspectives on Non-Contrastive Self-Supervised Learning
Jean Ponce
Basile Terver
M. Hebert
Michael Arbel
SSL
155
0
0
18 Jun 2025
Active Multimodal Distillation for Few-shot Action Recognition
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Weijia Feng
Yichen Zhu
Ruojia Zhang
Chenyang Wang
Fei Ma
Xiaobao Wang
Xiaobai Li
122
0
0
16 Jun 2025
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Computer Vision and Pattern Recognition (CVPR), 2025
Darryl Ho
Samuel Madden
AI4TS
194
0
0
14 Jun 2025
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Mido Assran
Adrien Bardes
David Fan
Q. Garrido
Russell Howes
...
Sarath Chandar
Franziska Meier
Yann LeCun
Michael G. Rabbat
Nicolas Ballas
277
134
0
11 Jun 2025
Synthetic Human Action Video Data Generation with Pose Transfer
Vaclav Knapp
Matyas Bohacek
250
1
0
11 Jun 2025
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Benno Krojer
Mojtaba Komeili
Candace Ross
Q. Garrido
Koustuv Sinha
Nicolas Ballas
Mahmoud Assran
293
6
0
11 Jun 2025
An Effective End-to-End Solution for Multimodal Action Recognition
International Conference on Pattern Recognition (ICPR), 2025
Songping Wang
Xiantao Hu
Yueming Lyu
Caifeng Shan
231
2
0
11 Jun 2025
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought
Shuyi Zhang
Xiaoshuai Hao
Yingbo Tang
Lingfeng Zhang
Pengwei Wang
Zhongyuan Wang
Hongxuan Ma
Shanghang Zhang
VGen, AI4TS
334
11
0
10 Jun 2025
ExAct: A Video-Language Benchmark for Expert Action Analysis
Han Yi
Yulu Pan
Feihong He
Xinyu Liu
Benjamin Zhang
Oluwatumininu Oguntola
Gedas Bertasius
200
1
0
06 Jun 2025
Proactive Assistant Dialogue Generation from Streaming Egocentric Videos
Yichi Zhang
Xin Luna Dong
Mohammad Kachuee
Andrea Madotto
Anuj Kumar
Babak Damavandi
J. Chai
Seungwhan Moon
318
2
0
06 Jun 2025
Video, How Do Your Tokens Merge?
Sam Pollard
Michael Wray
ViT, MoMe
265
1
0
04 Jun 2025
Large-scale Self-supervised Video Foundation Model for Intelligent Surgery
Shu Yang
F. Zhou
Leon D. Mayer
Fuxiang Huang
Yiliang Chen
...
Zheng Li
Jing Qin
J. Teoh
Lena Maier-Hein
Hao-tao Chen
243
3
0
03 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
AAAI Conference on Artificial Intelligence (AAAI), 2025
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
235
2
0
03 Jun 2025
Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review
Yuchen Fang
Hao Miao
Yuxuan Liang
Liwei Deng
Yue Cui
...
Yan Zhao
T. Pedersen
Christian S. Jensen
Xiaofang Zhou
Kai Zheng
AI4TS, AI4CE
244
5
0
02 Jun 2025
Improving Keystep Recognition in Ego-Video via Dexterous Focus
Zachary Chavis
Stephen J. Guy
Hyun Soo Park
260
1
0
01 Jun 2025
Temporal In-Context Fine-Tuning with Temporal Reasoning for Versatile Control of Video Diffusion Models
Kinam Kim
J. Hyung
Jaegul Choo
DiffM, VGen
370
3
0
01 Jun 2025
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng
Jieyu Zhang
Mohammadreza Salehi
Ziqi Gao
Vishnu Iyengar
Norimasa Kobori
Quan Kong
Ranjay Krishna
371
2
0
29 May 2025