The "something something" video database for learning and evaluating visual common sense
v1, v2 (latest)

IEEE International Conference on Computer Vision (ICCV), 2017
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
    VLM

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,013 papers shown
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng
Jieyu Zhang
Mohammadreza Salehi
Ziqi Gao
Vishnu Iyengar
Norimasa Kobori
Quan Kong
Ranjay Krishna
375
2
0
29 May 2025
PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion
Jaehyun Choi
Jiwan Hur
Gyojin Han
Jaemyung Yu
Junmo Kim
VGen
199
0
0
28 May 2025
Dynamic-Aware Video Distillation: Optimizing Temporal Resolution Based on Video Semantics
Yinjie Zhao
Heng Zhao
Bihan Wen
Yew-Soon Ong
Joey Tianyi Zhou
VGen
175
1
0
28 May 2025
Advancing Video Self-Supervised Learning via Image Foundation Models
Pattern Recognition Letters (Pattern Recogn. Lett.), 2025
Jingwei Wu
Zhewei Huang
Chang Liu
200
0
0
25 May 2025
Inference Compute-Optimal Video Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Peiqi Wang
ShengYun Peng
Xuewen Zhang
Hanchao Yu
Yibo Yang
Lifu Huang
Fujun Liu
Qifan Wang
VLM
275
2
0
24 May 2025
Multi-task Learning For Joint Action and Gesture Recognition
Konstantinos Spathis
N. Kardaris
Petros Maragos
158
1
0
23 May 2025
Leveraging Foundation Models for Multimodal Graph-Based Action Recognition
Fatemeh Ziaeetabar
Florentin Wörgötter
362
3
0
21 May 2025
UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
Hanjung Kim
Jaehyun Kang
Hyolim Kang
Meedeum Cho
Seon Joo Kim
Youngwoon Lee
454
9
0
13 May 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
346
0
0
13 May 2025
Video Dataset Condensation with Diffusion Models
Zhe Li
Hadrien Reynaud
Mischa Dombrowski
Sarah Cechnicka
Franciskus Xaverius Erick
Bernhard Kainz
DD, VGen
501
1
0
10 May 2025
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition
Congqi Cao
Peiheng Han
Y. Zhang
Yating Yu
Qinyi Lv
Lingtong Min
Yanning Zhang
VLM
435
0
0
09 May 2025
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Harshvardhan Sikka
Paul Pu Liang
LM&Ro, VLM
1.1K
4
0
08 May 2025
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
USENIX Security Symposium (USENIX Security), 2023
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson
428
4
0
02 May 2025
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani
Sachit Menon
Ahmet Iscen
Shyamal Buch
Ramin Mehran
...
Yukun Zhu
Carl Vondrick
Mikhail Sirotenko
Cordelia Schmid
Tobias Weyand
333
8
0
01 May 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM, DiffM, VGen
487
5
0
30 Apr 2025
A Survey of Interactive Generative Video
Jiwen Yu
Yiran Qin
Haoxuan Che
Quande Liu
Xinyu Wang
Pengfei Wan
Di Zhang
Kun Gai
Hao Chen
Xihui Liu
VGen
427
15
0
30 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
496
3
0
28 Apr 2025
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
IEEE International Conference on Robotics and Automation (ICRA), 2025
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
255
7
0
17 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
322
2
0
16 Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Zhikai Wu
Yujiao Shi
...
Bohan Zeng
Wei Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGen, VLM
379
11
0
14 Apr 2025
Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition
Hongyu Qu
Ling Xing
Rui Yan
Yazhou Yao
G. Xie
Xiangbo Shu
262
1
0
14 Apr 2025
Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities
Maria Santos-Villafranca
Dustin Carrión-Ojeda
Alejandro Pérez-Yus
J. Bermudez-Cameo
Jose J. Guerrero
Simone Schaub-Meyer
EgoV, VLM
348
0
0
11 Apr 2025
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Yangliu Hu
Zikai Song
Na Feng
Yawei Luo
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
193
11
0
10 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT, SSL
339
0
0
08 Apr 2025
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
921
3
0
08 Apr 2025
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
Chuning Zhu
Raymond Yu
S. Feng
Benjamin Burchfiel
Paarth Shah
Abhishek Gupta
VGen
501
43
0
03 Apr 2025
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
VLM
665
17
0
03 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
351
0
0
02 Apr 2025
Learning from Streaming Video with Orthogonal Gradients
Computer Vision and Pattern Recognition (CVPR), 2025
Tengda Han
Dilara Gokay
Joseph Heyward
Chuhan Zhang
Daniel Zoran
Viorica Patraucean
João Carreira
Dima Damen
Andrew Zisserman
277
5
0
02 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP, VLM
435
37
0
01 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
338
3
0
01 Apr 2025
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Boyuan Wang
Xiaofeng Wang
Chaojun Ni
Guosheng Zhao
Zhiqin Yang
...
Yukun Zhou
Xinze Chen
Guan Huang
Lihong Liu
Xingang Wang
VGen
388
16
0
31 Mar 2025
ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos
IEEE International Conference on Robotics and Automation (ICRA), 2025
Junyao Shi
Zhuolun Zhao
Tianyou Wang
Ian Pedroza
Amy Luo
Jie Wang
Jason Ma
Dinesh Jayaraman
LM&Ro
283
13
0
31 Mar 2025
R900: Understanding the Cost-Effectiveness of Random Exploration from 900 Hours of Robotic Data Collection
Shutong Jin
Axel Kaliff
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
VGen
233
0
0
30 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
524
0
0
30 Mar 2025
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Antonia Karamolegkou
Malvina Nikandrou
Georgios Pantazopoulos
Danae Sanchez Villegas
Phillip Rust
Ruchira Dhar
Daniel Hershcovich
Anders Søgaard
229
2
0
28 Mar 2025
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Computer Vision and Pattern Recognition (CVPR), 2025
Qingqing Zhao
Yao Lu
Moo Jin Kim
Zipeng Fu
Zhuoyang Zhang
...
Ankur Handa
Xuan Li
Donglai Xiang
Gordon Wetzstein
Nayeon Lee
LM&Ro, LRM
335
193
0
27 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
916
3
0
27 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Computer Vision and Pattern Recognition (CVPR), 2025
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
262
1
0
24 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Q. Hu
Q. Hu
CLIP
362
0
0
24 Mar 2025
ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset
Zihao Chen
Hsuanyu Wu
Chi-Hsi Kung
Yi-Ting Chen
Yan-Tsung Peng
237
1
0
24 Mar 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
Shenyuan Gao
Siyuan Zhou
Yilun Du
Jun Zhang
Chuang Gan
VGen
549
33
0
24 Mar 2025
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zichen Liu
Kunlun Xu
Fuchun Sun
Xu Zou
Yuxin Peng
Jiahuan Zhou
VLM, AI4TS
425
10
0
20 Mar 2025
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
Computer Vision and Pattern Recognition (CVPR), 2025
Kyungho Bae
Jinhyung Kim
Sihaeng Lee
Soonyoung Lee
G. Lee
Jinwoo Choi
294
11
0
20 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
310
0
0
20 Mar 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
545
381
0
18 Mar 2025
DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
Computer Vision and Pattern Recognition (CVPR), 2025
Mert Bulent Sariyildiz
Philippe Weinzaepfel
Thomas Lucas
Pau de Jorge
Diane Larlus
Yannis Kalantidis
336
2
0
18 Mar 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
326
1
0
17 Mar 2025
Efficient Motion-Aware Video MLLM
Computer Vision and Pattern Recognition (CVPR), 2025
Zijia Zhao
Yuqi Huo
Tongtian Yue
Longteng Guo
Haoyu Lu
Binghai Wang
Xin Wu
Qingbin Liu
245
3
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
280
5
0
17 Mar 2025