ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1706.04261
  4. Cited By
The "something something" video database for learning and evaluating
  visual common sense
v1v2 (latest)

The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
    VLM
ArXiv (abs)PDFHTML

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,013 papers shown
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
Theia: Distilling Diverse Vision Foundation Models for Robot LearningConference on Robot Learning (CoRL), 2024
Jinghuan Shang
Karl Schmeckpeper
Brandon B. May
M. Minniti
Tarik Kelestemur
David Watkins
Laura Herlant
VLM
272
46
0
29 Jul 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Mixture of Nested Experts: Adaptive Processing of Visual TokensNeural Information Processing Systems (NeurIPS), 2024
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
259
17
0
29 Jul 2024
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
Pulkit Kumar
Namitha Padmanabhan
Luke Luo
Sai Saketh Rambhatla
Abhinav Shrivastava
241
7
0
25 Jul 2024
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey
  Interactions in Animal Videos
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos
Zsófia Katona
Seyed Sahand Mohamadi Ziabari
Fatemeh Karimi Nejadasl
277
1
0
25 Jul 2024
HVM-1: Large-scale video models pretrained with nearly 5000 hours of
  human-like video data
HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data
Emin Orhan
VLMSyDa
194
1
0
25 Jul 2024
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action RecognitionACM Multimedia (MM), 2024
Wenbo Huang
Jinghui Zhang
Xuwei Qian
Zhen Wu
Meng Wang
Lei Zhang
269
8
0
23 Jul 2024
SIGMA:Sinkhorn-Guided Masked Video Modeling
SIGMA:Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
249
14
0
22 Jul 2024
MIBench: Evaluating Multimodal Large Language Models over Multiple
  Images
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
Haowei Liu
Xi Zhang
Haiyang Xu
Yaya Shi
Chaoya Jiang
...
Ji Zhang
Fei Huang
Chunfen Yuan
Bing Li
Weiming Hu
VLM
151
28
0
21 Jul 2024
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic
  Rewards via Failure Prompts
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang
Minghao Chen
Qibo Qiu
Jiahao Wu
Wenxiao Wang
Binbin Lin
Ziyu Guan
Xiaofei He
LM&Ro
216
4
0
20 Jul 2024
A Comprehensive Review of Few-shot Action Recognition
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
534
13
0
20 Jul 2024
Dyn-Adapter: Towards Disentangled Representation for Efficient Visual
  Recognition
Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
Yurong Zhang
Honghao Chen
Xinyu Zhang
Xiangxiang Chu
Li Song
395
3
0
19 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled
  Perspective
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective
Changwen Zheng
Wenwen Qiang
Jianqi Zhang
Changwen Zheng
Jingyao Wang
SSL
276
0
0
19 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
289
23
0
11 Jul 2024
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Wentao Zhang
Junliang Guo
Tianyu He
Li Zhao
Linli Xu
Jiang Bian
331
7
0
10 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
229
8
0
09 Jul 2024
C2C: Component-to-Composition Learning for Zero-Shot Compositional
  Action Recognition
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li
Zhenhua Feng
Tianyang Xu
Linze Li
Xiao-Jun Wu
Muhammad Awais
Sara Atito
Josef Kittler
CoGe
403
11
0
08 Jul 2024
DMSD-CDFSAR: Distillation from Mixed-Source Domain for Cross-Domain
  Few-shot Action Recognition
DMSD-CDFSAR: Distillation from Mixed-Source Domain for Cross-Domain Few-shot Action Recognition
Fei-Yu Guo
YiKang Wang
Han Qi
Li Zhu
Jing Sun
370
5
0
08 Jul 2024
Learning Action and Reasoning-Centric Image Editing from Videos and
  Simulations
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Benno Krojer
Dheeraj Vattikonda
Luis Lara
Varun Jampani
Eva Portelance
Christopher Pal
Siva Reddy
EGVMVGen
346
17
0
03 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for
  Efficient Video Recognition
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
242
11
0
03 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description
  Models
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
303
112
0
30 Jun 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal
  Alignment
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
Hao Fei
Tat-Seng Chua
Shuicheng Yan
AI4TS
267
66
0
27 Jun 2024
EgoVideo: Exploring Egocentric Foundation Model and Downstream
  Adaptation
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation
Baoqi Pei
Guo Chen
Jilan Xu
Yuping He
Yicheng Liu
...
Yifei Huang
Yali Wang
Tong Lu
Limin Wang
Yu Qiao
EgoV
485
33
0
26 Jun 2024
Towards Event-oriented Long Video Understanding
Towards Event-oriented Long Video Understanding
Yifan Du
Kun Zhou
Yuqi Huo
Yifan Li
Wayne Xin Zhao
Haoyu Lu
Zijia Zhao
Bingning Wang
Weipeng Chen
Ji-Rong Wen
VLM
201
19
0
20 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
426
17
0
20 Jun 2024
Exploring the Impact of Hand Pose and Shadow on Hand-washing Action
  Recognition
Exploring the Impact of Hand Pose and Shadow on Hand-washing Action Recognition
Shengtai Ju
A. Reibman
CVBM
128
2
0
19 Jun 2024
Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D
  Space
Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
Yuan Wang
Zhao Wang
Junhao Gong
Di Huang
Tong He
...
J. Jiao
Xuetao Feng
Qi Dou
Weizhen He
Dan Xu
204
8
0
17 Jun 2024
HumanPlus: Humanoid Shadowing and Imitation from Humans
HumanPlus: Humanoid Shadowing and Imitation from HumansConference on Robot Learning (CoRL), 2024
Zipeng Fu
Qingqing Zhao
Qi Wu
Gordon Wetzstein
Chelsea Finn
SyDa
337
205
0
15 Jun 2024
Long Story Short: Story-level Video Understanding from 20K Short Films
Long Story Short: Story-level Video Understanding from 20K Short Films
Ridouane Ghermi
Xi Wang
Vicky Kalogeiton
Ivan Laptev
VGen
189
2
0
14 Jun 2024
A Survey of Video Datasets for Grounded Event Understanding
A Survey of Video Datasets for Grounded Event Understanding
Kate Sanders
Benjamin Van Durme
224
6
0
14 Jun 2024
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video
  Understanding
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad A Khan
VLMMLLM
259
102
0
13 Jun 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
Junke Wang
Yi Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu-Gang Jiang
ViTVGen
307
78
0
13 Jun 2024
Comparison Visual Instruction Tuning
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
270
5
0
13 Jun 2024
Cognitively Inspired Energy-Based World Models
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Vasu Sharma
Jundong Li
Tariq Iqbal
216
0
0
13 Jun 2024
Pandora: Towards General World Model with Natural Language Actions and
  Video States
Pandora: Towards General World Model with Natural Language Actions and Video States
Jiannan Xiang
Guangyi Liu
Yi Gu
Qiyue Gao
Yuting Ning
...
Shibo Hao
Yemin Shi
Zhengzhong Liu
Eric P. Xing
Zhiting Hu
VGen
299
66
0
12 Jun 2024
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video
  Prediction
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing
Jingdong Sun
Zejia Weng
Zuxuan Wu
Yu-Gang Jiang
VGen
288
22
0
10 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data PerspectivesAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
570
26
1
09 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language
  Space
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
329
2
0
05 Jun 2024
Understanding the Cross-Domain Capabilities of Video-Based Few-Shot
  Action Recognition Models
Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models
Georgia Markham
M. Balamurali
Andrew J. Hill
385
3
0
03 Jun 2024
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Xingrui Wang
Wufei Ma
Angtian Wang
Shuo Chen
Adam Kortylewski
Yaoyao Liu
331
12
0
02 Jun 2024
Learning Manipulation by Predicting Interaction
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
374
38
0
01 Jun 2024
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala
Reginald McLean
Isaac Woungang
Nariman Farsad
Samuel Kaski
Pekka Marttinen
Kai Yuan
LM&Ro
327
9
0
30 May 2024
MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning
MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning
Junjie Wang
Guangjing Yang
Wentao Chen
Huahui Yi
Xiaohu Wu
Qicheng Lao
MoEALM
267
0
0
29 May 2024
A Survey of Multimodal Large Language Model from A Data-centric
  Perspective
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
363
64
0
26 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Haifeng Zhang
Mingsheng Long
VGen
283
83
0
24 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video
  Representation Learning
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Yaoyao Liu
Cihang Xie
AI4TSVGenSSL
203
2
0
24 May 2024
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
Ting Liu
Xuyang Liu
Liangtao Shi
Zunnan Xu
Yue Hu
Yi Xin
Quanjun Yin
Bineng Zhong
Donglin Wang
247
16
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
880
164
0
23 May 2024
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
BIMM: Brain Inspired Masked Modeling for Video Representation Learning
Zhifan Wan
Jie Zhang
Chang-bo Li
Shiguang Shan
237
0
0
21 May 2024
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Rong Gao
Xin Liu
Bohao Xing
Zitong Yu
Björn W. Schuller
Heikki Kälviäinen
426
7
0
21 May 2024
From Sora What We Can See: A Survey of Text-to-Video Generation
From Sora What We Can See: A Survey of Text-to-Video Generation
Rui Sun
Yumin Zhang
Tejal Shah
Jiahao Sun
Shuoying Zhang
Wenqi Li
Haoran Duan
Bo Wei
R. Ranjan
EGVM
267
38
0
17 May 2024
Previous
123...567...192021
Next