The "something something" video database for learning and evaluating visual common sense

IEEE International Conference on Computer Vision (ICCV), 2017
13 June 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
Heuna Kim
V. Haenel
Ingo Fründ
P. Yianilos
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
    VLM

Papers citing "The "something something" video database for learning and evaluating visual common sense"

50 / 1,014 papers shown
MOFO: MOtion FOcused Self-Supervision for Video Understanding
Mona Ahmadian
Frank Guerin
Andrew Gilbert
307
4
0
23 Aug 2023
Opening the Vocabulary of Egocentric Actions
Neural Information Processing Systems (NeurIPS), 2023
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
310
23
0
22 Aug 2023
Are current long-term video understanding datasets long-term?
Ombretta Strafforello
Klamer Schutte
Jan van Gemert
207
10
0
22 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
IEEE International Conference on Computer Vision (ICCV), 2023
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
155
50
0
21 Aug 2023
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching
IEEE International Conference on Computer Vision (ICCV), 2023
Jiazheng Xing
Mengmeng Wang
Yudi Ruan
Bofan Chen
Yaowei Guo
B. Mu
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
208
34
0
18 Aug 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
Neural Information Processing Systems (NeurIPS), 2023
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
409
498
0
17 Aug 2023
SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Zhiming Wang
Lin Gu
Feng Lu
239
1
0
17 Aug 2023
On the Importance of Spatial Relations for Few-shot Action Recognition
ACM Multimedia (ACM MM), 2023
Yilun Zhang
Yu Fu
Jiabo He
Lizhe Qi
Yue Yu
Zuxuan Wu
Yueping Jiang
ViT
255
17
0
14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
205
17
0
10 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
IEEE International Conference on Computer Vision (ICCV), 2023
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
207
26
0
08 Aug 2023
M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition
ACM Multimedia (ACM MM), 2023
Hao Tang
Jun Liu
Shuanglin Yan
Rui Yan
Zechao Li
Jinhui Tang
281
74
0
06 Aug 2023
A Survey on Deep Learning-based Spatio-temporal Action Detection
Peng Wang
Fanwei Zeng
Yu Qian
224
9
0
03 Aug 2023
Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Pattern Recognition (Pattern Recogn.), 2023
Jiazheng Xing
Mengmeng Wang
Xiaojun Hou
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
VLM
181
1
0
03 Aug 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM
ELM
480
789
0
30 Jul 2023
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
Conference on Robot Learning (CoRL), 2023
Huy Ha
Peter R. Florence
Shuran Song
LM&Ro
274
210
0
26 Jul 2023
Group Activity Recognition in Computer Vision: A Comprehensive Review, Challenges, and Future Perspectives
C. Wang
A. Mohamed
258
3
0
25 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
IEEE International Conference on Computer Vision (ICCV), 2023
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
212
17
0
18 Jul 2023
Multimodal Distillation for Egocentric Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
335
35
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
257
27
0
13 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
International Conference on Learning Representations (ICLR), 2023
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
367
407
0
13 Jul 2023
Free-Form Composition Networks for Egocentric Action Recognition
Haoran Wang
Qinghua Cheng
Baosheng Yu
Yibing Zhan
Dapeng Tao
Liang Ding
Haibin Ling
EgoV
321
2
0
13 Jul 2023
Reading Between the Lanes: Text VideoQA on the Road
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
George Tom
Minesh Mathew
Sergi Garcia
Dimosthenis Karatzas
C. V. Jawahar
277
20
0
08 Jul 2023
A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023
Zhonghan Zhao
Wenhao Chai
Shengyu Hao
Wenhao Hu
Guanhong Wang
Shidong Cao
Min-Gyoo Song
Lei Li
Gaoang Wang
398
24
0
07 Jul 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Huayu Chen
...
Florian Schroff
Hartwig Adam
Ming-Hsuan Yang
Ting Liu
Boqing Gong
ELM
273
14
0
06 Jul 2023
Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating
Shengyuan Liu
Yuanyuan Ding
Guihong Lao
Sihan Zhang
Ning Zhou
Wen-Yue Chen
Hao Liu
194
4
0
06 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
321
88
0
05 Jul 2023
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu
Yichen Zhu
ViT
233
21
0
05 Jul 2023
Task-Specific Alignment and Multiple Level Transformer for Few-Shot Action Recognition
Neurocomputing (Neurocomputing), 2023
Fei-Yu Guo
Li Zhu
Yiwang Wang
Jing Sun
ViT
235
10
0
05 Jul 2023
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control
Conference on Robot Learning (CoRL), 2023
Vivek Myers
Andre Wang He
Kuan Fang
Homer Walke
Philippe Hansen-Estruch
Ching-An Cheng
Mihai Jalobeanu
Andrey Kolobov
Anca Dragan
Sergey Levine
LM&Ro
419
38
0
30 Jun 2023
Look, Remember and Reason: Grounded reasoning in videos with language models
International Conference on Learning Representations (ICLR), 2023
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
470
13
0
30 Jun 2023
How can objects help action recognition?
Computer Vision and Pattern Recognition (CVPR), 2023
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
233
26
0
20 Jun 2023
Dynamic Perceiver for Efficient Visual Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Yizeng Han
Dongchen Han
Zeyu Liu
Yulin Wang
Xuran Pan
Yifan Pu
Chaorui Deng
Junlan Feng
Qing Xiao
Gao Huang
296
40
0
20 Jun 2023
VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Xihua Sheng
Li Li
Dong Liu
Houqiang Li
3DH
281
22
0
19 Jun 2023
Robot Learning with Sensorimotor Pre-training
Conference on Robot Learning (CoRL), 2023
Ilija Radosavovic
Baifeng Shi
Letian Fu
Ken Goldberg
Trevor Darrell
Jitendra Malik
SSL
LM&Ro
273
66
0
16 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Vasu Sharma
Srijan Das
ViT
258
4
0
15 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
316
3
0
09 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Neural Information Processing Systems (NeurIPS), 2023
Zihui Xue
Kristen Grauman
EgoV
285
47
0
08 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition
Shreyank N. Gowda
Anurag Arnab
Jonathan Huang
ViT
182
4
0
07 Jun 2023
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
382
135
0
07 Jun 2023
Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification
Jintao Rong
Hao Chen
Tianrun Chen
Linlin Ou
Xinyi Yu
Yifan Liu
VLM
VPVLM
194
8
0
04 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
International Conference on Machine Learning (ICML), 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
305
304
0
01 Jun 2023
LIV: Language-Image Representations and Rewards for Robotic Control
International Conference on Machine Learning (ICML), 2023
Yecheng Jason Ma
William Liang
Vaidehi Som
Vikash Kumar
Amy Zhang
Osbert Bastani
Dinesh Jayaraman
LM&Ro
242
182
0
01 Jun 2023
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning
Shengqin Jiang
Yao-Huei Fang
Haokui Zhang
Qingshan Liu
Yuankai Qi
Yang Yang
Peifeng Wang
CLL
288
1
0
01 Jun 2023
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Jialong Wu
Haoyu Ma
Chao Deng
Mingsheng Long
OffRL
297
45
0
29 May 2023
Visual Affordance Prediction for Guiding Robot Exploration
IEEE International Conference on Robotics and Automation (ICRA), 2023
Homanga Bharadhwaj
Abhi Gupta
Shubham Tulsiani
254
17
0
28 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Neurocomputing (Neurocomputing), 2023
Thanh-Dat Truong
Khoa Luu
EgoV
389
15
0
25 May 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
255
9
0
25 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
282
13
0
23 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Joey Tianyi Zhou
Heng Ji
KELM
VGen
253
41
0
18 May 2023
Motion-Scenario Decoupling for Rat-Aware Video Position Prediction: Strategy and Benchmark
International Conference on Image and Graphics (ICIG), 2023
Xiaofeng Liu
Jiaxin Gao
Yaohua Liu
Risheng Liu
Nenggan Zheng
208
1
0
17 May 2023