Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

2 June 2024
Xingrui Wang
Wufei Ma
Angtian Wang
Shuo Chen
Adam Kortylewski
Yaoyao Liu

Papers citing "Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering"

34 / 34 papers shown
Seeing the Wind from a Falling Leaf
Zhiyuan Gao
Jiageng Mao
Hong-Xing Yu
Haozhe Lou
Emily Yue-Ting Jia
J. Barbič
Jiajun Wu
Yue Wang
VGen PINN
247
0
0
30 Nov 2025
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
Hunar Batra
Haoqin Tu
Hardy Chen
Yuanze Lin
Cihang Xie
Ronald Clark
OffRL ReLM LRM
374
0
0
10 Nov 2025
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
S. Yu
Yuxin Chen
Hao Ju
Lianjie Jia
Fuxi Zhang
...
Lin Song
Lijun Wang
Yanwei Li
Y. Shan
Huchuan Lu
LRM
319
9
0
23 Sep 2025
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou
Alexander Vilesov
Xuehai He
Ziyu Wan
Shuwang Zhang
Aditya Nagachandra
Di Chang
DongDong Chen
Xin Eric Wang
A. Kadambi
VLM
185
0
0
04 Aug 2025
Augmented Vision-Language Models: A Systematic Review
Anthony C Davis
Burhan Sadiq
Tianmin Shu
Chien-Ming Huang
VLM LRM
196
0
0
24 Jul 2025
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments
Florian Bordes
Q. Garrido
Justine T Kao
Adina Williams
Michael G. Rabbat
Emmanuel Dupoux
PINN
223
14
0
11 Jun 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2025
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
Jieneng Chen
LRM
471
21
0
01 May 2025
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma
Kai Li
Zhongshi Jiang
Moustafa Meshry
Qihao Liu
Huiyu Wang
Christian Hane
Yaoyao Liu
VGen
245
2
0
18 Jul 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B. Tenenbaum
Chuang Gan
470
257
0
15 May 2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
MLLM VLM
274
276
0
25 Apr 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
Zhicheng Zheng
Xin Yan
Zhenfang Chen
Jingzhou Wang
Qin Zhi Eddie Lim
Joshua B. Tenenbaum
Chuang Gan
LRM
209
14
0
09 Feb 2024
Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation
Prakhar Kaushik
Aayush Mishra
Adam Kortylewski
Yaoyao Liu
3DH
234
9
0
19 Jan 2024
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM MLLM
1.6K
1,181
0
16 Nov 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Neural Information Processing Systems (NeurIPS), 2023
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Yaoyao Liu
CoGe
318
21
0
27 Oct 2023
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties
Neural Information Processing Systems (NeurIPS), 2023
H. Tung
Mingyu Ding
Zhenfang Chen
Daniel M. Bear
Chuang Gan
J. Tenenbaum
Daniel L. K. Yamins
Judy Fan
Kevin A. Smith
223
28
0
27 Jun 2023
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM VGen
454
446
0
06 Dec 2022
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Computer Vision and Pattern Recognition (CVPR), 2022
Zhuowan Li
Xingrui Wang
Elias Stengel-Eskin
Adam Kortylewski
Wufei Ma
Benjamin Van Durme
Max Planck Institute for Informatics
OOD LRM
232
102
0
01 Dec 2022
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Maitreya Patel
Tejas Gokhale
Chitta Baral
Yezhou Yang
244
13
0
07 Nov 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Neural Information Processing Systems (NeurIPS), 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM MLLM CLIP
887
4,531
0
16 Oct 2022
Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features
European Conference on Computer Vision (ECCV), 2022
Wufei Ma
Angtian Wang
Alan Yuille
Adam Kortylewski
3DH 3DV
204
30
0
12 Sep 2022
Learning to Answer Visual Questions from Web Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
314
39
0
10 May 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
International Conference on Learning Representations (ICLR), 2022
Zhenfang Chen
Kexin Yi
Yunzhu Li
Mingyu Ding
Antonio Torralba
J. Tenenbaum
Chuang Gan
CoGe OCL
212
60
0
02 May 2022
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Neural Information Processing Systems (NeurIPS), 2021
Mingyu Ding
Zhenfang Chen
Tao Du
Ping Luo
J. Tenenbaum
Chuang Gan
VGen PINN OCL
226
79
0
28 Oct 2021
Physion: Evaluating Physical Prediction from Vision in Humans and Machines
Daniel M. Bear
E. Wang
Damian Mrowca
Felix Binder
Hsiau-Yu Fish Tung
...
Li Fei-Fei
Nancy Kanwisher
J. Tenenbaum
Daniel L. K. Yamins
Judith E. Fan
OOD
544
116
0
15 Jun 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
837
1,442
0
01 Apr 2021
NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation
International Conference on Learning Representations (ICLR), 2021
Angtian Wang
Adam Kortylewski
Alan Yuille
3DH
222
52
0
29 Jan 2021
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates
Muhammed Samil Atesoglu
Cagatay Yigit
İlker Kesen
Mert Kobaş
Erkut Erdem
Aykut Erdem
T. Goksun
Deniz Yuret
311
38
0
08 Dec 2020
CoKe: Localized Contrastive Learning for Robust Keypoint Detection
Yutong Bai
Angtian Wang
Adam Kortylewski
Alan Yuille
207
18
0
29 Sep 2020
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
International Conference on Learning Representations (ICLR), 2019
Rohit Girdhar
Deva Ramanan
382
192
0
10 Oct 2019
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
International Conference on Learning Representations (ICLR), 2019
Kexin Yi
Yuta Saito
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
J. Tenenbaum
NAI
400
480
0
03 Oct 2019
Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning
Shichen Liu
Tianye Li
Weikai Chen
Hao Li
3DV
403
759
0
03 Apr 2019
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt AIMat OffRL AI4CE
776
2,882
0
22 Sep 2017
The "something something" video database for learning and evaluating visual common sense
IEEE International Conference on Computer Vision (ICCV), 2017
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
434
1,782
0
13 Jun 2017
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat ObjD
2.5K
69,422
0
04 Jun 2015