ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.13786
  4. Cited By
Perception Test: A Diagnostic Benchmark for Multimodal Video Models

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

23 May 2023
Viorica Puatruaucean
Lucas Smaira
Ankush Gupta
Adrià Recasens Continente
L. Markeeva
Dylan Banarse
Skanda Koppula
Joseph Heyward
Mateusz Malinowski
Yezhou Yang
Carl Doersch
Tatiana Matejovicova
Yury Sulsky
Antoine Miech
A. Fréchette
H. Klimczak
Raphael Köster
Junlin Zhang
Stephanie Winkler
Y. Aytar
Simon Osindero
Dima Damen
Andrew Zisserman
João Carreira
    VLM
ArXivPDFHTML

Papers citing "Perception Test: A Diagnostic Benchmark for Multimodal Video Models"

10 / 10 papers shown
Title
MINERVA: Evaluating Complex Video Reasoning
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani
Sachit Menon
Ahmet Iscen
Shyamal Buch
Ramin Mehran
...
Yukun Zhu
Carl Vondrick
Mikhail Sirotenko
Cordelia Schmid
Tobias Weyand
43
0
0
01 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
79
351
0
26 Apr 2025
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
34
0
0
25 Apr 2025
VEU-Bench: Towards Comprehensive Understanding of Video Editing
VEU-Bench: Towards Comprehensive Understanding of Video Editing
Bozheng Li
Y. Wu
Yi Lu
Jiashuo Yu
Licheng Tang
Jiawang Cao
Wenqing Zhu
Yuyang Sun
Jay Wu
Wenbo Zhu
34
0
0
24 Apr 2025
ZipR1: Reinforcing Token Sparsity in MLLMs
ZipR1: Reinforcing Token Sparsity in MLLMs
Feng Chen
Yefei He
Lequan Lin
J. Liu
Bohan Zhuang
Qi Wu
38
26
0
23 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
95
0
0
17 Apr 2025
Question-Answering Dense Video Events
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
55
1
0
06 Sep 2024
Robustness Analysis of Video-Language Models Against Visual and Language
  Perturbations
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
Madeline Chantry Schiappa
Shruti Vyas
Hamid Palangi
Y. S. Rawat
Vibhav Vineet
VLM
85
13
0
05 Jul 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
207
682
0
13 Oct 2021
TransTrack: Multiple Object Tracking with Transformer
TransTrack: Multiple Object Tracking with Transformer
Pei Sun
Jinkun Cao
Yi-Xin Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo
ViT
VOT
230
455
0
31 Dec 2020
1