Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

29 February 2024

Vishaal Udandarao

Philip H. S. Torr

Matthias Bethge

Papers citing "Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress"

Holistic Evaluation of Text-To-Image Models Tony Lee Michihiro Yasunaga Chenlin Meng Yifan Mai Joon Sung Park ... Jun-Yan Zhu Fei-Fei Li Jiajun Wu Stefano Ermon Percy Liang 139 125 0 07 Nov 2023
Online Continual Learning Without the Storage Constraint Ameya Prabhu Zhipeng Cai P. Dokania Philip H. S. Torr V. Koltun Ozan Sener CLL 118 30 0 16 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4 Sébastien Bubeck Varun Chandrasekaran Ronen Eldan J. Gehrke Eric Horvitz ... Scott M. Lundberg Harsha Nori Hamid Palangi Marco Tulio Ribeiro Yi Zhang ELM AI4MH AI4CE ALM 236 2,232 0 22 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images Nitzan Bitton-Guetta Yonatan Bitton Jack Hessel Ludwig Schmidt Yuval Elovici Gabriel Stanovsky Roy Schwartz VLM 121 65 0 13 Mar 2023
RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank Q. Garrido Randall Balestriero Laurent Najman Yann LeCun SSL 46 71 0 05 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned Deep Ganguli Liane Lovitt John Kernion Amanda Askell Yuntao Bai ... Nicholas Joseph Sam McCandlish C. Olah Jared Kaplan Jack Clark 218 441 0 23 Aug 2022
Analyzing Dynamic Adversarial Training Data in the Limit Eric Wallace Adina Williams Robin Jia Douwe Kiela 184 29 0 16 Oct 2021
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information Kawin Ethayarajh Yejin Choi Swabha Swayamdipta 159 157 0 16 Oct 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis Christopher Potts Zhengxuan Wu Atticus Geiger Douwe Kiela 230 76 0 30 Dec 2020
Estimating Example Difficulty Using Variance of Gradients Chirag Agarwal Daniel D'souza Sara Hooker 190 105 0 26 Aug 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,943 0 20 Apr 2018