ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.19472
  4. Cited By
Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid
  Progress

Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

29 February 2024
Ameya Prabhu
Vishaal Udandarao
Philip H. S. Torr
Matthias Bethge
Adel Bibi
Samuel Albanie
ArXivPDFHTML

Papers citing "Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress"

11 / 11 papers shown
Title
Holistic Evaluation of Text-To-Image Models
Holistic Evaluation of Text-To-Image Models
Tony Lee
Michihiro Yasunaga
Chenlin Meng
Yifan Mai
Joon Sung Park
...
Jun-Yan Zhu
Fei-Fei Li
Jiajun Wu
Stefano Ermon
Percy Liang
139
125
0
07 Nov 2023
Online Continual Learning Without the Storage Constraint
Online Continual Learning Without the Storage Constraint
Ameya Prabhu
Zhipeng Cai
P. Dokania
Philip H. S. Torr
V. Koltun
Ozan Sener
CLL
118
30
0
16 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
236
2,232
0
22 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of
  Synthetic and Compositional Images
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
121
65
0
13 Mar 2023
RankMe: Assessing the downstream performance of pretrained
  self-supervised representations by their rank
RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank
Q. Garrido
Randall Balestriero
Laurent Najman
Yann LeCun
SSL
46
71
0
05 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Analyzing Dynamic Adversarial Training Data in the Limit
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
184
29
0
16 Oct 2021
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Understanding Dataset Difficulty with V\mathcal{V}V-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
159
157
0
16 Oct 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
DynaSent: A Dynamic Benchmark for Sentiment Analysis
Christopher Potts
Zhengxuan Wu
Atticus Geiger
Douwe Kiela
230
76
0
30 Dec 2020
Estimating Example Difficulty Using Variance of Gradients
Estimating Example Difficulty Using Variance of Gradients
Chirag Agarwal
Daniel D'souza
Sara Hooker
190
105
0
26 Aug 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
1