ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.08446
  4. Cited By
OLMES: A Standard for Language Model Evaluations

OLMES: A Standard for Language Model Evaluations

12 June 2024
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
    ELM
ArXivPDFHTML

Papers citing "OLMES: A Standard for Language Model Evaluations"

13 / 13 papers shown
Title
DataDecide: How to Predict Best Pretraining Data with Small Experiments
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Ian H. Magnusson
Nguyen Tai
Ben Bogin
David Heineman
Jena D. Hwang
...
Dirk Groeneveld
Oyvind Tafjord
Noah A. Smith
Pang Wei Koh
Jesse Dodge
ALM
30
0
0
15 Apr 2025
Efficient Model Development through Fine-tuning Transfer
Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin
Rishab Balasubramanian
Fengyuan Liu
Nikhil Kandpal
Tu Vu
59
0
0
25 Mar 2025
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu
Qian Liu
Haonan Wang
Shiqi Chen
Xiangming Gu
Tianyu Pang
Min-Yen Kan
36
0
0
19 Mar 2025
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
Eliya Habba
Ofir Arviv
Itay Itzhak
Yotam Perlitz
Elron Bandel
Leshem Choshen
Michal Shmueli-Scheuer
Gabriel Stanovsky
67
1
0
03 Mar 2025
Typhoon T1: An Open Thai Reasoning Model
Typhoon T1: An Open Thai Reasoning Model
Pittawat Taveekitworachai
Potsawee Manakul
Kasima Tharnpipitchai
Kunat Pipatanakul
OffRL
LRM
94
0
0
13 Feb 2025
PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts
Zeman Li
Yuan Deng
Peilin Zhong
Meisam Razaviyayn
Vahab Mirrokni
MoMe
75
1
0
10 Feb 2025
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
58
3
0
30 Oct 2024
Neutral residues: revisiting adapters for model extension
Neutral residues: revisiting adapters for model extension
Franck Signe Talla
Hervé Jégou
Edouard Grave
20
0
0
03 Oct 2024
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe
Oyvind Tafjord
Yonatan Belinkov
Hanna Hajishirzi
Ashish Sabharwal
34
3
0
21 Jul 2024
Training on the Test Task Confounds Evaluation and Emergence
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
58
6
1
10 Jul 2024
Is It Good Data for Multilingual Instruction Tuning or Just Bad
  Multilingual Evaluation for Large Language Models?
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
Pinzhen Chen
Simon Yu
Zhicheng Guo
Barry Haddow
ELM
46
1
0
18 Jun 2024
OLMo: Accelerating the Science of Language Models
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
130
349
0
01 Feb 2024
Leveraging Large Language Models for Multiple Choice Question Answering
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
138
181
0
22 Oct 2022
1