2406.10421
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
14 June 2024
Tu Anh Dinh
Carlos Mullov
Leonard Bärmann
Zhaolin Li
Danni Liu
Simon Reiß
Jueun Lee
Nathan Lerzer
Fabian Ternava
Jianfeng Gao
Tobias Röddiger
Alexander Waibel
Tamim Asfour
Michael Beigl
Rainer Stiefelhagen
Carsten Dachsbacher
Klemens Böhm
Jan Niehues
ELM
Papers citing
"SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading"
8 / 8 papers shown
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher
Katharine Mach
Aleksandar Stanić
Markus Leippold
Ben Gaiarin
...
Massimiliano Ciaramita
Joeri Rogelj
Christian Buck
Lierni Sestorain Saralegui
Reto Knutti
HILM
ELM
369
0
0
29 Oct 2025
Mechanisms of Matter: Language Inferential Benchmark on Physicochemical Hypothesis in Materials Synthesis
Yingming Pu
Tao Lin
Hongyu Chen
191
0
0
29 Sep 2025
Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons
Isik Baran Sandan
Tu Anh Dinh
Jan Niehues
ELM
382
3
0
04 Jun 2025
Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
Cedric Lothritz
Jordi Cabot
Laura Bernardy
464
2
0
02 Apr 2025
AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science
Chenyue Li
Wen Deng
Mengqian Lu
Binhang Yuan
ELM
AI4Cl
LRM
652
5
0
03 Feb 2025
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure
Ippei Fujisawa
Sensho Nobe
Hiroki Seto
Rina Onda
Yoshiaki Uchida
Hiroki Ikoma
Pei-Chun Chien
Ryota Kanai
LRM
339
10
0
04 Oct 2024
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
Kehua Feng
Keyan Ding
Weijie Wang
Xiang Zhuang
Yuqi Tang
Ming Qin
Yu Zhao
ELM
404
12
0
13 Jun 2024
The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian
Andrea Esuli
Giovanni Puccetti
ELM
333
7
0
27 Mar 2024