2406.10421
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
14 June 2024
Tu Anh Dinh
Carlos Mullov
Leonard Bärmann
Zhaolin Li
Danni Liu
Simon Reiß
Jueun Lee
Nathan Lerzer
Fabian Ternava
Jianfeng Gao
Tobias Röddiger
Alexander Waibel
Tamim Asfour
Michael Beigl
Rainer Stiefelhagen
Carsten Dachsbacher
Klemens Böhm
Jan Niehues
ELM
Papers citing
"SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading"
6 / 6 papers shown
Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
Cedric Lothritz
Jordi Cabot
28
0
0
02 Apr 2025
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure
Ippei Fujisawa
Sensho Nobe
Hiroki Seto
Rina Onda
Yoshiaki Uchida
Hiroki Ikoma
Pei-Chun Chien
Ryota Kanai
LRM
34
3
0
04 Oct 2024
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
Kehua Feng
Keyan Ding
Weijie Wang
Xiang Zhuang
Zeyuan Wang
Ming Qin
Yu Zhao
Jianhua Yao
Qiang Zhang
H. Chen
ELM
16
6
0
13 Jun 2024
The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian
Andrea Esuli
Giovanni Puccetti
ELM
22
0
0
27 Mar 2024
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
209
559
0
03 May 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
207
1,089
0
20 Sep 2022