SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
14 June 2024
Tu Anh Dinh
Carlos Mullov
Leonard Bärmann
Zhaolin Li
Danni Liu
Simon Reiß
Jueun Lee
Nathan Lerzer
Fabian Ternava
Jianfeng Gao
Tobias Röddiger
Alexander Waibel
Tamim Asfour
Michael Beigl
Rainer Stiefelhagen
Carsten Dachsbacher
Klemens Böhm
Jan Niehues
ELM

Papers citing "SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading"

8 / 8 papers shown
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher
Katharine Mach
Aleksandar Stanić
Markus Leippold
Ben Gaiarin
...
Massimiliano Ciaramita
Joeri Rogelj
Christian Buck
Lierni Sestorain Saralegui
Reto Knutti
HILM, ELM
369
0
0
29 Oct 2025
Mechanisms of Matter: Language Inferential Benchmark on Physicochemical Hypothesis in Materials Synthesis
Yingming Pu
Tao Lin
Hongyu Chen
191
0
0
29 Sep 2025
Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons
Isik Baran Sandan
Tu Anh Dinh
Jan Niehues
ELM
382
3
0
04 Jun 2025
Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish
Cedric Lothritz
Jordi Cabot
Laura Bernardy
464
2
0
02 Apr 2025
AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science
Chenyue Li
Wen Deng
Mengqian Lu
Binhang Yuan
ELM, AI4Cl, LRM
652
5
0
03 Feb 2025
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure
Ippei Fujisawa
Sensho Nobe
Hiroki Seto
Rina Onda
Yoshiaki Uchida
Hiroki Ikoma
Pei-Chun Chien
Ryota Kanai
LRM
339
10
0
04 Oct 2024
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
Kehua Feng
Keyan Ding
Weijie Wang
Xiang Zhuang
Yuqi Tang
Ming Qin
Yu Zhao
ELM
404
12
0
13 Jun 2024
The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian
Andrea Esuli
Giovanni Puccetti
ELM
333
7
0
27 Mar 2024