ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.13149
  4. Cited By
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for
  Scientific Research

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

25 August 2023
Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhe-Wei Shen
Baocai Chen
Lu Chen
Kai Yu
    ELM
ArXivPDFHTML

Papers citing "SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research"

17 / 17 papers shown
Title
Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol
Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol
Weiqi Wang
Jiefu Ou
Y. Song
Benjamin Van Durme
Daniel Khashabi
LMTD
33
0
0
14 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
53
0
0
31 Mar 2025
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Patrick Tser Jern Kon
Jiachen Liu
Qiuyi Ding
Yiming Qiu
Zhenning Yang
Yibo Huang
Jayanth Srinivasa
Myungjin Lee
Mosharaf Chowdhury
Ang Chen
48
3
0
22 Feb 2025
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Z. Li
L. Zhang
P. Wang
49
0
0
17 Feb 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
45
5
0
21 Jan 2025
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity
Zhengmin Yu
Jiutian Zeng
Siyi Chen
Wenhan Xu
Dandan Xu
Xiangyu Liu
Zonghao Ying
Nan Wang
Yuan Zhang
Min Yang
ELM
108
1
0
20 Jan 2025
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
Xiangru Tang
Tianyu Hu
Muyang Ye
Yanjun Shao
Xunjian Yin
...
Pan Lu
Zhuosheng Zhang
Yilun Zhao
Arman Cohan
Mark B. Gerstein
LLMAG
LRM
AI4CE
60
5
0
11 Jan 2025
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou
Jingdong Yang
Kehan Guo
Pin-Yu Chen
Tian Gao
...
Tian Gao
Werner Geyer
Nuno Moniz
Nitesh V Chawla
Xiangliang Zhang
33
4
0
18 Oct 2024
Generative Hierarchical Materials Search
Generative Hierarchical Materials Search
Sherry Yang
Simon L. Batzner
Ruiqi Gao
Muratahan Aykol
Alexander L. Gaunt
Brendan McMorrow
Danilo J. Rezende
Dale Schuurmans
Igor Mordatch
E. D. Cubuk
AI4CE
27
5
0
10 Sep 2024
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya
Agney S Talwarr
Vatsal Gupta
Tushar Kataria
Dan Roth
Vivek Gupta
LRM
50
2
0
15 Jul 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
37
25
0
18 Jun 2024
Recent Advances in Federated Learning Driven Large Language Models: A Survey on Architecture, Performance, and Security
Recent Advances in Federated Learning Driven Large Language Models: A Survey on Architecture, Performance, and Security
Youyang Qu
Ming Liu
Tianqing Zhu
Longxiang Gao
Shui Yu
Wanlei Zhou
MU
FedML
52
2
0
14 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
35
1
0
08 Jun 2024
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
PubMedQA: A Dataset for Biomedical Research Question Answering
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
199
791
0
13 Sep 2019
1